xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [v8][PATCH 00/16] Fix RMRR
@ 2015-07-16  6:52   ` Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
                       ` (15 more replies)
  0 siblings, 16 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel

v8:

* Patch #3: xen/passthrough: extend hypercall to support rdm reservation policy
  Force to pass "0"(strict) when add or move a device in hardware domain,
  and improve some associated code comments.

* Patch #5: hvmloader: get guest memory map into memory_map[]
  Actually we should check this range started from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  So correct this and sync the patch head description.

* Patch #6: hvmloader/pci: disable all pci devices conflicting
  We have a big change to this patch:

  Based on current discussion its hard to reshape the original mmio
  allocation mechanism but we haven't a good and simple way to in short term.
  So instead, we don't bring more complicated to intervene that process but
  still check any conflicts to disable all associated devices.

  I know this is still argumented but I'd like to discuss this based on this
  revision and thanks for your time.

* Patch #7: hvmloader/e820: construct guest e820 table
  define low_mem_end as uint32_t;
  Correct those two wrong loops, memory_map.nr_map -> nr
  when we're trying to revise low/high memory e820 entries;
  Improve code comments and the patch head description;
  Add one check if highmem is just populated by hvmloader itself

* Patch #11: tools/libxl: detect and avoid conflicts with RDM
  Introduce pfn_to_paddr(x) -> ((uint64_t)x << XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out current codes.

* Patch #13: libxl: construct e820 map with RDM information for HVM guest
  make that core construction function as arch-specific to make sure
  we don't break ARM at this point.

* Patch #15:  xen/vtd: prevent from assign the device with shared rmrr
  Merge two if{} as one if{};
  Add to print RMRR range info when stop assign a group device

* Some minimal code style changes

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

* Just sync with the fallout of renaming parameters above.

Note I also mask patch #10 Acked by Wei Liu, Ian Jackson and Ian
Campbell. ( If I'm wrong just let me know at this point. ) And
as we discussed I'd further improve something as next step after
this round of review.

v6:

* Inside patch #01, add a comments to the nr_entries field inside
  xen_reserved_device_memory_map. Note this is from Jan.

* Inside patch #10,  we need rename something to make our policy reasonable
  "type" -> "strategy"
  "none" -> "ignore"
  and based on our discussion, we won't expose "ignore" in xl level and just
  keep that as a default, and then sync docs and the patch head description

* Inside patch #10, we fix some code stypes and especially we refine
  libxl__xc_device_get_rdm()

* Inside patch #16, we need to sync those renames introduced by patch #10.

v5:

* Fold our original patch #2 and #3 as this new, and here
  introduce a new, clear_identity_p2m_entry, which can wrapper
  guest_physmap_remove_page(). And we use this to clean our
  identity mapping. 

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our policy flag, so
  now "0" means "strict" and "1" means "relaxed", and also make DT device
  ignore the flag field simply. And then correct all associated code
  comments.

* Just make sure the per-device plicy always override the global policy,
  and so cleanup some associated comments and the patch head description.

* Improve some descriptions in doc.

* Make all rdm variables specific to .hvm

* Inside patch #6, we're trying to rename that field, is_64bar, inside struct
  bars with flag, and then extend to also indicate if this bar is already
  allocated.

* Inside patch 11, Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(),
  and then replace malloc() with libxl__malloc(), and finally cleanup this fallout.
  libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out parameter.

* The original patch #13 is sent out separately since actually this is not related
  to RMRR.

v4:

* Change one condition inside patch #2, "xen/x86/p2m: introduce
  set_identity_p2m_entry",

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )

 to make sure we just catch our requirement.

* Inside patch #3, "xen/vtd: create RMRR mapping",
  Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. And drop
  iommu_map_page() since actually ept_set_entry() can do this
  internally.

* Inside patch #4, "xen/passthrough: extend hypercall to support rdm
  reservation policy", add code comments to describer why we fix to set a
  policy flag in some cases like adding a device to hwdomain, and removing
  a device from user domain. And fix one judging condition

  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

  Additionally, also add to range check the flag passed to make future
  extensions possible (and to avoid ambiguity on what out of range values
  would mean).

* Inside patch #6, "hvmloader: get guest memory map into memory_map[]", we
  move some codes related to e820 to that specific file, e820.c, and consolidate
  "printf()+BUG()" and "BUG_ON()", and also avoid another fixed width type for
  the parameter of get_mem_mapping_layout()

* Inside patch #7, "hvmloader/pci: skip reserved ranges"
  We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either of populating more RAM, or of expanding more highmem. But
  we should know just 64bit-bar can work with highmem, and as you mentioned we
  also should avoid expanding highmem as possible. So my implementation is to 
  allocate 32bit-bar and 64bit-bar orderly.

  1>. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2>.

  2>. The second allocation round to the remaining 32bit-bar

  We should can finish allocating all 32bit-bar in theory, then go to the third
  allocation round 3>.

  3>. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the not aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check if other
  smaller bars can be allocated if a mmio hole exists between resource->base and
  reserved device memory. If a hole exists between base and reserved device
  memory, lets go out simply to try allocate for next bar since all bars are in
  descending order of size. If not, we need to move resource->base to reserved_end
  just to reallocate this bar

* Inside of patch #8, "hvmloader/e820: construct guest e820 table", we need to
  adjust highmme if lowmem is changed such as hvmloader has to populate more
  RAM to allocate bars.

* Inside of patch #11, "tools: introduce some new parameters to set rdm policy",
  we don't define init_val for for libxl_rdm_reserve_type since its just zero,
  and grab those changes to xl/libxlu to as a final patch.

* Inside of patch #12, "passes rdm reservation policy", fix one typo,
  s/unkwon/unknown. And in command description, we should use "[]" to indicate 
  it's optional for that extended xl command, pci-attach.

* Patch #13 is separated from current patch #14 since this is specific to xc.

* Inside of patch #14, "tools/libxl: detect and avoid conflicts with RDM", and
  just unconditionally set *nr_entries to 0. And additionally, we grab to all
  stuffs to provide a parameter to set our predefined boundary dynamically to as
  a separated patch later

* Inside of patch #16, "tools/libxl: extend XENMEM_set_memory_map", we use
  goto style error handling, and instead of NOGC, we shoud use
  libxl__malloc(gc,XXX) to allocate local e820.

Overall, we refined several the patch head descriptions and code comments.

v3:

* Rearrange all patches orderly as Wei suggested
* Rebase on the latest tree
* Address some Wei's comments on tools side
* Two changes for runtime cycle
   patch #2,xen/x86/p2m: introduce set_identity_p2m_entry, on hypervisor side

  a>. Introduce paging_mode_translate()
  Otherwise, we'll see this error when boot Xen/Dom0

(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
....
(XEN) Xen call trace:
(XEN)    [<ffff82d0801f53db>] p2m_pt_get_entry+0x29/0x558
(XEN)    [<ffff82d0801f0b5c>] set_identity_p2m_entry+0xfc/0x1f0
(XEN)    [<ffff82d08014ebc8>] rmrr_identity_mapping+0x154/0x1ce
(XEN)    [<ffff82d0802abb46>] intel_iommu_hwdom_init+0x76/0x158
(XEN)    [<ffff82d0802ab169>] iommu_hwdom_init+0x179/0x188
(XEN)    [<ffff82d0802cc608>] construct_dom0+0x2fed/0x35d8
(XEN)    [<ffff82d0802bdaa0>] __start_xen+0x22d8/0x2381
(XEN)    [<ffff82d080100067>] __high_start+0x53/0x55
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702

Note I don't copy all info since I think the above is enough.

  b>. Actually we still need to use "mfn_x(mfn) == INVALID_MFN" to confirm
  we're getting an invalid mfn.

* Add patch #16 to handle those devices which share same RMRR.

v2:

* Instead of that fixed predefined rdm memory boundary, we'd like to
  introduce a parameter, "rdm_mem_boundary", to set this threshold value.

* Remove that existing USB hack.

* Make sure the MMIO regions all fit in the available resource window

* Rename our policy, "force/try" -> "strict/relaxed"

* Indeed, Wei and Jan gave me more and more comments to refine codes
  * Code style
  * Better and reasonable code implementation
  * Correct or improve code comments.

* A little bit to work well with ARM.

Open:

* We should fail assigning device which has a shared RMRR with
another device. We can only do group assignment when RMRR is shared
among devices.

We need more time to figure a good policy/way out because something
is not clear to me.

As you know all devices are owned by Dom0 firstly before we create any
DomU, right? Do we allow Dom0 still own a group device while assign another
device in the same group?

Really appreciate any comments to policy.


v1:

RMRR is an acronym for Reserved Memory Region Reporting, expected to
be used for legacy usages (such as USB, UMA Graphics, etc.) requiring
reserved memory. Special treatment is required in system software to
setup those reserved regions in IOMMU translation structures, otherwise
passing through a device with RMRR reported may not work correctly.

This patch set tries to enhance existing Xen RMRR implementation to fix
various reported and theoretical problems. Most noteworthy changes are
to setup identity mapping in p2m layer and handle possible conflicts between
reported regions and gfn space. Initial proposal can be found at:
    http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00524.html
and after a long discussion a summarized agreement is here:
    http://lists.xen.org/archives/html/xen-devel/2015-01/msg01580.html

Below is a key summary of this patch set according to agreed proposal:

1. Use RDM (Reserved Device Memory) name in user space as a general 
description instead of using ACPI RMRR name directly.

2. Introduce configuration parameters to allow user control both per-device 
and global RDM resources along with desired policies upon a detected conflict.

3. Introduce a new hypercall to query global and per-device RDM resources.

4. Extend libxl to be a central place to manage RDM resources and handle 
potential conflicts between reserved regions and gfn space. One simplification
goal is made to keep existing lowmem / mmio / highmem layout which is
passed around various function blocks. So a reasonable assumption
is made, that conflicts falling into below areas are not re-arranged otherwise
it will result in a more scattered layout:
    a) in highmem region (>4G)
    b) in lowmem region, and below a predefined boundary (default 2G)
  a) is a new assumption not discussed before. From VT-d spec this is 
possible but no such observation in real-world. So we can make this
reasonable assumption until there's real usage on it.

5. Extend XENMEM_set_memory_map usable for HVM guest, and then have
libxl to use that hypercall to carry RDM information to hvmloader. There
is one difference from original discussion. Previously we discussed to
introduce a new E820 type specifically for RDM entries. After more thought
we think it's OK to just tag them as E820_reserved. Actually hvmloader
doesn't need to know whether the reserved entries come from RDM or
from other purposes. 

6. Then in hvmloader the change is generic for XENMEM_memory_map
change. Given a predefined memory layout, hvmloader should avoid
allocating all reserved entries for other usages (opregion, mmio, etc.)

7. Extend existing device passthrough hypercall to carry conflict handling
policy.

8. Setup identity map in p2m layer for RMRRs reported for the given
device. And conflicts are handled according to specified policy in hypercall.

Current patch set contains core enhancements calling for comments.
There are still several tasks not implemented now. We'll include them
in final version after RFC is agreed:

- remove existing USB hack
- detect and fail assigning device which has a shared RMRR with another device
- add a config parameter to configure that memory boundary flexibly
- In the case of hotplug we also need to figure out a way to fix that policy
  conflict between the per-pci policy and the global policy but firstly we think
  we'd better collect some good or correct ideas to step next in RFC. 

So here I made this as RFC to collect your any comments.

----------------------------------------------------------------
Jan Beulich (1):
      xen: introduce XENMEM_reserved_device_memory_map

Tiejun Chen (15):
      xen/vtd: create RMRR mapping
      xen/passthrough: extend hypercall to support rdm reservation policy
      xen: enable XENMEM_memory_map in hvm
      hvmloader: get guest memory map into memory_map[]
      hvmloader/pci: skip reserved ranges
      hvmloader/e820: construct guest e820 table
      tools/libxc: Expose new hypercall xc_reserved_device_memory_map
      tools: extend xc_assign_device() to support rdm reservation policy
      tools: introduce some new parameters to set rdm policy
      tools/libxl: detect and avoid conflicts with RDM
      tools: introduce a new parameter to set a predefined rdm boundary
      libxl: construct e820 map with RDM information for HVM guest
      xen/vtd: enable USB device assignment
      xen/vtd: prevent from assign the device with shared rmrr
      tools: parse to enable new rdm policy parameters

Jan Beulich (1):
      xen: introduce XENMEM_reserved_device_memory_map

 docs/man/xl.cfg.pod.5                       | 103 ++++++++
 docs/misc/vtd.txt                           |  24 ++
 tools/firmware/hvmloader/e820.c             | 127 ++++++++-
 tools/firmware/hvmloader/e820.h             |   7 +
 tools/firmware/hvmloader/hvmloader.c        |   2 +
 tools/firmware/hvmloader/pci.c              |  87 +++++++
 tools/firmware/hvmloader/util.c             |  26 ++
 tools/firmware/hvmloader/util.h             |  12 +
 tools/libxc/include/xenctrl.h               |  11 +-
 tools/libxc/xc_domain.c                     |  45 +++-
 tools/libxl/libxl.h                         |   6 +
 tools/libxl/libxl_arch.h                    |   7 +
 tools/libxl/libxl_arm.c                     |   8 +
 tools/libxl/libxl_create.c                  |  13 +-
 tools/libxl/libxl_dm.c                      | 273 ++++++++++++++++++++
 tools/libxl/libxl_dom.c                     |  16 +-
 tools/libxl/libxl_internal.h                |  13 +-
 tools/libxl/libxl_pci.c                     |  12 +-
 tools/libxl/libxl_types.idl                 |  26 ++
 tools/libxl/libxl_x86.c                     |  83 ++++++
 tools/libxl/libxlu_pci.c                    |  92 ++++++-
 tools/libxl/libxlutil.h                     |   4 +
 tools/libxl/xl_cmdimpl.c                    |  16 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c         |  16 +-
 tools/python/xen/lowlevel/xc/xc.c           |  30 ++-
 xen/arch/x86/hvm/hvm.c                      |   2 -
 xen/arch/x86/mm.c                           |   6 -
 xen/arch/x86/mm/p2m.c                       |  43 ++-
 xen/common/compat/memory.c                  |  66 +++++
 xen/common/memory.c                         |  64 +++++
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   3 +-
 xen/drivers/passthrough/arm/smmu.c          |   2 +-
 xen/drivers/passthrough/device_tree.c       |   3 +-
 xen/drivers/passthrough/iommu.c             |  10 +
 xen/drivers/passthrough/pci.c               |  15 +-
 xen/drivers/passthrough/vtd/dmar.c          |  32 +++
 xen/drivers/passthrough/vtd/dmar.h          |   1 -
 xen/drivers/passthrough/vtd/extern.h        |   1 +
 xen/drivers/passthrough/vtd/iommu.c         |  82 ++++--
 xen/drivers/passthrough/vtd/utils.c         |   7 -
 xen/include/asm-x86/p2m.h                   |  13 +-
 xen/include/public/domctl.h                 |   3 +
 xen/include/public/memory.h                 |  37 ++-
 xen/include/xen/iommu.h                     |  12 +-
 xen/include/xen/pci.h                       |   2 +
 xen/include/xlat.lst                        |   3 +-
 46 files changed, 1383 insertions(+), 83 deletions(-)

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [v8][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
                       ` (14 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Jan Beulich

From: Jan Beulich <jbeulich@suse.com>

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich <jbeulich@suse.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v7 ~ v8:

* Nothing is changed.

v6:

* Add a comments to the nr_entries field inside xen_reserved_device_memory_map

v5 ~ v4:

* Nothing is changed.

 xen/common/compat/memory.c           | 66 ++++++++++++++++++++++++++++++++++++
 xen/common/memory.c                  | 64 ++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/iommu.c      | 10 ++++++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +++++++++++++++++
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h          | 37 +++++++++++++++++++-
 xen/include/xen/iommu.h              | 10 ++++++
 xen/include/xen/pci.h                |  2 ++
 xen/include/xlat.lst                 |  3 +-
 10 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct compat_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+    struct compat_reserved_device_memory rdm = {
+        .start_pfn = start, .nr_pages = nr
+    };
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+                return -ERANGE;
+
+            if ( __copy_to_compat_offset(grdm->map.buffer,
+                                         grdm->used_entries,
+                                         &rdm,
+                                         1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
     int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
             break;
         }
 
+#ifdef HAS_PASSTHROUGH
+        case XENMEM_reserved_device_memory_map:
+        {
+            struct get_reserved_device_memory grdm;
+
+            if ( copy_from_guest(&grdm.map, compat, 1) ||
+                 !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+                return -EFAULT;
+
+            grdm.used_entries = 0;
+            rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                                  &grdm);
+
+            if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+                rc = -ENOBUFS;
+
+            grdm.map.nr_entries = grdm.used_entries;
+            if ( grdm.map.nr_entries )
+            {
+                if ( __copy_to_guest(compat, &grdm.map, 1) )
+                    rc = -EFAULT;
+            }
+
+            return rc;
+        }
+#endif
+
         default:
             return compat_arch_memory_op(cmd, compat);
         }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
     return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct xen_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            struct xen_reserved_device_memory rdm = {
+                .start_pfn = start, .nr_pages = nr
+            };
+
+            if ( __copy_to_guest_offset(grdm->map.buffer,
+                                        grdm->used_entries,
+                                        &rdm,
+                                        1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -1162,6 +1199,33 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+#ifdef HAS_PASSTHROUGH
+    case XENMEM_reserved_device_memory_map:
+    {
+        struct get_reserved_device_memory grdm;
+
+        if ( copy_from_guest(&grdm.map, arg, 1) ||
+             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+            return -EFAULT;
+
+        grdm.used_entries = 0;
+        rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                              &grdm);
+
+        if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+            rc = -ENOBUFS;
+
+        grdm.map.nr_entries = grdm.used_entries;
+        if ( grdm.map.nr_entries )
+        {
+            if ( __copy_to_guest(arg, &grdm.map, 1) )
+                rc = -EFAULT;
+        }
+
+        break;
+    }
+#endif
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 06cb38f..0b2ef52 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -375,6 +375,16 @@ void iommu_crash_shutdown(void)
     iommu_enabled = iommu_intremap = 0;
 }
 
+int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+
+    if ( !iommu_enabled || !ops->get_reserved_device_memory )
+        return 0;
+
+    return ops->get_reserved_device_memory(func, ctxt);
+}
+
 bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
 {
     const struct hvm_iommu *hd = domain_hvm_iommu(d);
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..a730de5 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -893,3 +893,35 @@ int platform_supports_x2apic(void)
     unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
     return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
 }
+
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
+    int rc = 0;
+    unsigned int i;
+    u16 bdf;
+
+    for_each_rmrr_device ( rmrr, bdf, i )
+    {
+        if ( rmrr != rmrr_cur )
+        {
+            rc = func(PFN_DOWN(rmrr->base_address),
+                      PFN_UP(rmrr->end_address) -
+                        PFN_DOWN(rmrr->base_address),
+                      PCI_SBDF(rmrr->segment, bdf),
+                      ctxt);
+
+            if ( unlikely(rc < 0) )
+                return rc;
+
+            if ( !rc )
+                continue;
+
+            /* Just go next. */
+            if ( rc == 1 )
+                rmrr_cur = rmrr;
+        }
+    }
+
+    return 0;
+}
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index 5524dba..f9ee9b0 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -75,6 +75,7 @@ int domain_context_mapping_one(struct domain *domain, struct iommu *iommu,
                                u8 bus, u8 devfn, const struct pci_dev *);
 int domain_context_unmap_one(struct domain *domain, struct iommu *iommu,
                              u8 bus, u8 devfn);
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
 
 unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg);
 void io_apic_write_remap_rte(unsigned int apic,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 48820ea..44ed23d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2491,6 +2491,7 @@ const struct iommu_ops intel_iommu_ops = {
     .crash_shutdown = vtd_crash_shutdown,
     .iotlb_flush = intel_iommu_iotlb_flush,
     .iotlb_flush_all = intel_iommu_iotlb_flush_all,
+    .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
     .dump_p2m_table = vtd_dump_p2m_table,
 };
 
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 832559a..ac7d3da 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -573,7 +573,42 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 27 */
+/*
+ * With some legacy devices, certain guest-physical addresses cannot safely
+ * be used for other purposes, e.g. to map guest RAM.  This hypercall
+ * enumerates those regions so the toolstack can avoid using them.
+ */
+#define XENMEM_reserved_device_memory_map   27
+struct xen_reserved_device_memory {
+    xen_pfn_t start_pfn;
+    xen_ulong_t nr_pages;
+};
+typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
+
+struct xen_reserved_device_memory_map {
+    /* IN */
+    /* Currently just one bit to indicate checkng all Reserved Device Memory. */
+#define PCI_DEV_RDM_ALL   0x1
+    uint32_t        flag;
+    /* IN */
+    uint16_t        seg;
+    uint8_t         bus;
+    uint8_t         devfn;
+    /*
+     * IN/OUT
+     *
+     * Gets set to the required number of entries when too low,
+     * signaled by error code -ERANGE.
+     */
+    unsigned int    nr_entries;
+    /* OUT */
+    XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
+};
+typedef struct xen_reserved_device_memory_map xen_reserved_device_memory_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_map_t);
+
+/* Next available subop number is 28 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b30bf41..e2f584d 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -126,6 +126,14 @@ int iommu_do_dt_domctl(struct xen_domctl *, struct domain *,
 
 struct page_info;
 
+/*
+ * Any non-zero value returned from callbacks of this type will cause the
+ * function the callback was handed to terminate its iteration. Assigning
+ * meaning of these non-zero values is left to the top level caller /
+ * callback pair.
+ */
+typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
+
 struct iommu_ops {
     int (*init)(struct domain *d);
     void (*hwdom_init)(struct domain *d);
@@ -157,12 +165,14 @@ struct iommu_ops {
     void (*crash_shutdown)(void);
     void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
     void (*iotlb_flush_all)(struct domain *d);
+    int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
     void (*dump_p2m_table)(struct domain *d);
 };
 
 void iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
+int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
 void iommu_share_p2m_table(struct domain *d);
 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..d176e8b 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
+#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))
 
 struct pci_dev_info {
     bool_t is_extfn;
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9c9fd9a..dd23559 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -61,9 +61,10 @@
 !	memory_exchange			memory.h
 !	memory_map			memory.h
 !	memory_reservation		memory.h
-?	mem_access_op		memory.h
+?	mem_access_op			memory.h
 !	pod_target			memory.h
 !	remove_from_physmap		memory.h
+!	reserved_device_memory_map	memory.h
 ?	vmemrange			memory.h
 !	vnuma_topology_info		memory.h
 ?	physdev_eoi			physdev.h
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 02/16] xen/vtd: create RMRR mapping
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
                       ` (13 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Andrew Cooper, Tim Deegan, Jan Beulich,
	Yang Zhang

RMRR reserved regions must be setup in the pfn space with an identity
mapping to reported mfn. However existing code has problem to setup
correct mapping when VT-d shares EPT page table, so lead to problem
when assigning devices (e.g GPU) with RMRR reported. So instead, this
patch aims to setup identity mapping in p2m layer, regardless of
whether EPT is shared or not. And we still keep creating VT-d table.

And we also need to introduce a pair of helper to create/clear this
sort of identity mapping as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If space
is already occupied by desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define macro to wrapper guest_physmap_remove_page() with
a returning value as necessary.

CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v6 ~ v8:

* Nothing is changed.

v5:

* Fold our original patch #2 and #3 as this new

* Introduce a new, clear_identity_p2m_entry, which can wrapper
  guest_physmap_remove_page(). And we use this to clean our
  identity mapping. 

v4:

* Change that orginal condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mapping as we expected.

* To have

  if ( !paging_mode_translate(p2m->domain) )
    return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c               | 40 +++++++++++++++++++++++++++++++++++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h           | 13 +++++++++---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
                          p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                           unsigned long mfn, unsigned int page_order)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
     gfn_lock(p2m, gfn, page_order);
-    p2m_remove_page(p2m, gfn, mfn, page_order);
+    rc = p2m_remove_page(p2m, gfn, mfn, page_order);
     gfn_unlock(p2m, gfn, page_order);
+    return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
     return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma)
+{
+    p2m_type_t p2mt;
+    p2m_access_t a;
+    mfn_t mfn;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int ret;
+
+    if ( !paging_mode_translate(p2m->domain) )
+        return 0;
+
+    gfn_lock(p2m, gfn, 0);
+
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+    if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+                            p2m_mmio_direct, p2ma);
+    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+        ret = 0;
+    else
+    {
+        ret = -EBUSY;
+        printk(XENLOG_G_WARNING
+               "Cannot setup identity map d%d:%lx,"
+               " gfn already mapped to %lx.\n",
+               d->domain_id, gfn, mfn_x(mfn));
+    }
+
+    gfn_unlock(p2m, gfn, 0);
+    return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
             while ( base_pfn < end_pfn )
             {
-                if ( intel_iommu_unmap_page(d, base_pfn) )
+                if ( clear_identity_p2m_entry(d, base_pfn, 0) )
                     ret = -ENXIO;
                 base_pfn++;
             }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-                                       IOMMUF_readable|IOMMUF_writable);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
 
         if ( err )
             return err;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index b49c09b..190a286 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -503,9 +503,9 @@ static inline int guest_physmap_add_page(struct domain *d,
 }
 
 /* Remove a page from a domain's p2m table */
-void guest_physmap_remove_page(struct domain *d,
-                               unsigned long gfn,
-                               unsigned long mfn, unsigned int page_order);
+int guest_physmap_remove_page(struct domain *d,
+                              unsigned long gfn,
+                              unsigned long mfn, unsigned int page_order);
 
 /* Set a p2m range as populate-on-demand */
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
@@ -543,6 +543,13 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
                        p2m_access_t access);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
+/* Set identity addresses in the p2m table (for pass-through) */
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma);
+
+#define clear_identity_p2m_entry(d, gfn, page_order) \
+                        guest_physmap_remove_page(d, gfn, gfn, page_order)
+
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
                     unsigned long gpfn, domid_t foreign_domid);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16  7:40       ` Jan Beulich
  2015-07-16 11:09       ` George Dunlap
  2015-07-16  6:52     ` [v8][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
                       ` (12 subsequent siblings)
  15 siblings, 2 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Andrew Cooper, Tim Deegan,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Yang Zhang,
	Stefano Stabellini, Ian Campbell

This patch extends the existing hypercall to support rdm reservation policy.
We return error or just throw out a warning message depending on whether
the policy is "strict" or "relaxed" when reserving RDM regions in pfn space.
Note in some special cases, e.g. add a device to hwdomain, and remove a
device from user domain, 'relaxed' is fine enough since this is always safe
to hwdomain.

CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v8:

* Force to pass "0"(strict) when add or move a device in hardware domain,
  and improve some associated code comments.

v6 ~ v7:

* Nothing is changed.

v5:

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
  "0" means "strict" and "1" means "relaxed".

* So make DT device ignore the flag field

* Improve the code comments

v4:

* Add code comments to describer why we fix to set a policy flag in some
  cases like adding a device to hwdomain, and removing a device from user domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one judging condition
  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add to range check the flag passed to make future extensions possible
  (and to avoid ambiguity on what out of range values would mean).

 xen/arch/x86/mm/p2m.c                       |  7 ++++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c          |  2 +-
 xen/drivers/passthrough/device_tree.c       |  3 ++-
 xen/drivers/passthrough/pci.c               | 15 ++++++++----
 xen/drivers/passthrough/vtd/iommu.c         | 37 ++++++++++++++++++++++-------
 xen/include/asm-x86/p2m.h                   |  2 +-
 xen/include/public/domctl.h                 |  3 +++
 xen/include/xen/iommu.h                     |  2 +-
 9 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 99a26ca..47785dc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma)
+                           p2m_access_t p2ma, unsigned int flag)
 {
     p2m_type_t p2mt;
     p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
         ret = 0;
     else
     {
-        ret = -EBUSY;
+        if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
+            ret = 0;
+        else
+            ret = -EBUSY;
         printk(XENLOG_G_WARNING
                "Cannot setup identity map d%d:%lx,"
                " gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-                                   struct pci_dev *pdev)
+                                   struct pci_dev *pdev,
+                                   u32 flag)
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
     int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-			       struct device *dev)
+			       struct device *dev, u32 flag)
 {
 	struct iommu_domain *domain;
 	struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 5d3842a..7ff79f8 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
             goto fail;
     }
 
-    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+    /* The flag field doesn't matter to DT device. */
+    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0);
 
     if ( rc )
         goto fail;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e30be43..6e23fc6 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1335,7 +1335,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
     return pdev ? 0 : -EBUSY;
 }
 
-static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
+static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
     struct pci_dev *pdev;
@@ -1371,7 +1371,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 
     pdev->fault.count = 0;
 
-    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev))) )
+    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag)) )
         goto done;
 
     for ( ; pdev->phantom_stride; rc = 0 )
@@ -1379,7 +1379,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
         devfn += pdev->phantom_stride;
         if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
             break;
-        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev));
+        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
         if ( rc )
             printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed (%d)\n",
                    d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
@@ -1496,6 +1496,7 @@ int iommu_do_pci_domctl(
 {
     u16 seg;
     u8 bus, devfn;
+    u32 flag;
     int ret = 0;
     uint32_t machine_sbdf;
 
@@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
         seg = machine_sbdf >> 16;
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
+        flag = domctl->u.assign_device.flag;
+        if ( flag > XEN_DOMCTL_DEV_RDM_RELAXED )
+        {
+            ret = -EINVAL;
+            break;
+        }
 
         ret = device_assigned(seg, bus, devfn) ?:
-              assign_device(d, seg, bus, devfn);
+              assign_device(d, seg, bus, devfn, flag);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(__HYPERVISOR_domctl,
                                                 "h", u_domctl);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 8415958..b5d658e 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1807,7 +1807,8 @@ static void iommu_set_pgd(struct domain *d)
 }
 
 static int rmrr_identity_mapping(struct domain *d, bool_t map,
-                                 const struct acpi_rmrr_unit *rmrr)
+                                 const struct acpi_rmrr_unit *rmrr,
+                                 u32 flag)
 {
     unsigned long base_pfn = rmrr->base_address >> PAGE_SHIFT_4K;
     unsigned long end_pfn = PAGE_ALIGN_4K(rmrr->end_address) >> PAGE_SHIFT_4K;
@@ -1855,7 +1856,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
 
         if ( err )
             return err;
@@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
              PCI_BUS(bdf) == pdev->bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
+            /*
+             * iommu_add_device() is only called for the hardware
+             * domain (see xen/drivers/passthrough/pci.c:pci_add_device()).
+             * Since RMRRs are always reserved in the e820 map for the hardware
+             * domain, there shouldn't be a conflict.
+             */
+            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr, 0);
             if ( ret )
                 dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
                         pdev->domain->domain_id);
@@ -1939,7 +1946,11 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
              PCI_DEVFN2(bdf) != devfn )
             continue;
 
-        rmrr_identity_mapping(pdev->domain, 0, rmrr);
+        /*
+         * Any flag is nothing to clear these mappings but here
+         * its always safe and strict to set 0.
+         */
+        rmrr_identity_mapping(pdev->domain, 0, rmrr, 0);
     }
 
     return domain_context_unmap(pdev->domain, devfn, pdev);
@@ -2098,7 +2109,13 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
     spin_lock(&pcidevs_lock);
     for_each_rmrr_device ( rmrr, bdf, i )
     {
-        ret = rmrr_identity_mapping(d, 1, rmrr);
+        /*
+         * Here means we're add a device to the hardware domain.
+         * Since RMRRs are always reserved in the e820 map for the hardware
+         * domain, there shouldn't be a conflict. So its always safe and
+         * strict to set 0.
+         */
+        ret = rmrr_identity_mapping(d, 1, rmrr, 0);
         if ( ret )
             dprintk(XENLOG_ERR VTDPREFIX,
                      "IOMMU: mapping reserved region failed\n");
@@ -2241,7 +2258,11 @@ static int reassign_device_ownership(
                  PCI_BUS(bdf) == pdev->bus &&
                  PCI_DEVFN2(bdf) == devfn )
             {
-                ret = rmrr_identity_mapping(source, 0, rmrr);
+                /*
+                 * Any RMRR flag is always ignored when remove a device,
+                 * but its always safe and strict to set 0.
+                 */
+                ret = rmrr_identity_mapping(source, 0, rmrr, 0);
                 if ( ret != -ENOENT )
                     return ret;
             }
@@ -2265,7 +2286,7 @@ static int reassign_device_ownership(
 }
 
 static int intel_iommu_assign_device(
-    struct domain *d, u8 devfn, struct pci_dev *pdev)
+    struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag)
 {
     struct acpi_rmrr_unit *rmrr;
     int ret = 0, i;
@@ -2294,7 +2315,7 @@ static int intel_iommu_assign_device(
              PCI_BUS(bdf) == bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(d, 1, rmrr);
+            ret = rmrr_identity_mapping(d, 1, rmrr, flag);
             if ( ret )
             {
                 reassign_device_ownership(d, hardware_domain, devfn, pdev);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 190a286..68da0a9 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -545,7 +545,7 @@ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
 /* Set identity addresses in the p2m table (for pass-through) */
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma);
+                           p2m_access_t p2ma, unsigned int flag);
 
 #define clear_identity_p2m_entry(d, gfn, page_order) \
                         guest_physmap_remove_page(d, gfn, gfn, page_order)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index bc45ea5..bca25c9 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -478,6 +478,9 @@ struct xen_domctl_assign_device {
             XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
         } dt;
     } u;
+    /* IN */
+#define XEN_DOMCTL_DEV_RDM_RELAXED      1
+    uint32_t  flag;   /* flag of assigned device */
 };
 typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index e2f584d..02b2b02 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -140,7 +140,7 @@ struct iommu_ops {
     int (*add_device)(u8 devfn, device_t *dev);
     int (*enable_device)(device_t *dev);
     int (*remove_device)(u8 devfn, device_t *dev);
-    int (*assign_device)(struct domain *, u8 devfn, device_t *dev);
+    int (*assign_device)(struct domain *, u8 devfn, device_t *dev, u32 flag);
     int (*reassign_device)(struct domain *s, struct domain *t,
                            u8 devfn, device_t *dev);
 #ifdef HAS_PCI
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 04/16] xen: enable XENMEM_memory_map in hvm
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (2 preceding siblings ...)
  2015-07-16  6:52     ` [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
                       ` (11 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

This patch enables XENMEM_memory_map in hvm. So hvmloader can
use it to setup the e820 mappings.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
---
v5 ~ v8:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c      | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }
 
-        if ( is_hvm_domain(d) )
-        {
-            rcu_unlock_domain(d);
-            return -EPERM;
-        }
-
         e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
         if ( e820 == NULL )
         {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 05/16] hvmloader: get guest memory map into memory_map[]
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (3 preceding siblings ...)
  2015-07-16  6:52     ` [v8][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16  9:18       ` Jan Beulich
  2015-07-16 11:15       ` George Dunlap
  2015-07-16  6:52     ` [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm Tiejun Chen
                       ` (10 subsequent siblings)
  15 siblings, 2 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

Now we get this map layout by call XENMEM_memory_map then
save them into one global variable memory_map[]. It should
include lowmem range, rdm range and highmem range. Note
rdm range and highmem range may not exist in some cases.

And here we need to check if any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory in hvmloder level, and
we would lead hvmloader failed in case of conflict since its
another rare possibility in real world.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
v8:

* Actually we should check this range started from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  So correct this and sync the patch head description.

v5 ~ v7:

* Nothing is changed.

v4:

* Move some codes related to e820 to that specific file, e820.c.

* Consolidate "printf()+BUG()" and "BUG_ON()"

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c      | 35 +++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/e820.h      |  7 +++++++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c      | 26 ++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h      | 12 ++++++++++++
 5 files changed, 82 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..b72baa5 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,41 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+    unsigned int nr_entries = E820MAX, i;
+    int rc;
+    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START;
+    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+    if ( rc || !nr_entries )
+    {
+        printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+        BUG();
+    }
+
+    memory_map.nr_map = nr_entries;
+
+    for ( i = 0; i < nr_entries; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED )
+        {
+            if ( check_overlap(alloc_addr, alloc_size,
+                               memory_map.map[i].addr,
+                               memory_map.map[i].size) )
+            {
+                printf("Fail to setup memory map due to conflict");
+                printf(" on dynamic reserved memory range.\n");
+                BUG();
+            }
+        }
+    }
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
     uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
     uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX	128
+
+struct e820map {
+    unsigned int nr_map;
+    struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
     init_hypercalls();
 
+    memory_map_setup();
+
     xenbus_setup();
 
     bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size)
+{
+    return (start + size > reserved_start) &&
+            (start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
     asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
     *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+    int rc;
+    struct xen_memory_map memmap = {
+        .nr_entries = *max_entries
+    };
+
+    set_xen_guest_handle(memmap.buffer, entries);
+
+    rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+    *max_entries = memmap.nr_entries;
+
+    return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
     static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index f99c0f19..1100a3b 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10 @@
 #include <stdarg.h>
 #include <stdint.h>
 #include <stddef.h>
+#include <stdbool.h>
 #include <xen/xen.h>
 #include <xen/hvm/hvm_info_table.h>
+#include "e820.h"
 
 #define __STR(...) #__VA_ARGS__
 #define STR(...) __STR(__VA_ARGS__)
@@ -222,6 +224,9 @@ int hvm_param_set(uint32_t index, uint64_t value);
 /* Setup PCI bus */
 void pci_setup(void);
 
+/* Setup memory map  */
+void memory_map_setup(void);
+
 /* Prepare the 32bit BIOS */
 uint32_t rombios_highbios_setup(void);
 
@@ -249,6 +254,13 @@ void perform_tests(void);
 
 extern char _start[], _end[];
 
+int get_mem_mapping_layout(struct e820entry entries[],
+                           unsigned int *max_entries);
+
+extern struct e820map memory_map;
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size);
+
 #endif /* __HVMLOADER_UTIL_H__ */
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (4 preceding siblings ...)
  2015-07-16  6:52     ` [v8][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16 11:32       ` George Dunlap
  2015-07-16  6:52     ` [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
                       ` (9 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

When allocating mmio address for PCI bars, mmio may overlap with
reserved regions. Currently we just want to disable these associate
devices simply to avoid conflicts but we will reshape current mmio
allocation mechanism to fix this completely.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v8:

* Based on current discussion its hard to reshape the original mmio
  allocation mechanism but we haven't a good and simple way to in short term.
  So instead, we don't bring more complicated to intervene that process but
  still check any conflicts to disable all associated devices.

v6 ~ v7:

* Nothing is changed.

v5:

* Rename that field, is_64bar, inside struct bars with flag, and
  then extend to also indicate if this bar is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either of populating more RAM, or of expanding more highmem. But
  we should know just 64bit-bar can work with highmem, and as you mentioned we
  also should avoid expanding highmem as possible. So my implementation is to 
  allocate 32bit-bar and 64bit-bar orderly.

  1>. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2>.

  2>. The second allocation round to the remaining 32bit-bar

  We should can finish allocating all 32bit-bar in theory, then go to the third
  allocation round 3>.

  3>. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the not aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check if other
  smaller bars can be allocated if a mmio hole exists between resource->base and
  reserved device memory. If a hole exists between base and reserved device
  memory, lets go out simply to try allocate for next bar since all bars are in
  descending order of size. If not, we need to move resource->base to reserved_end
  just to reallocate this bar.

 tools/firmware/hvmloader/pci.c | 87 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..9e017d5 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,90 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+/*
+ * We should check if all valid bars conflict with RDM.
+ *
+ * Here we just need to check mmio bars in the case of non-highmem
+ * since the hypervisor can make sure RDM doesn't involve highmem.
+ */
+static void disable_conflicting_devices(void)
+{
+    uint8_t is_64bar;
+    uint32_t devfn, bar_reg, cmd, bar_data;
+    uint16_t vendor_id, device_id;
+    unsigned int bar, i;
+    uint64_t bar_sz;
+    bool is_conflict = false;
+
+    for ( devfn = 0; devfn < 256; devfn++ )
+    {
+        vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
+        device_id = pci_readw(devfn, PCI_DEVICE_ID);
+        if ( (vendor_id == 0xffff) && (device_id == 0xffff) )
+            continue;
+
+        /* Check all bars */
+        for ( bar = 0; bar < 7; bar++ )
+        {
+            bar_reg = PCI_BASE_ADDRESS_0 + 4*bar;
+            if ( bar == 6 )
+                bar_reg = PCI_ROM_ADDRESS;
+
+            bar_data = pci_readl(devfn, bar_reg);
+            bar_data &= PCI_BASE_ADDRESS_MEM_MASK;
+            if ( !bar_data )
+                continue;
+
+            if ( bar_reg != PCI_ROM_ADDRESS )
+                is_64bar = !!((bar_data & (PCI_BASE_ADDRESS_SPACE |
+                             PCI_BASE_ADDRESS_MEM_TYPE_MASK)) ==
+                             (PCI_BASE_ADDRESS_SPACE_MEMORY |
+                             PCI_BASE_ADDRESS_MEM_TYPE_64));
+
+            /* Until here we never conflict high memory. */
+            if ( is_64bar && pci_readl(devfn, bar_reg + 4) )
+                continue;
+
+            /* Just check mmio bars. */
+            if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
+                  PCI_BASE_ADDRESS_SPACE_IO) )
+                continue;
+
+            bar_sz = pci_readl(devfn, bar_reg);
+            bar_sz &= PCI_BASE_ADDRESS_MEM_MASK;
+
+            for ( i = 0; i < memory_map.nr_map ; i++ )
+            {
+                if ( memory_map.map[i].type != E820_RAM )
+                {
+                    uint64_t reserved_start, reserved_size;
+                    reserved_start = memory_map.map[i].addr;
+                    reserved_size = memory_map.map[i].size;
+                    if ( check_overlap(bar_data , bar_sz,
+                                   reserved_start, reserved_size) )
+                    {
+                        is_conflict = true;
+                        /* Now disable the memory or I/O mapping. */
+                        printf("pci dev %02x:%x bar %02x : 0x%08x : conflicts "
+                               "reserved resource so disable this device.!\n",
+                               devfn>>3, devfn&7, bar_reg, bar_data);
+                        cmd = pci_readw(devfn, PCI_COMMAND);
+                        pci_writew(devfn, PCI_COMMAND, ~cmd);
+                        break;
+                    }
+                }
+
+                /* Jump next device. */
+                if ( is_conflict )
+                {
+                    is_conflict = false;
+                    break;
+                }
+            }
+        }
+    }
+}
+
 void pci_setup(void)
 {
     uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -462,6 +546,9 @@ void pci_setup(void)
         cmd |= PCI_COMMAND_IO;
         pci_writew(vga_devfn, PCI_COMMAND, cmd);
     }
+
+    /* If pci bars conflict with RDM we need to disable this pci device. */
+    disable_conflicting_devices();
 }
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (5 preceding siblings ...)
  2015-07-16  6:52     ` [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16 11:47       ` George Dunlap
  2015-07-16  6:52     ` [v8][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
                       ` (8 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
  hypervisor map
* Add in the hypervisor regions
* Adjust the lowmem and highmem regions if we've had to relocate
  memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v8:

* define low_mem_end as uint32_t

* Correct those two wrong loops, memory_map.nr_map -> nr
  when we're trying to revise low/high memory e820 entries.

* Improve code comments and the patch head description

* Add one check if highmem is just populated by hvmloader itself

v5 ~ v7:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.
 
 
 tools/firmware/hvmloader/e820.c | 92 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 83 insertions(+), 9 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index b72baa5..aa678a7 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
                      unsigned int lowmem_reserved_base,
                      unsigned int bios_image_base)
 {
-    unsigned int nr = 0;
+    unsigned int nr = 0, i, j;
+    uint64_t add_high_mem = 0;
+    uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
@@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820,
     e820[nr].type = E820_RESERVED;
     nr++;
 
-    /* Low RAM goes here. Reserve space for special pages. */
-    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-    e820[nr].addr = 0x100000;
-    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-    e820[nr].type = E820_RAM;
-    nr++;
-
     /*
      * Explicitly reserve space for special pages.
      * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -194,9 +189,73 @@ int build_e820_table(struct e820entry *e820,
         nr++;
     }
 
+    /*
+     * Construct E820 table according to recorded memory map.
+     *
+     * The memory map created by toolstack may include,
+     *
+     * #1. Low memory region
+     *
+     * Low RAM starts at least from 1M to make sure all standard regions
+     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+     * have enough space.
+     *
+     * #2. Reserved regions if they exist
+     *
+     * #3. High memory region if it exists
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
+    {
+        e820[nr] = memory_map.map[i];
+        nr++;
+    }
+
+    /* Low RAM goes here. Reserve space for special pages. */
+    BUG_ON(low_mem_end < (2u << 20));
 
-    if ( hvm_info->high_mem_pgend )
+    /*
+     * Its possible to relocate RAM to allocate sufficient MMIO previously
+     * so low_mem_pgend would be changed over there. And here memory_map[]
+     * records the original low/high memory, so if low_mem_end is less than
+     * the original we need to revise low/high memory range in e820.
+     */
+    for ( i = 0; i < nr; i++ )
     {
+        uint64_t end = e820[i].addr + e820[i].size;
+        if ( e820[i].type == E820_RAM &&
+             low_mem_end > e820[i].addr && low_mem_end < end )
+        {
+            add_high_mem = end - low_mem_end;
+            e820[i].size = low_mem_end - e820[i].addr;
+        }
+    }
+
+    /*
+     * And then we also need to adjust highmem.
+     */
+    if ( add_high_mem )
+    {
+        for ( i = 0; i < nr; i++ )
+        {
+            if ( e820[i].type == E820_RAM &&
+                 e820[i].addr == (1ull << 32))
+            {
+                e820[i].size += add_high_mem;
+                add_high_mem = 0;
+                break;
+            }
+        }
+    }
+
+    /* Or this is just populated by hvmloader itself. */
+    if ( add_high_mem )
+    {
+        /*
+         * hvmloader should always update hvm_info->high_mem_pgend
+         * when it relocates RAM anywhere.
+         */
+        BUG_ON( !hvm_info->high_mem_pgend );
+
         e820[nr].addr = ((uint64_t)1 << 32);
         e820[nr].size =
             ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
@@ -204,6 +263,21 @@ int build_e820_table(struct e820entry *e820,
         nr++;
     }
 
+    /* Finally we need to sort all e820 entries. */
+    for ( j = 0; j < nr-1; j++ )
+    {
+        for ( i = j+1; i < nr; i++ )
+        {
+            if ( e820[j].addr > e820[i].addr )
+            {
+                struct e820entry tmp;
+                tmp = e820[j];
+                e820[j] = e820[i];
+                e820[i] = tmp;
+            }
+        }
+    }
+
     return nr;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (6 preceding siblings ...)
  2015-07-16  6:52     ` [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
                       ` (7 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

We will introduce the hypercall xc_reserved_device_memory_map
approach to libxc. This helps us get rdm entry info according to
different parameters. If flag == PCI_DEV_RDM_ALL, all entries
should be exposed. Or we just expose that rdm entry specific to
a SBDF.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4 ~ v8:

* Nothing is changed.

 tools/libxc/include/xenctrl.h |  8 ++++++++
 tools/libxc/xc_domain.c       | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..9160623 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..0951291 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
     return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries)
+{
+    int rc;
+    struct xen_reserved_device_memory_map xrdmmap = {
+        .flag = flag,
+        .seg = seg,
+        .bus = bus,
+        .devfn = devfn,
+        .nr_entries = *max_entries
+    };
+    DECLARE_HYPERCALL_BOUNCE(entries,
+                             sizeof(struct xen_reserved_device_memory) *
+                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, entries) )
+        return -1;
+
+    set_xen_guest_handle(xrdmmap.buffer, entries);
+
+    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+                      &xrdmmap, sizeof(xrdmmap));
+
+    xc_hypercall_bounce_post(xch, entries);
+
+    *max_entries = xrdmmap.nr_entries;
+
+    return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (7 preceding siblings ...)
  2015-07-16  6:52     ` [v8][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
                       ` (6 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, David Scott, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also bring some fallout to python usage of xc_assign_device().

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: David Scott <dave.scott@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v6 ~ v8:

* Nothing is changed.

v5:

* Fix the flag field as "0" to DT device

v4:

* In the patch head description, I add to explain why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h       |  3 ++-
 tools/libxc/xc_domain.c             |  9 ++++++++-
 tools/libxl/libxl_pci.c             |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 ++++++++++++----
 tools/python/xen/lowlevel/xc/xc.c   | 30 ++++++++++++++++++++----------
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
-                     uint32_t machine_sbdf);
+                     uint32_t machine_sbdf,
+                     uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
     xc_interface *xch,
     uint32_t domid,
-    uint32_t machine_sbdf)
+    uint32_t machine_sbdf,
+    uint32_t flag)
 {
     DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
     domctl.domain = domid;
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
     domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+    domctl.u.assign_device.flag = flag;
 
     return do_domctl(xch, &domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
     domctl.u.assign_device.u.dt.size = size;
+    /*
+     * DT doesn't own any RDM so actually DT has nothing to do
+     * for any flag and here just fix that as 0.
+     */
+    domctl.u.assign_device.flag = 0;
     set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
     rc = do_domctl(xch, &domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
     FILE *f;
     unsigned long long start, end, flags, size;
     int irq, i, rc, hvm = 0;
+    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
     if (type == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
-        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
             return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
 	CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+    XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+                                            value rflag)
 {
-	CAMLparam3(xch, domid, desc);
+	CAMLparam4(xch, domid, desc, rflag);
 	int ret;
 	int domain, bus, dev, func;
-	uint32_t sbdf;
+	uint32_t sbdf, flag;
 
 	domain = Int_val(Field(desc, 0));
 	bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
 	func = Int_val(Field(desc, 3));
 	sbdf = encode_sbdf(domain, bus, dev, func);
 
-	ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+	ret = Int_val(Field(rflag, 0));
+	flag = domain_assign_device_rdm_flag_table[ret];
+
+	ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
 	if (ret < 0)
 		failwith_xc(_H(xch));
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index c77e15b..a4928c6 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -592,7 +592,8 @@ static int token_value(char *token)
     return strtol(token, NULL, 16);
 }
 
-static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
+static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
+                    int *flag)
 {
     char *token;
 
@@ -607,8 +608,17 @@ static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
     *dev  = token_value(token);
     token = strchr(token, ',') + 1;
     *func  = token_value(token);
-    token = strchr(token, ',');
-    *str = token ? token + 1 : NULL;
+    token = strchr(token, ',') + 1;
+    if ( token ) {
+        *flag = token_value(token);
+        *str = token + 1;
+    }
+    else
+    {
+        /* O means we take "strict" as our default policy. */
+        *flag = 0;
+        *str = NULL;
+    }
 
     return 1;
 }
@@ -620,14 +630,14 @@ static PyObject *pyxc_test_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
@@ -653,21 +663,21 @@ static PyObject *pyxc_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
         sbdf |= (dev & 0x1f) << 3;
         sbdf |= (func & 0x7);
 
-        if ( xc_assign_device(self->xc_handle, dom, sbdf) != 0 )
+        if ( xc_assign_device(self->xc_handle, dom, sbdf, flag) != 0 )
         {
             if (errno == ENOSYS)
                 sbdf = -1;
@@ -686,14 +696,14 @@ static PyObject *pyxc_deassign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 10/16] tools: introduce some new parameters to set rdm policy
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (8 preceding siblings ...)
  2015-07-16  6:52     ` [v8][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
                       ` (5 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch introduces user configurable parameters to specify RDM
resource and according policies,

Global RDM parameter:
    rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

Global RDM parameter, "strategy", allows user to specify reserved regions
explicitly, Currently, using 'host' to include all reserved regions reported
on this platform which is good to handle hotplug scenario. In the future
this parameter may be further extended to allow specifying random regions,
e.g. even those belonging to another platform as a preparation for live
migration with passthrough devices. By default this isn't set so we don't
check all rdms. Instead, we just check rdm specific to a given device if
you're assigning this kind of device. Note this option is not recommended
unless you can make sure any conflict does exist.

'strict/relaxed' policy decides how to handle conflict when reserving RDM
regions in pfn space. If conflict exists, 'strict' means an immediate error
so VM can't keep running, while 'relaxed' allows moving forward with a
warning message thrown out.

Default per-device RDM policy is same as default global RDM policy as being
'relaxed'. And the per-device policy would override the global policy like
others.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v8:

* One minimal code style change

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

v6:

* Some rename to make our policy reasonable
  "type" -> "strategy"
  "none" -> "ignore"
* Don't expose "ignore" in xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device plicy always override the global policy,
  and so cleanup some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since its just zero
* Grab those changes to xl/libxlu to as a final patch

 docs/man/xl.cfg.pod.5        | 81 ++++++++++++++++++++++++++++++++++++++++++++
 docs/misc/vtd.txt            | 24 +++++++++++++
 tools/libxl/libxl_create.c   |  7 ++++
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c      |  9 +++++
 tools/libxl/libxl_types.idl  | 18 ++++++++++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..6c55a8b 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item B<rdm="RDM_RESERVATION_STRING">
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<strategy="STRING">
+
+Currently there is only one valid type:
+
+"host" means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using "host" includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure any conflict does exist.
+
+For example, you're trying to set "memory = 2800" to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad800000 end_address afffffff
+
+In this conflict case,
+
+#1. If B<strategy> is set to "host", for example,
+
+rdm = "strategy=host,policy=strict" or rdm = "strategy=host,policy=relaxed"
+
+It means all conflicts will be handled according to the policy
+introduced by B<policy> as described below.
+
+#2. If B<strategy> is not set at all, but
+
+pci = [ 'sbdf_A, rdm_policy=xxxxx' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by B<rdm_policy="STRING"> as described inside pci options.
+
+=item B<policy="STRING">
+
+Specifies how to deal with conflicts when reserving reserved device
+memory in guest address space.
+
+When that conflict is unsolved,
+
+"strict" means VM can't be created, or the associated device can't be
+attached in the case of hotplug.
+
+"relaxed" allows VM to be created but may cause VM to crash if
+pass-through device accesses RDM. For exampl,e Windows IGD GFX driver
+always accessed RDM regions so it leads to VM crash.
+
+Note this may be overridden by rdm_policy option in PCI device configuration.
+
+=back
+
+=back
+
 =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
 
 Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
@@ -717,6 +790,14 @@ dom0 without confirmation.  Please use with care.
 D0-D3hot power management states for the PCI device. False (0) by
 default.
 
+=item B<rdm_policy="STRING">
+
+(HVM/x86 only) This is same as policy option inside the rdm option but
+just specific to a given device. Therefore the default is "relaxed" as
+same as policy option as well.
+
+Note this would override global B<rdm> option.
+
 =back
 
 =back
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..88b2102 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,30 @@ in the config file:
 To override for a specific device:
 	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
 
+RDM, 'reserved device memory', for PCI Device Passthrough
+---------------------------------------------------------
+
+There are some devices the BIOS controls, for e.g. USB devices to perform
+PS2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence BIOS uses RMRR to specify these regions along with
+devices that need to access these regions. OS is expected to setup
+identity mappings for these regions for these devices to access these regions.
+
+While creating a VM we should reserve them in advance, and avoid any conflicts.
+So we introduce user configurable parameters to specify RDM resource and
+according policies,
+
+To enable this globally, add "rdm" in the config file:
+
+    rdm = "strategy=host, policy=relaxed"   (default policy is "relaxed")
+
+Or just for a specific device:
+
+    pci = [ '01:00.0,rdm_policy=relaxed', '03:00.0,rdm_policy=strict' ]
+
+For all the options available to RDM, see xl.cfg(5).
+
 
 Caveat on Conventional PCI Device Passthrough
 ---------------------------------------------
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f366a09..f75d4f1 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -105,6 +105,12 @@ static int sched_params_valid(libxl__gc *gc,
     return 1;
 }
 
+void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
+{
+    if (b_info->u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+        b_info->u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+}
+
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info)
 {
@@ -384,6 +390,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 
         libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
 
+        libxl__rdm_setdefault(gc, b_info);
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d52589e..d397143 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1154,6 +1154,8 @@ _hidden int libxl__device_vtpm_setdefault(libxl__gc *gc, libxl_device_vtpm *vtpm
 _hidden int libxl__device_vfb_setdefault(libxl__gc *gc, libxl_device_vfb *vfb);
 _hidden int libxl__device_vkb_setdefault(libxl__gc *gc, libxl_device_vkb *vkb);
 _hidden int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci);
+_hidden void libxl__rdm_setdefault(libxl__gc *gc,
+                                   libxl_domain_build_info *b_info);
 
 _hidden const char *libxl__device_nic_devname(libxl__gc *gc,
                                               uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 632c15e..1ebdce7 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -988,6 +988,12 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
+        if (pcidev->rdm_policy == LIBXL_RDM_RESERVE_POLICY_STRICT) {
+            flag &= ~XEN_DOMCTL_DEV_RDM_RELAXED;
+        } else if (pcidev->rdm_policy != LIBXL_RDM_RESERVE_POLICY_RELAXED) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm check flag.");
+            return ERROR_FAIL;
+        }
         rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
@@ -1040,6 +1046,9 @@ static int libxl__device_pci_reset(libxl__gc *gc, unsigned int domain, unsigned
 
 int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
 {
+    /* We'd like to force reserve rdm specific to a device by default.*/
+    if (pci->rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+        pci->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
     return 0;
 }
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e1632fa..47dd83a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -76,6 +76,17 @@ libxl_domain_type = Enumeration("domain_type", [
     (2, "PV"),
     ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
 
+libxl_rdm_reserve_strategy = Enumeration("rdm_reserve_strategy", [
+    (0, "ignore"),
+    (1, "host"),
+    ])
+
+libxl_rdm_reserve_policy = Enumeration("rdm_reserve_policy", [
+    (-1, "invalid"),
+    (0, "strict"),
+    (1, "relaxed"),
+    ], init_val = "LIBXL_RDM_RESERVE_POLICY_INVALID")
+
 libxl_channel_connection = Enumeration("channel_connection", [
     (0, "UNKNOWN"),
     (1, "PTY"),
@@ -369,6 +380,11 @@ libxl_vnode_info = Struct("vnode_info", [
     ("vcpus", libxl_bitmap), # vcpus in this node
     ])
 
+libxl_rdm_reserve = Struct("rdm_reserve", [
+    ("strategy",    libxl_rdm_reserve_strategy),
+    ("policy",      libxl_rdm_reserve_policy),
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -467,6 +483,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        # See libxl_ms_vm_genid_generate()
                                        ("ms_vm_genid",      libxl_ms_vm_genid),
                                        ("serial_list",      libxl_string_list),
+                                       ("rdm", libxl_rdm_reserve),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
@@ -542,6 +559,7 @@ libxl_device_pci = Struct("device_pci", [
     ("power_mgmt", bool),
     ("permissive", bool),
     ("seize", bool),
+    ("rdm_policy",      libxl_rdm_reserve_policy),
     ])
 
 libxl_device_dtdev = Struct("device_dtdev", [
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (9 preceding siblings ...)
  2015-07-16  6:52     ` [v8][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
                       ` (4 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

While building a VM, HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields to construct guest
e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RDM.

RMRR can reside in address space beyond 4G theoretically, but we never
see this in real world. So in order to avoid breaking highmem layout
we don't solve highmem conflict. Note this means highmem rmrr could still
be supported if no conflict.

But in the case of lowmem, RMRR probably scatter the whole RAM space.
Especially multiple RMRR entries would worsen this to lead a complicated
memory layout. And then its hard to extend hvm_info_table{} to work
hvmloader out. So here we're trying to figure out a simple solution to
avoid breaking existing layout. So when a conflict occurs,

    #1. Above a predefined boundary (2G)
        - move lowmem_end below reserved region to solve conflict;

    #2. Below a predefined boundary (2G)
        - Check strict/relaxed policy.
        "strict" policy leads to fail libxl. Note when both policies
        are specified on a given region, 'strict' is always preferred.
        "relaxed" policy issue a warning message and also mask this entry INVALID
        to indicate we shouldn't expose this entry to hvmloader.

Note later we need to provide a parameter to set that predefined boundary
dynamically.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
v8:

* Introduce pfn_to_paddr(x) -> ((uint64_t)x << XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out current codes.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* fix some code stypes
* Refine libxl__xc_device_get_rdm()

v5:

* A little change to make sure the per-device policy always override the global
  policy and correct its associated code comments.
* Fix one typo in the patch head description
* Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace
  malloc() with libxl__malloc(), and finally cleanup this fallout.
* libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out parameter.

v4:

* Consistent to use term "RDM".
* Unconditionally set *nr_entries to 0
* Grab to all sutffs to provide a parameter to set our predefined boundary
  dynamically to as a separated patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c       | 273 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom.c      |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 307 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f75d4f1..c8a32d5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -459,7 +459,7 @@ int libxl__domain_build(libxl__gc *gc,
 
     switch (info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        ret = libxl__build_hvm(gc, domid, info, state);
+        ret = libxl__build_hvm(gc, domid, d_config, state);
         if (ret)
             goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 317a8eb..692258b 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,279 @@ const char *libxl__domain_device_model(libxl__gc *gc,
     return dm;
 }
 
+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+                         uint32_t flag,
+                         uint16_t seg,
+                         uint8_t bus,
+                         uint8_t devfn,
+                         unsigned int *nr_entries,
+                         struct xen_reserved_device_memory **xrdm)
+{
+    int rc = 0, r;
+
+    /*
+     * We really can't presume how many entries we can get in advance.
+     */
+    *nr_entries = 0;
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      NULL, nr_entries);
+    assert(r <= 0);
+    /* "0" means we have no any rdm entry. */
+    if (!r) goto out;
+
+    if (errno != ENOBUFS) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    *xrdm = libxl__malloc(gc,
+                          *nr_entries * sizeof(xen_reserved_device_memory_t));
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      *xrdm, nr_entries);
+    if (r)
+        rc = ERROR_FAIL;
+
+ out:
+    if (rc) {
+        *nr_entries = 0;
+        *xrdm = NULL;
+        LOG(ERROR, "Could not get reserved device memory maps.\n");
+    }
+    return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+                         uint64_t rdm_start, uint64_t rdm_size)
+{
+    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
+}
+
+#define pfn_to_paddr(x) ((uint64_t)x << XC_PAGE_SHIFT)
+static void
+set_rdm_entries(libxl__gc *gc, libxl_domain_config *d_config,
+                uint64_t rdm_start, uint64_t rdm_size, int rdm_policy,
+                unsigned int nr_entries)
+{
+    assert(nr_entries);
+
+    d_config->num_rdms = nr_entries;
+    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                            d_config->num_rdms * sizeof(libxl_device_rdm));
+
+    d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
+    d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
+    d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to user preferred policy.
+ *
+ * RDM can reside in address space beyond 4G theoretically, but we never
+ * see this in real world. So in order to avoid breaking highmem layout
+ * we don't solve highmem conflict. Note this means highmem rmrr could still
+ * be supported if no conflict.
+ *
+ * But in the case of lowmem, RDM probably scatter the whole RAM space.
+ * Especially multiple RDM entries would worsen this to lead a complicated
+ * memory layout. And then its hard to extend hvm_info_table{} to work
+ * hvmloader out. So here we're trying to figure out a simple solution to
+ * avoid breaking existing layout. So when a conflict occurs,
+ *
+ * #1. Above a predefined boundary (default 2G)
+ * - Move lowmem_end below reserved region to solve conflict;
+ *
+ * #2. Below a predefined boundary (default 2G)
+ * - Check strict/relaxed policy.
+ * "strict" policy leads to fail libxl.
+ * "relaxed" policy issue a warning message and also mask this entry
+ * INVALID to indicate we shouldn't expose this entry to hvmloader.
+ * Note when both policies are specified on a given region, the per-device
+ * policy should override the global policy.
+ */
+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                       libxl_domain_config *d_config,
+                                       uint64_t rdm_mem_boundary,
+                                       struct xc_hvm_build_args *args)
+{
+    int i, j, conflict, rc;
+    struct xen_reserved_device_memory *xrdm = NULL;
+    uint32_t strategy = d_config->b_info.u.hvm.rdm.strategy;
+    uint16_t seg;
+    uint8_t bus, devfn;
+    uint64_t rdm_start, rdm_size;
+    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);
+
+    /* Might not expose rdm. */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE &&
+        !d_config->num_pcidevs)
+        return 0;
+
+    /* Query all RDM entries in this platform */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
+        unsigned int nr_entries;
+
+        /* Collect all rdm info if exist. */
+        rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+                                      0, 0, 0, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        if (!nr_entries)
+            return 0;
+
+        assert(xrdm);
+
+        for (i = 0; i < nr_entries; i++)
+            set_rdm_entries(gc, d_config,
+                            pfn_to_paddr(xrdm[i].start_pfn),
+                            pfn_to_paddr(xrdm[i].nr_pages),
+                            d_config->b_info.u.hvm.rdm.policy,
+                            i+1);
+    } else {
+        d_config->num_rdms = 0;
+    }
+
+    /* Query RDM entries per-device */
+    for (i = 0; i < d_config->num_pcidevs; i++) {
+        unsigned int nr_entries;
+        bool new = true;
+
+        seg = d_config->pcidevs[i].domain;
+        bus = d_config->pcidevs[i].bus;
+        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
+        nr_entries = 0;
+        rc = libxl__xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
+                                      seg, bus, devfn, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        /* No RDM to associated with this device. */
+        if (!nr_entries)
+            continue;
+
+        assert(xrdm);
+
+        /*
+         * Need to check whether this entry is already saved in the array.
+         * This could come from two cases:
+         *
+         *   - user may configure to get all RDMs in this platform, which
+         *   is already queried before this point
+         *   - or two assigned devices may share one RDM entry
+         *
+         * Different policies may be configured on the same RDM due to above
+         * two cases. But we don't allow to assign such a group devies right
+         * now so it doesn't come true in our case.
+         */
+        for (j = 0; j < d_config->num_rdms; j++) {
+            if (d_config->rdms[j].start == pfn_to_paddr(xrdm[0].start_pfn))
+            {
+                /*
+                 * So the per-device policy always override the global policy
+                 * in this case.
+                 */
+                d_config->rdms[j].policy = d_config->pcidevs[i].rdm_policy;
+                new = false;
+                break;
+            }
+        }
+
+        if (new) {
+            d_config->num_rdms++;
+            set_rdm_entries(gc, d_config,
+                            pfn_to_paddr(xrdm[0].start_pfn),
+                            pfn_to_paddr(xrdm[0].nr_pages),
+                            d_config->pcidevs[i].rdm_policy,
+                            d_config->num_rdms);
+        }
+    }
+
+    /*
+     * Next step is to check and avoid potential conflict between RDM entries
+     * and guest RAM. To avoid intrusive impact to existing memory layout
+     * {lowmem, mmio, highmem} which is passed around various function blocks,
+     * below conflicts are not handled which are rare and handling them would
+     * lead to a more scattered layout:
+     *  - RDM  in highmem area (>4G)
+     *  - RDM lower than a defined memory boundary (e.g. 2G)
+     * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem
+     * end below reserved region to solve conflict.
+     *
+     * If a conflict is detected on a given RDM entry, an error will be
+     * returned if 'strict' policy is specified. Instead, if 'relaxed' policy
+     * specified, this conflict is treated just as a warning, but we mark this
+     * RDM entry as INVALID to indicate that this entry shouldn't be exposed
+     * to hvmloader.
+     *
+     * Firstly we should check the case of rdm < 4G because we may need to
+     * expand highmem_end.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        conflict = overlaps_rdm(0, args->lowmem_end, rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        /* Just check if RDM > our memory boundary. */
+        if (rdm_start > rdm_mem_boundary) {
+            /*
+             * We will move downwards lowmem_end so we have to expand
+             * highmem_end.
+             */
+            highmem_end += (args->lowmem_end - rdm_start);
+            /* Now move downwards lowmem_end. */
+            args->lowmem_end = rdm_start;
+        }
+    }
+
+    /* Sync highmem_end. */
+    args->highmem_end = highmem_end;
+
+    /*
+     * Finally we can take same policy to check lowmem(< 2G) and
+     * highmem adjusted above.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        /* Does this entry conflict with lowmem? */
+        conflict = overlaps_rdm(0, args->lowmem_end,
+                                rdm_start, rdm_size);
+        /* Does this entry conflict with highmem? */
+        conflict |= overlaps_rdm((1ULL<<32),
+                                 args->highmem_end - (1ULL<<32),
+                                 rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        if (d_config->rdms[i].policy == LIBXL_RDM_RESERVE_POLICY_STRICT) {
+            LOG(ERROR, "RDM conflict at 0x%lx.\n", d_config->rdms[i].start);
+            goto out;
+        } else {
+            LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
+                      d_config->rdms[i].start);
+
+            /*
+             * Then mask this INVALID to indicate we shouldn't expose this
+             * to hvmloader.
+             */
+            d_config->rdms[i].policy = LIBXL_RDM_RESERVE_POLICY_INVALID;
+        }
+    }
+
+    return 0;
+
+ out:
+    return ERROR_FAIL;
+}
+
 const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
 {
     const libxl_vnc_info *vnc = NULL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index bdc0465..80fa17d 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -914,13 +914,20 @@ out:
 }
 
 int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     struct xc_hvm_build_args args = {};
     int ret, rc = ERROR_FAIL;
     uint64_t mmio_start, lowmem_end, highmem_end;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    /*
+     * Currently we fix this as 2G to guarantee how to handle
+     * our rdm policy. But we'll provide a parameter to set
+     * this dynamically.
+     */
+    uint64_t rdm_mem_boundary = 0x80000000;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -958,6 +965,14 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.highmem_end = highmem_end;
     args.mmio_start = mmio_start;
 
+    rc = libxl__domain_device_construct_rdm(gc, d_config,
+                                            rdm_mem_boundary,
+                                            &args);
+    if (rc) {
+        LOG(ERROR, "checking reserved device memory failed");
+        goto out;
+    }
+
     if (info->num_vnuma_nodes != 0) {
         int i;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d397143..b4d8419 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1057,7 +1057,7 @@ _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid,
 _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info, libxl__domain_build_state *state);
 _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state);
 
 _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
@@ -1565,6 +1565,15 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
         int nr_channels, libxl_device_channel *channels);
 
 /*
+ * This function will fix reserved device memory conflict
+ * according to user's configuration.
+ */
+_hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                   libxl_domain_config *d_config,
+                                   uint64_t rdm_mem_guard,
+                                   struct xc_hvm_build_args *args);
+
+/*
  * This function will cause the whole libxl process to hang
  * if the device model does not respond.  It is deprecated.
  *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 47dd83a..a3ad8d1 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -562,6 +562,12 @@ libxl_device_pci = Struct("device_pci", [
     ("rdm_policy",      libxl_rdm_reserve_policy),
     ])
 
+libxl_device_rdm = Struct("device_rdm", [
+    ("start", uint64),
+    ("size", uint64),
+    ("policy", libxl_rdm_reserve_policy),
+    ])
+
 libxl_device_dtdev = Struct("device_dtdev", [
     ("path", string),
     ])
@@ -592,6 +598,7 @@ libxl_domain_config = Struct("domain_config", [
     ("disks", Array(libxl_device_disk, "num_disks")),
     ("nics", Array(libxl_device_nic, "num_nics")),
     ("pcidevs", Array(libxl_device_pci, "num_pcidevs")),
+    ("rdms", Array(libxl_device_rdm, "num_rdms")),
     ("dtdevs", Array(libxl_device_dtdev, "num_dtdevs")),
     ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
     ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (10 preceding siblings ...)
  2015-07-16  6:52     ` [v8][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-16  6:52     ` [v8][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
                       ` (3 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Previously we always fix that predefined boundary as 2G to handle
conflict between memory and rdm, but now this predefined boundar
can be changes with the parameter "rdm_mem_boundary" in .cfg file.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v8:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable "rdm_mem_boundary_memkb" specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5       | 22 ++++++++++++++++++++++
 tools/libxl/libxl.h         |  6 ++++++
 tools/libxl/libxl_create.c  |  4 ++++
 tools/libxl/libxl_dom.c     |  8 +-------
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  3 +++
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 6c55a8b..23068ec 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -867,6 +867,28 @@ More information about Xen gfx_passthru feature is available
 on the XenVGAPassthrough L<http://wiki.xen.org/wiki/XenVGAPassthrough>
 wiki page.
 
+=item B<rdm_mem_boundary=MBYTES>
+
+Number of megabytes to set a boundary for checking rdm conflict.
+
+When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
+Especially multiple RDM entries would worsen this to lead a complicated
+memory layout. So here we're trying to figure out a simple solution to
+avoid breaking existing layout. So when a conflict occurs,
+
+    #1. Above a predefined boundary
+        - move lowmem_end below reserved region to solve conflict;
+
+    #2. Below a predefined boundary
+        - Check strict/relaxed policy.
+        "strict" policy leads to fail libxl. Note when both policies
+        are specified on a given region, 'strict' is always preferred.
+        "relaxed" policy issue a warning message and also mask this
+        entry INVALID to indicate we shouldn't expose this entry to
+        hvmloader.
+
+Here the default is 2G.
+
 =item B<dtdev=[ "DTDEV_PATH", "DTDEV_PATH", ... ]>
 
 Specifies the host device tree nodes to passthrough to this guest. Each
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..6f157c9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
     uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c8a32d5..3de86a6 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
 {
     if (b_info->u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
         b_info->u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+
+    if (b_info->u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+        b_info->u.hvm.rdm_mem_boundary_memkb =
+                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 80fa17d..e41d54a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     int ret, rc = ERROR_FAIL;
     uint64_t mmio_start, lowmem_end, highmem_end;
     libxl_domain_build_info *const info = &d_config->b_info;
-    /*
-     * Currently we fix this as 2G to guarantee how to handle
-     * our rdm policy. But we'll provide a parameter to set
-     * this dynamically.
-     */
-    uint64_t rdm_mem_boundary = 0x80000000;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.mmio_start = mmio_start;
 
     rc = libxl__domain_device_construct_rdm(gc, d_config,
-                                            rdm_mem_boundary,
+                                            info->u.hvm.rdm_mem_boundary_memkb*1024,
                                             &args);
     if (rc) {
         LOG(ERROR, "checking reserved device memory failed");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a3ad8d1..4eb4f8a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -484,6 +484,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("ms_vm_genid",      libxl_ms_vm_genid),
                                        ("serial_list",      libxl_string_list),
                                        ("rdm", libxl_rdm_reserve),
+                                       ("rdm_mem_boundary_memkb", MemKB),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c858068..dfb50d6 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1519,6 +1519,9 @@ static void parse_config_data(const char *config_source,
                     exit(1);
             }
         }
+
+        if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
+            b_info->u.hvm.rdm_mem_boundary_memkb = l * 1024;
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (11 preceding siblings ...)
  2015-07-16  6:52     ` [v8][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
@ 2015-07-16  6:52     ` Tiejun Chen
  2015-07-22 13:55       ` [v8][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest [and 1 more messages] Ian Jackson
  2015-07-16  6:53     ` [v8][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
                       ` (2 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist, and hvmloader would need this info
later.

Note this guest e820 table would be same as before if the
platform has no any RDM or we disable RDM (by default).

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v8:

* make that core construction function as arch-specific to make sure
  we don't break ARM at this point.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Rephrase patch's short log
* Make libxl__domain_construct_e820() hidden

v4:

* Use goto style error handling.
* Instead of NOGC, we shoud use libxl__malloc(gc,XXX) to allocate local e820.


 tools/libxl/libxl_arch.h |  7 ++++
 tools/libxl/libxl_arm.c  |  8 +++++
 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_x86.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 103 insertions(+)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d04871c..939178a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -49,4 +49,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
 _hidden
 int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
 
+/* arch specific to construct memory mapping function */
+_hidden
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args);
+
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index f09c860..1526467 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -926,6 +926,14 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
     return xc_domain_bind_pt_spi_irq(CTX->xch, domid, irq, irq);
 }
 
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args)
+{
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e41d54a..a8c6aa9 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }
 
+    if (libxl__arch_domain_construct_memmap(gc, d_config, domid, &args)) {
+        LOG(ERROR, "setting domain memory map failed");
+        goto out;
+    }
+
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..66b3d7f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args)
+{
+    int rc = 0;
+    unsigned int nr = 0, i;
+    /* We always own at least one lowmem entry. */
+    unsigned int e820_entries = 1;
+    struct e820entry *e820 = NULL;
+    uint64_t highmem_size =
+                    args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+    /* Add all rdm entries. */
+    for (i = 0; i < d_config->num_rdms; i++)
+        if (d_config->rdms[i].policy != LIBXL_RDM_RESERVE_POLICY_INVALID)
+            e820_entries++;
+
+
+    /* If we should have a highmem range. */
+    if (highmem_size)
+        e820_entries++;
+
+    if (e820_entries >= E820MAX) {
+        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+        rc = ERROR_INVAL;
+        goto out;
+    }
+
+    e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+    /* Low memory */
+    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].type = E820_RAM;
+    nr++;
+
+    /* RDM mapping */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        if (d_config->rdms[i].policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+            continue;
+
+        e820[nr].addr = d_config->rdms[i].start;
+        e820[nr].size = d_config->rdms[i].size;
+        e820[nr].type = E820_RESERVED;
+        nr++;
+    }
+
+    /* High memory */
+    if (highmem_size) {
+        e820[nr].addr = ((uint64_t)1 << 32);
+        e820[nr].size = highmem_size;
+        e820[nr].type = E820_RAM;
+    }
+
+    if (xc_domain_set_memory_map(CTX->xch, domid, e820, e820_entries) != 0) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+out:
+    return rc;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-basic-offset: 4
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 14/16] xen/vtd: enable USB device assignment
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (12 preceding siblings ...)
  2015-07-16  6:52     ` [v8][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
@ 2015-07-16  6:53     ` Tiejun Chen
  2015-07-16  6:53     ` [v8][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
  2015-07-16  6:53     ` [v8][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian

USB RMRR may conflict with guest BIOS region. In such case, identity
mapping setup is simply skipped in previous implementation. Now we
can handle this scenario cleanly with new policy mechanism so previous
hack code can be removed now.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v5 ~ v8:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
 xen/drivers/passthrough/vtd/utils.c |  7 -------
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {                                                \
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index b5d658e..c8b0455 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2243,11 +2243,9 @@ static int reassign_device_ownership(
     /*
      * If the device belongs to the hardware domain, and it has RMRR, don't
      * remove it from the hardware domain, because BIOS may use RMRR at
-     * booting time. Also account for the special casing of USB below (in
-     * intel_iommu_assign_device()).
+     * booting time.
      */
-    if ( !is_hardware_domain(source) &&
-         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+    if ( !is_hardware_domain(source) )
     {
         const struct acpi_rmrr_unit *rmrr;
         u16 bdf;
@@ -2300,13 +2298,8 @@ static int intel_iommu_assign_device(
     if ( ret )
         return ret;
 
-    /* FIXME: Because USB RMRR conflicts with guest bios region,
-     * ignore USB RMRR temporarily.
-     */
     seg = pdev->seg;
     bus = pdev->bus;
-    if ( is_usb_device(seg, bus, pdev->devfn) )
-        return 0;
 
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include "extern.h"
 #include <asm/io_apic.h>
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-    u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                                PCI_CLASS_DEVICE);
-    return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (13 preceding siblings ...)
  2015-07-16  6:53     ` [v8][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
@ 2015-07-16  6:53     ` Tiejun Chen
  2015-07-16  7:42       ` Jan Beulich
  2015-07-16  6:53     ` [v8][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
  15 siblings, 1 reply; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian

Currently we're intending to cover this kind of devices
with shared RMRR simply since the case of shared RMRR is
a rare case according to our previous experiences. But
late we can group these devices which shared rmrr, and
then allow all devices within a group to be assigned to
same domain.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v8:

* Merge two if{} as one if{}

* Add to print RMRR range info when stop assign a group device

v5 ~ v7:

* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c8b0455..8b7e18f 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2294,13 +2294,37 @@ static int intel_iommu_assign_device(
     if ( list_empty(&acpi_drhd_units) )
         return -ENODEV;
 
+    seg = pdev->seg;
+    bus = pdev->bus;
+    /*
+     * In rare cases one given rmrr is shared by multiple devices but
+     * obviously this would put the security of a system at risk. So
+     * we should prevent from this sort of device assignment.
+     *
+     * TODO: in the future we can introduce group device assignment
+     * interface to make sure devices sharing RMRR are assigned to the
+     * same domain together.
+     */
+    for_each_rmrr_device( rmrr, bdf, i )
+    {
+        if ( rmrr->segment == seg &&
+             PCI_BUS(bdf) == bus &&
+             PCI_DEVFN2(bdf) == devfn &&
+             rmrr->scope.devices_cnt > 1 )
+            {
+                printk(XENLOG_G_ERR VTDPREFIX
+                       " cannot assign %04x:%02x:%02x.%u"
+                       " with shared RMRR at %"PRIx64" for Dom%d.\n",
+                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                       rmrr->base_address, d->domain_id);
+                return -EPERM;
+            }
+    }
+
     ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
     if ( ret )
         return ret;
 
-    seg = pdev->seg;
-    bus = pdev->bus;
-
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
     {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v8][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
                       ` (14 preceding siblings ...)
  2015-07-16  6:53     ` [v8][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
@ 2015-07-16  6:53     ` Tiejun Chen
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-16  6:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch parses to enable user configurable parameters to specify
RDM resource and according policies which are defined previously,

Global RDM parameter:
    rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

Default per-device RDM policy is same as default global RDM policy as being
'relaxed'. And the per-device policy would override the global policy like
others.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v8:

* Clean some codes style issues.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Just sync those renames introduced by patch #10.

v5:

* Need a rebase after we make all rdm variables specific to .hvm.
* Like other pci option, the per-device policy always follows
  the global policy by default.

v4:

* Separated from current patch #11 to parse/enable our rdm policy parameters
  since its make a lot sense and these stuffs are specific to xl/libxlu.

 tools/libxl/libxlu_pci.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++-
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 13 +++++++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..026413b 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE      9
+#define STATE_RDM_STRATEGY      10
+#define STATE_RESERVE_POLICY    11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str)
 {
     unsigned state = STATE_DOMAIN;
@@ -143,7 +146,18 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str
                     pcidev->permissive = atoi(tok);
                 }else if ( !strcmp(optkey, "seize") ) {
                     pcidev->seize = atoi(tok);
-                }else{
+                } else if (!strcmp(optkey, "rdm_policy")) {
+                    if (!strcmp(tok, "strict")) {
+                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+                    } else if (!strcmp(tok, "relaxed")) {
+                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+                    } else {
+                        XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
+                                          " policy: 'strict' or 'relaxed'.",
+                                     tok);
+                        goto parse_error;
+                    }
+                } else {
                     XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
                 }
                 tok = ptr + 1;
@@ -167,6 +181,82 @@ parse_error:
     return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+    unsigned state = STATE_TYPE;
+    char *buf2, *tok, *ptr, *end;
+
+    if (NULL == (buf2 = ptr = strdup(str)))
+        return ERROR_NOMEM;
+
+    for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+        switch(state) {
+        case STATE_TYPE:
+            if (*ptr == '=') {
+                state = STATE_RDM_STRATEGY;
+                *ptr = '\0';
+                if (strcmp(tok, "strategy")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RDM_STRATEGY:
+            if (*ptr == '\0' || *ptr == ',') {
+                state = STATE_RESERVE_POLICY;
+                *ptr = '\0';
+                if (!strcmp(tok, "host")) {
+                    rdm->strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM strategy option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RESERVE_POLICY:
+            if (*ptr == '=') {
+                state = STATE_OPTIONS_V;
+                *ptr = '\0';
+                if (strcmp(tok, "policy")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_OPTIONS_V:
+            if (*ptr == ',' || *ptr == '\0') {
+                state = STATE_TERMINAL;
+                *ptr = '\0';
+                if (!strcmp(tok, "strict")) {
+                    rdm->policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+                } else if (!strcmp(tok, "relaxed")) {
+                    rdm->policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property policy value: %s",
+                                 tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+        default:
+            break;
+        }
+    }
+
+    free(buf2);
+
+    if (tok != ptr || state != STATE_TERMINAL)
+        goto parse_error;
+
+    return 0;
+
+parse_error:
+    return ERROR_INVAL;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 989605a..e81b644 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -106,6 +106,10 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs, const char *const *specs,
  */
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str);
 
+/*
+ * RDM parsing
+ */
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str);
 
 /*
  * Vif rate parsing.
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index dfb50d6..38d6c53 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1923,6 +1923,14 @@ skip_vfb:
         xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
     }
 
+    if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
+        libxl_rdm_reserve rdm;
+        if (!xlu_rdm_parse(config, &rdm, buf)) {
+            b_info->u.hvm.rdm.strategy = rdm.strategy;
+            b_info->u.hvm.rdm.policy = rdm.policy;
+        }
+    }
+
     if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
         d_config->num_pcidevs = 0;
         d_config->pcidevs = NULL;
@@ -1937,6 +1945,11 @@ skip_vfb:
             pcidev->power_mgmt = pci_power_mgmt;
             pcidev->permissive = pci_permissive;
             pcidev->seize = pci_seize;
+            /*
+             * Like other pci option, the per-device policy always follows
+             * the global policy by default.
+             */
+            pcidev->rdm_policy = b_info->u.hvm.rdm.policy;
             if (!xlu_pci_parse_bdf(config, pcidev, buf))
                 d_config->num_pcidevs++;
         }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-16  6:52     ` [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-07-16  7:40       ` Jan Beulich
  2015-07-16  7:48         ` Chen, Tiejun
  2015-07-16 11:09       ` George Dunlap
  1 sibling, 1 reply; 83+ messages in thread
From: Jan Beulich @ 2015-07-16  7:40 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Stefano Stabellini, Suravee Suthikulpanit, Yang Zhang,
	Aravind Gopalakrishnan

>>> On 16.07.15 at 08:52, <tiejun.chen@intel.com> wrote:
> @@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
>          seg = machine_sbdf >> 16;
>          bus = PCI_BUS(machine_sbdf);
>          devfn = PCI_DEVFN2(machine_sbdf);
> +        flag = domctl->u.assign_device.flag;
> +        if ( flag > XEN_DOMCTL_DEV_RDM_RELAXED )

Didn't we settle on flag & ~XEN_DOMCTL_DEV_RDM_RELAXED?

Jan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-07-16  6:53     ` [v8][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
@ 2015-07-16  7:42       ` Jan Beulich
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2015-07-16  7:42 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Yang Zhang, Kevin Tian, xen-devel

>>> On 16.07.15 at 08:53, <tiejun.chen@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2294,13 +2294,37 @@ static int intel_iommu_assign_device(
>      if ( list_empty(&acpi_drhd_units) )
>          return -ENODEV;
>  
> +    seg = pdev->seg;
> +    bus = pdev->bus;
> +    /*
> +     * In rare cases one given rmrr is shared by multiple devices but
> +     * obviously this would put the security of a system at risk. So
> +     * we should prevent from this sort of device assignment.
> +     *
> +     * TODO: in the future we can introduce group device assignment
> +     * interface to make sure devices sharing RMRR are assigned to the
> +     * same domain together.
> +     */
> +    for_each_rmrr_device( rmrr, bdf, i )
> +    {
> +        if ( rmrr->segment == seg &&
> +             PCI_BUS(bdf) == bus &&
> +             PCI_DEVFN2(bdf) == devfn &&
> +             rmrr->scope.devices_cnt > 1 )
> +            {
> +                printk(XENLOG_G_ERR VTDPREFIX
> +                       " cannot assign %04x:%02x:%02x.%u"
> +                       " with shared RMRR at %"PRIx64" for Dom%d.\n",
> +                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                       rmrr->base_address, d->domain_id);
> +                return -EPERM;
> +            }

Indentation.

Jan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-16  7:40       ` Jan Beulich
@ 2015-07-16  7:48         ` Chen, Tiejun
  2015-07-16  7:58           ` Jan Beulich
  0 siblings, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16  7:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Stefano Stabellini, Suravee Suthikulpanit, Yang Zhang,
	Aravind Gopalakrishnan

On 2015/7/16 15:40, Jan Beulich wrote:
>>>> On 16.07.15 at 08:52, <tiejun.chen@intel.com> wrote:
>> @@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
>>           seg = machine_sbdf >> 16;
>>           bus = PCI_BUS(machine_sbdf);
>>           devfn = PCI_DEVFN2(machine_sbdf);
>> +        flag = domctl->u.assign_device.flag;
>> +        if ( flag > XEN_DOMCTL_DEV_RDM_RELAXED )
>
> Didn't we settle on flag & ~XEN_DOMCTL_DEV_RDM_RELAXED?

Sorry its my fault to miss this merge.

BTW, could I resend this patch separately to get your Ack? If you don't 
have other objections.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-16  7:48         ` Chen, Tiejun
@ 2015-07-16  7:58           ` Jan Beulich
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2015-07-16  7:58 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Stefano Stabellini, Suravee Suthikulpanit, Yang Zhang,
	Aravind Gopalakrishnan

>>> On 16.07.15 at 09:48, <tiejun.chen@intel.com> wrote:
> On 2015/7/16 15:40, Jan Beulich wrote:
>>>>> On 16.07.15 at 08:52, <tiejun.chen@intel.com> wrote:
>>> @@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
>>>           seg = machine_sbdf >> 16;
>>>           bus = PCI_BUS(machine_sbdf);
>>>           devfn = PCI_DEVFN2(machine_sbdf);
>>> +        flag = domctl->u.assign_device.flag;
>>> +        if ( flag > XEN_DOMCTL_DEV_RDM_RELAXED )
>>
>> Didn't we settle on flag & ~XEN_DOMCTL_DEV_RDM_RELAXED?
> 
> Sorry its my fault to miss this merge.
> 
> BTW, could I resend this patch separately to get your Ack? If you don't 
> have other objections.

Actually if we were to commit (parts of) this version, I'd even be fine
with doing the adjustment upon commit. So I'd suggest making the
change in your local copy so it would be as intended on v9, should
that need sending.

Jan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 05/16] hvmloader: get guest memory map into memory_map[]
  2015-07-16  6:52     ` [v8][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
@ 2015-07-16  9:18       ` Jan Beulich
  2015-07-16 11:15       ` George Dunlap
  1 sibling, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2015-07-16  9:18 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 16.07.15 at 08:52, <tiejun.chen@intel.com> wrote:
> Now we get this map layout by call XENMEM_memory_map then
> save them into one global variable memory_map[]. It should
> include lowmem range, rdm range and highmem range. Note
> rdm range and highmem range may not exist in some cases.
> 
> And here we need to check if any reserved memory conflicts with
> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END].

[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END)

> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -23,6 +23,41 @@
>  #include "config.h"
>  #include "util.h"
>  
> +struct e820map memory_map;
> +
> +void memory_map_setup(void)
> +{
> +    unsigned int nr_entries = E820MAX, i;
> +    int rc;
> +    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START;
> +    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
> +
> +    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
> +
> +    if ( rc || !nr_entries )
> +    {
> +        printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
> +        BUG();
> +    }
> +
> +    memory_map.nr_map = nr_entries;
> +
> +    for ( i = 0; i < nr_entries; i++ )
> +    {
> +        if ( memory_map.map[i].type == E820_RESERVED )
> +        {
> +            if ( check_overlap(alloc_addr, alloc_size,
> +                               memory_map.map[i].addr,
> +                               memory_map.map[i].size) )
> +            {
> +                printf("Fail to setup memory map due to conflict");
> +                printf(" on dynamic reserved memory range.\n");
> +                BUG();
> +            }
> +        }

Another case of two if()-s which should be folded.

Again, no need to re-submit just because of this; with it fixed
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-16  6:52     ` [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
  2015-07-16  7:40       ` Jan Beulich
@ 2015-07-16 11:09       ` George Dunlap
  1 sibling, 0 replies; 83+ messages in thread
From: George Dunlap @ 2015-07-16 11:09 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

On Thu, Jul 16, 2015 at 7:52 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> This patch extends the existing hypercall to support rdm reservation policy.
> We return error or just throw out a warning message depending on whether
> the policy is "strict" or "relaxed" when reserving RDM regions in pfn space.
> Note in some special cases, e.g. add a device to hwdomain, and remove a
> device from user domain, 'relaxed' is fine enough since this is always safe
> to hwdomain.
>
> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@citrix.com>
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

With or without the "flags &" change:

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

> ---
> v8:
>
> * Force to pass "0"(strict) when add or move a device in hardware domain,
>   and improve some associated code comments.
>
> v6 ~ v7:
>
> * Nothing is changed.
>
> v5:
>
> * Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
>   "0" means "strict" and "1" means "relaxed".
>
> * So make DT device ignore the flag field
>
> * Improve the code comments
>
> v4:
>
> * Add code comments to describer why we fix to set a policy flag in some
>   cases like adding a device to hwdomain, and removing a device from user domain.
>
> * Avoid using fixed width types for the parameter of set_identity_p2m_entry()
>
> * Fix one judging condition
>   domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
>   -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM
>
> * Add to range check the flag passed to make future extensions possible
>   (and to avoid ambiguity on what out of range values would mean).
>
>  xen/arch/x86/mm/p2m.c                       |  7 ++++--
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
>  xen/drivers/passthrough/arm/smmu.c          |  2 +-
>  xen/drivers/passthrough/device_tree.c       |  3 ++-
>  xen/drivers/passthrough/pci.c               | 15 ++++++++----
>  xen/drivers/passthrough/vtd/iommu.c         | 37 ++++++++++++++++++++++-------
>  xen/include/asm-x86/p2m.h                   |  2 +-
>  xen/include/public/domctl.h                 |  3 +++
>  xen/include/xen/iommu.h                     |  2 +-
>  9 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 99a26ca..47785dc 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
>  }
>
>  int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> -                           p2m_access_t p2ma)
> +                           p2m_access_t p2ma, unsigned int flag)
>  {
>      p2m_type_t p2mt;
>      p2m_access_t a;
> @@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>          ret = 0;
>      else
>      {
> -        ret = -EBUSY;
> +        if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
> +            ret = 0;
> +        else
> +            ret = -EBUSY;
>          printk(XENLOG_G_WARNING
>                 "Cannot setup identity map d%d:%lx,"
>                 " gfn already mapped to %lx.\n",
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index e83bb35..920b35a 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain *target,
>  }
>
>  static int amd_iommu_assign_device(struct domain *d, u8 devfn,
> -                                   struct pci_dev *pdev)
> +                                   struct pci_dev *pdev,
> +                                   u32 flag)
>  {
>      struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
>      int bdf = PCI_BDF2(pdev->bus, devfn);
> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> index 6cc4394..9a667e9 100644
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct iommu_domain *domain)
>  }
>
>  static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
> -                              struct device *dev)
> +                              struct device *dev, u32 flag)
>  {
>         struct iommu_domain *domain;
>         struct arm_smmu_xen_domain *xen_domain;
> diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
> index 5d3842a..7ff79f8 100644
> --- a/xen/drivers/passthrough/device_tree.c
> +++ b/xen/drivers/passthrough/device_tree.c
> @@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
>              goto fail;
>      }
>
> -    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
> +    /* The flag field doesn't matter to DT device. */
> +    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0);
>
>      if ( rc )
>          goto fail;
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index e30be43..6e23fc6 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1335,7 +1335,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
>      return pdev ? 0 : -EBUSY;
>  }
>
> -static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
> +static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>  {
>      struct hvm_iommu *hd = domain_hvm_iommu(d);
>      struct pci_dev *pdev;
> @@ -1371,7 +1371,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>
>      pdev->fault.count = 0;
>
> -    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev))) )
> +    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag)) )
>          goto done;
>
>      for ( ; pdev->phantom_stride; rc = 0 )
> @@ -1379,7 +1379,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>          devfn += pdev->phantom_stride;
>          if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
>              break;
> -        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev));
> +        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>          if ( rc )
>              printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed (%d)\n",
>                     d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> @@ -1496,6 +1496,7 @@ int iommu_do_pci_domctl(
>  {
>      u16 seg;
>      u8 bus, devfn;
> +    u32 flag;
>      int ret = 0;
>      uint32_t machine_sbdf;
>
> @@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
>          seg = machine_sbdf >> 16;
>          bus = PCI_BUS(machine_sbdf);
>          devfn = PCI_DEVFN2(machine_sbdf);
> +        flag = domctl->u.assign_device.flag;
> +        if ( flag > XEN_DOMCTL_DEV_RDM_RELAXED )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
>
>          ret = device_assigned(seg, bus, devfn) ?:
> -              assign_device(d, seg, bus, devfn);
> +              assign_device(d, seg, bus, devfn, flag);
>          if ( ret == -ERESTART )
>              ret = hypercall_create_continuation(__HYPERVISOR_domctl,
>                                                  "h", u_domctl);
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index 8415958..b5d658e 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1807,7 +1807,8 @@ static void iommu_set_pgd(struct domain *d)
>  }
>
>  static int rmrr_identity_mapping(struct domain *d, bool_t map,
> -                                 const struct acpi_rmrr_unit *rmrr)
> +                                 const struct acpi_rmrr_unit *rmrr,
> +                                 u32 flag)
>  {
>      unsigned long base_pfn = rmrr->base_address >> PAGE_SHIFT_4K;
>      unsigned long end_pfn = PAGE_ALIGN_4K(rmrr->end_address) >> PAGE_SHIFT_4K;
> @@ -1855,7 +1856,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
>
>      while ( base_pfn < end_pfn )
>      {
> -        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
>
>          if ( err )
>              return err;
> @@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
>               PCI_BUS(bdf) == pdev->bus &&
>               PCI_DEVFN2(bdf) == devfn )
>          {
> -            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
> +            /*
> +             * iommu_add_device() is only called for the hardware
> +             * domain (see xen/drivers/passthrough/pci.c:pci_add_device()).
> +             * Since RMRRs are always reserved in the e820 map for the hardware
> +             * domain, there shouldn't be a conflict.
> +             */
> +            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr, 0);
>              if ( ret )
>                  dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
>                          pdev->domain->domain_id);
> @@ -1939,7 +1946,11 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
>               PCI_DEVFN2(bdf) != devfn )
>              continue;
>
> -        rmrr_identity_mapping(pdev->domain, 0, rmrr);
> +        /*
> +         * Any flag is nothing to clear these mappings but here
> +         * its always safe and strict to set 0.
> +         */
> +        rmrr_identity_mapping(pdev->domain, 0, rmrr, 0);
>      }
>
>      return domain_context_unmap(pdev->domain, devfn, pdev);
> @@ -2098,7 +2109,13 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
>      spin_lock(&pcidevs_lock);
>      for_each_rmrr_device ( rmrr, bdf, i )
>      {
> -        ret = rmrr_identity_mapping(d, 1, rmrr);
> +        /*
> +         * Here means we're add a device to the hardware domain.
> +         * Since RMRRs are always reserved in the e820 map for the hardware
> +         * domain, there shouldn't be a conflict. So its always safe and
> +         * strict to set 0.
> +         */
> +        ret = rmrr_identity_mapping(d, 1, rmrr, 0);
>          if ( ret )
>              dprintk(XENLOG_ERR VTDPREFIX,
>                       "IOMMU: mapping reserved region failed\n");
> @@ -2241,7 +2258,11 @@ static int reassign_device_ownership(
>                   PCI_BUS(bdf) == pdev->bus &&
>                   PCI_DEVFN2(bdf) == devfn )
>              {
> -                ret = rmrr_identity_mapping(source, 0, rmrr);
> +                /*
> +                 * Any RMRR flag is always ignored when remove a device,
> +                 * but its always safe and strict to set 0.
> +                 */
> +                ret = rmrr_identity_mapping(source, 0, rmrr, 0);
>                  if ( ret != -ENOENT )
>                      return ret;
>              }
> @@ -2265,7 +2286,7 @@ static int reassign_device_ownership(
>  }
>
>  static int intel_iommu_assign_device(
> -    struct domain *d, u8 devfn, struct pci_dev *pdev)
> +    struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag)
>  {
>      struct acpi_rmrr_unit *rmrr;
>      int ret = 0, i;
> @@ -2294,7 +2315,7 @@ static int intel_iommu_assign_device(
>               PCI_BUS(bdf) == bus &&
>               PCI_DEVFN2(bdf) == devfn )
>          {
> -            ret = rmrr_identity_mapping(d, 1, rmrr);
> +            ret = rmrr_identity_mapping(d, 1, rmrr, flag);
>              if ( ret )
>              {
>                  reassign_device_ownership(d, hardware_domain, devfn, pdev);
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 190a286..68da0a9 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -545,7 +545,7 @@ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
>
>  /* Set identity addresses in the p2m table (for pass-through) */
>  int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> -                           p2m_access_t p2ma);
> +                           p2m_access_t p2ma, unsigned int flag);
>
>  #define clear_identity_p2m_entry(d, gfn, page_order) \
>                          guest_physmap_remove_page(d, gfn, gfn, page_order)
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index bc45ea5..bca25c9 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -478,6 +478,9 @@ struct xen_domctl_assign_device {
>              XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
>          } dt;
>      } u;
> +    /* IN */
> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
> +    uint32_t  flag;   /* flag of assigned device */
>  };
>  typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t);
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index e2f584d..02b2b02 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -140,7 +140,7 @@ struct iommu_ops {
>      int (*add_device)(u8 devfn, device_t *dev);
>      int (*enable_device)(device_t *dev);
>      int (*remove_device)(u8 devfn, device_t *dev);
> -    int (*assign_device)(struct domain *, u8 devfn, device_t *dev);
> +    int (*assign_device)(struct domain *, u8 devfn, device_t *dev, u32 flag);
>      int (*reassign_device)(struct domain *s, struct domain *t,
>                             u8 devfn, device_t *dev);
>  #ifdef HAS_PCI
> --
> 1.9.1
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 05/16] hvmloader: get guest memory map into memory_map[]
  2015-07-16  6:52     ` [v8][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
  2015-07-16  9:18       ` Jan Beulich
@ 2015-07-16 11:15       ` George Dunlap
  1 sibling, 0 replies; 83+ messages in thread
From: George Dunlap @ 2015-07-16 11:15 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On Thu, Jul 16, 2015 at 7:52 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> Now we get this map layout by call XENMEM_memory_map then
> save them into one global variable memory_map[]. It should
> include lowmem range, rdm range and highmem range. Note
> rdm range and highmem range may not exist in some cases.
>
> And here we need to check if any reserved memory conflicts with
> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END].
> This range is used to allocate memory in hvmloder level, and
> we would lead hvmloader failed in case of conflict since its
> another rare possibility in real world.
>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Thanks,

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16  6:52     ` [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm Tiejun Chen
@ 2015-07-16 11:32       ` George Dunlap
  2015-07-16 11:52         ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: George Dunlap @ 2015-07-16 11:32 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On Thu, Jul 16, 2015 at 7:52 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> When allocating mmio address for PCI bars, mmio may overlap with
> reserved regions. Currently we just want to disable these associate
> devices simply to avoid conflicts but we will reshape current mmio
> allocation mechanism to fix this completely.

On the whole I still think it would be good to try to relocate BARs if
possible; I would be OK with this if there isn't a better option.

A couple of comments on the patch, however:

>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v8:
>
> * Based on current discussion its hard to reshape the original mmio
>   allocation mechanism but we haven't a good and simple way to in short term.
>   So instead, we don't bring more complicated to intervene that process but
>   still check any conflicts to disable all associated devices.
>
> v6 ~ v7:
>
> * Nothing is changed.
>
> v5:
>
> * Rename that field, is_64bar, inside struct bars with flag, and
>   then extend to also indicate if this bar is already allocated.
>
> v4:
>
> * We have to re-design this as follows:
>
>   #1. Goal
>
>   MMIO region should exclude all reserved device memory
>
>   #2. Requirements
>
>   #2.1 Still need to make sure MMIO region is fit all pci devices as before
>
>   #2.2 Accommodate the not aligned reserved memory regions
>
>   If I'm missing something let me know.
>
>   #3. How to
>
>   #3.1 Address #2.1
>
>   We need to either of populating more RAM, or of expanding more highmem. But
>   we should know just 64bit-bar can work with highmem, and as you mentioned we
>   also should avoid expanding highmem as possible. So my implementation is to
>   allocate 32bit-bar and 64bit-bar orderly.
>
>   1>. The first allocation round just to 32bit-bar
>
>   If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
>   with all remaining resources including low pci memory.
>
>   If not, we need to calculate how much RAM should be populated to allocate the
>   remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
>   to the second allocation round 2>.
>
>   2>. The second allocation round to the remaining 32bit-bar
>
>   We should can finish allocating all 32bit-bar in theory, then go to the third
>   allocation round 3>.
>
>   3>. The third allocation round to 64bit-bar
>
>   We'll try to first allocate from the remaining low memory resource. If that
>   isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
>   should be same as the original.
>
>   #3.2 Address #2.2
>
>   I'm trying to accommodate the not aligned reserved memory regions:
>
>   We should skip all reserved device memory, but we also need to check if other
>   smaller bars can be allocated if a mmio hole exists between resource->base and
>   reserved device memory. If a hole exists between base and reserved device
>   memory, lets go out simply to try allocate for next bar since all bars are in
>   descending order of size. If not, we need to move resource->base to reserved_end
>   just to reallocate this bar.
>
>  tools/firmware/hvmloader/pci.c | 87 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
>
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 5ff87a7..9e017d5 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -38,6 +38,90 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
>  enum virtual_vga virtual_vga = VGA_none;
>  unsigned long igd_opregion_pgbase = 0;
>
> +/*
> + * We should check if all valid bars conflict with RDM.
> + *
> + * Here we just need to check mmio bars in the case of non-highmem
> + * since the hypervisor can make sure RDM doesn't involve highmem.
> + */
> +static void disable_conflicting_devices(void)
> +{
> +    uint8_t is_64bar;
> +    uint32_t devfn, bar_reg, cmd, bar_data;
> +    uint16_t vendor_id, device_id;
> +    unsigned int bar, i;
> +    uint64_t bar_sz;
> +    bool is_conflict = false;
> +
> +    for ( devfn = 0; devfn < 256; devfn++ )
> +    {
> +        vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
> +        device_id = pci_readw(devfn, PCI_DEVICE_ID);
> +        if ( (vendor_id == 0xffff) && (device_id == 0xffff) )
> +            continue;
> +
> +        /* Check all bars */
> +        for ( bar = 0; bar < 7; bar++ )
> +        {
> +            bar_reg = PCI_BASE_ADDRESS_0 + 4*bar;
> +            if ( bar == 6 )
> +                bar_reg = PCI_ROM_ADDRESS;
> +
> +            bar_data = pci_readl(devfn, bar_reg);
> +            bar_data &= PCI_BASE_ADDRESS_MEM_MASK;
> +            if ( !bar_data )
> +                continue;
> +
> +            if ( bar_reg != PCI_ROM_ADDRESS )
> +                is_64bar = !!((bar_data & (PCI_BASE_ADDRESS_SPACE |
> +                             PCI_BASE_ADDRESS_MEM_TYPE_MASK)) ==
> +                             (PCI_BASE_ADDRESS_SPACE_MEMORY |
> +                             PCI_BASE_ADDRESS_MEM_TYPE_64));
> +
> +            /* Until here we never conflict high memory. */
> +            if ( is_64bar && pci_readl(devfn, bar_reg + 4) )
> +                continue;
> +
> +            /* Just check mmio bars. */
> +            if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> +                  PCI_BASE_ADDRESS_SPACE_IO) )
> +                continue;
> +
> +            bar_sz = pci_readl(devfn, bar_reg);
> +            bar_sz &= PCI_BASE_ADDRESS_MEM_MASK;
> +
> +            for ( i = 0; i < memory_map.nr_map ; i++ )
> +            {
> +                if ( memory_map.map[i].type != E820_RAM )

Here we're assuming that any region not marked as RAM is an RMRR.  Is that true?

In any case, it would be just as strange to have a device BAR overlap
with guest RAM as with an RMRR, wouldn't it?

> +                {
> +                    uint64_t reserved_start, reserved_size;
> +                    reserved_start = memory_map.map[i].addr;
> +                    reserved_size = memory_map.map[i].size;
> +                    if ( check_overlap(bar_data , bar_sz,
> +                                   reserved_start, reserved_size) )
> +                    {
> +                        is_conflict = true;
> +                        /* Now disable the memory or I/O mapping. */
> +                        printf("pci dev %02x:%x bar %02x : 0x%08x : conflicts "
> +                               "reserved resource so disable this device.!\n",
> +                               devfn>>3, devfn&7, bar_reg, bar_data);
> +                        cmd = pci_readw(devfn, PCI_COMMAND);
> +                        pci_writew(devfn, PCI_COMMAND, ~cmd);
> +                        break;
> +                    }
> +                }
> +
> +                /* Jump next device. */
> +                if ( is_conflict )
> +                {
> +                    is_conflict = false;
> +                    break;
> +                }

This conditional is still inside the memory_map loop; you want it one
loop futher out, in the bar loop, don't you?

Also, if you declare is_conflict inside the devfn loop, rather than in
the main function, then you don't need this "is_conflict=false" here.

It might also be more sensible to use a goto instead; but this is one
where Jan will have a better idea what standard practice will be.

 -George

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-16  6:52     ` [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-07-16 11:47       ` George Dunlap
  2015-07-16 13:12         ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: George Dunlap @ 2015-07-16 11:47 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On Thu, Jul 16, 2015 at 7:52 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> Now use the hypervisor-supplied memory map to build our final e820 table:
> * Add regions for BIOS ranges and other special mappings not in the
>   hypervisor map
> * Add in the hypervisor regions
> * Adjust the lowmem and highmem regions if we've had to relocate
>   memory (adding a highmem region if necessary)
> * Sort all the ranges so that they appear in memory order.
>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v8:
>
> * define low_mem_end as uint32_t
>
> * Correct those two wrong loops, memory_map.nr_map -> nr
>   when we're trying to revise low/high memory e820 entries.
>
> * Improve code comments and the patch head description
>
> * Add one check if highmem is just populated by hvmloader itself
>
> v5 ~ v7:
>
> * Nothing is changed.
>
> v4:
>
> * Rename local variable, low_mem_pgend, to low_mem_end.
>
> * Improve some code comments
>
> * Adjust highmem after lowmem is changed.
>
>
>  tools/firmware/hvmloader/e820.c | 92 +++++++++++++++++++++++++++++++++++++----
>  1 file changed, 83 insertions(+), 9 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
> index b72baa5..aa678a7 100644
> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
>                       unsigned int lowmem_reserved_base,
>                       unsigned int bios_image_base)
>  {
> -    unsigned int nr = 0;
> +    unsigned int nr = 0, i, j;
> +    uint64_t add_high_mem = 0;
> +    uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
>
>      if ( !lowmem_reserved_base )
>              lowmem_reserved_base = 0xA0000;
> @@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820,
>      e820[nr].type = E820_RESERVED;
>      nr++;
>
> -    /* Low RAM goes here. Reserve space for special pages. */
> -    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
> -    e820[nr].addr = 0x100000;
> -    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
> -    e820[nr].type = E820_RAM;
> -    nr++;
> -
>      /*
>       * Explicitly reserve space for special pages.
>       * This space starts at RESERVED_MEMBASE an extends to cover various
> @@ -194,9 +189,73 @@ int build_e820_table(struct e820entry *e820,
>          nr++;
>      }
>
> +    /*
> +     * Construct E820 table according to recorded memory map.
> +     *
> +     * The memory map created by toolstack may include,
> +     *
> +     * #1. Low memory region
> +     *
> +     * Low RAM starts at least from 1M to make sure all standard regions
> +     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
> +     * have enough space.
> +     *
> +     * #2. Reserved regions if they exist
> +     *
> +     * #3. High memory region if it exists
> +     */
> +    for ( i = 0; i < memory_map.nr_map; i++ )
> +    {
> +        e820[nr] = memory_map.map[i];
> +        nr++;
> +    }
> +
> +    /* Low RAM goes here. Reserve space for special pages. */
> +    BUG_ON(low_mem_end < (2u << 20));
>
> -    if ( hvm_info->high_mem_pgend )
> +    /*
> +     * Its possible to relocate RAM to allocate sufficient MMIO previously
> +     * so low_mem_pgend would be changed over there. And here memory_map[]
> +     * records the original low/high memory, so if low_mem_end is less than
> +     * the original we need to revise low/high memory range in e820.
> +     */
> +    for ( i = 0; i < nr; i++ )
>      {
> +        uint64_t end = e820[i].addr + e820[i].size;
> +        if ( e820[i].type == E820_RAM &&
> +             low_mem_end > e820[i].addr && low_mem_end < end )
> +        {
> +            add_high_mem = end - low_mem_end;
> +            e820[i].size = low_mem_end - e820[i].addr;
> +        }
> +    }
> +
> +    /*
> +     * And then we also need to adjust highmem.
> +     */
> +    if ( add_high_mem )
> +    {
> +        for ( i = 0; i < nr; i++ )
> +        {
> +            if ( e820[i].type == E820_RAM &&
> +                 e820[i].addr == (1ull << 32))
> +            {
> +                e820[i].size += add_high_mem;
> +                add_high_mem = 0;
> +                break;
> +            }
> +        }
> +    }
> +
> +    /* Or this is just populated by hvmloader itself. */

This should probably say something like:

"If there was no highmem region, we need to create one."

> +    if ( add_high_mem )
> +    {
> +        /*
> +         * hvmloader should always update hvm_info->high_mem_pgend
> +         * when it relocates RAM anywhere.
> +         */
> +        BUG_ON( !hvm_info->high_mem_pgend );
> +
>          e820[nr].addr = ((uint64_t)1 << 32);
>          e820[nr].size =
>              ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;

In theory add_high_mem and hvm_info->high_mem_pgend << PAGE_SHIFT -
4GiB is the same, but it seems like asking for trouble to assume so
without checking.

Perhaps in the first if( add_high_mem ) conditional, you can
BUG_ON(add_high_mem != ((hvm_info->high_mem_pgend << PAGE_SHIFT) -
(ull1 << 32))) ?

Other than that, this looks good, thanks.

 -George

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 11:32       ` George Dunlap
@ 2015-07-16 11:52         ` Chen, Tiejun
  2015-07-16 13:02           ` George Dunlap
  0 siblings, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16 11:52 UTC (permalink / raw)
  To: George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

>> +            for ( i = 0; i < memory_map.nr_map ; i++ )
>> +            {
>> +                if ( memory_map.map[i].type != E820_RAM )
>
> Here we're assuming that any region not marked as RAM is an RMRR.  Is that true?
>
> In any case, it would be just as strange to have a device BAR overlap
> with guest RAM as with an RMRR, wouldn't it?

OOPS! Actually I should take this,

if ( memory_map.map[i].type == E820_RESERVED )

This is same as when I check [RESERVED_MEMORY_DYNAMIC_START, 
RESERVED_MEMORY_DYNAMIC_END).

>
>> +                {
>> +                    uint64_t reserved_start, reserved_size;
>> +                    reserved_start = memory_map.map[i].addr;
>> +                    reserved_size = memory_map.map[i].size;
>> +                    if ( check_overlap(bar_data , bar_sz,
>> +                                   reserved_start, reserved_size) )
>> +                    {
>> +                        is_conflict = true;
>> +                        /* Now disable the memory or I/O mapping. */
>> +                        printf("pci dev %02x:%x bar %02x : 0x%08x : conflicts "
>> +                               "reserved resource so disable this device.!\n",
>> +                               devfn>>3, devfn&7, bar_reg, bar_data);
>> +                        cmd = pci_readw(devfn, PCI_COMMAND);
>> +                        pci_writew(devfn, PCI_COMMAND, ~cmd);
>> +                        break;
>> +                    }
>> +                }
>> +
>> +                /* Jump next device. */
>> +                if ( is_conflict )
>> +                {
>> +                    is_conflict = false;
>> +                    break;
>> +                }
>
> This conditional is still inside the memory_map loop; you want it one
> loop futher out, in the bar loop, don't you?

Here what I intended to do is if one of all bars specific to one given 
device already conflicts with RDM, its not necessary to continue check 
other remaining bars of this device and other RDM regions, we just 
disable this device simply then check next device.

>
> Also, if you declare is_conflict inside the devfn loop, rather than in
> the main function, then you don't need this "is_conflict=false" here.
>
> It might also be more sensible to use a goto instead; but this is one

This can work for me so it may be as follows:

     for ( devfn = 0; devfn < 256; devfn++ )
     {
  check_next_device:
         vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
         device_id = pci_readw(devfn, PCI_DEVICE_ID);
         if ( (vendor_id == 0xffff) && (device_id == 0xffff) )
             continue;
     ...
                     if ( check_overlap(bar_data , bar_sz,
                                    reserved_start, reserved_size) )
                     {
			...
                         /* Jump next device. */
                         devfn++;
                         goto check_next_device;
                     }


> where Jan will have a better idea what standard practice will be.
>

I can follow that again if Jan has any good implementation.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 11:52         ` Chen, Tiejun
@ 2015-07-16 13:02           ` George Dunlap
  2015-07-16 13:21             ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: George Dunlap @ 2015-07-16 13:02 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On Thu, Jul 16, 2015 at 12:52 PM, Chen, Tiejun <tiejun.chen@intel.com> wrote:
>>> +            for ( i = 0; i < memory_map.nr_map ; i++ )
>>> +            {
>>> +                if ( memory_map.map[i].type != E820_RAM )
>>
>>
>> Here we're assuming that any region not marked as RAM is an RMRR.  Is that
>> true?
>>
>> In any case, it would be just as strange to have a device BAR overlap
>> with guest RAM as with an RMRR, wouldn't it?
>
>
> OOPS! Actually I should take this,
>
> if ( memory_map.map[i].type == E820_RESERVED )
>
> This is same as when I check [RESERVED_MEMORY_DYNAMIC_START,
> RESERVED_MEMORY_DYNAMIC_END).
>
>
>>
>>> +                {
>>> +                    uint64_t reserved_start, reserved_size;
>>> +                    reserved_start = memory_map.map[i].addr;
>>> +                    reserved_size = memory_map.map[i].size;
>>> +                    if ( check_overlap(bar_data , bar_sz,
>>> +                                   reserved_start, reserved_size) )
>>> +                    {
>>> +                        is_conflict = true;
>>> +                        /* Now disable the memory or I/O mapping. */
>>> +                        printf("pci dev %02x:%x bar %02x : 0x%08x :
>>> conflicts "
>>> +                               "reserved resource so disable this
>>> device.!\n",
>>> +                               devfn>>3, devfn&7, bar_reg, bar_data);
>>> +                        cmd = pci_readw(devfn, PCI_COMMAND);
>>> +                        pci_writew(devfn, PCI_COMMAND, ~cmd);
>>> +                        break;
>>> +                    }
>>> +                }
>>> +
>>> +                /* Jump next device. */
>>> +                if ( is_conflict )
>>> +                {
>>> +                    is_conflict = false;
>>> +                    break;
>>> +                }
>>
>>
>> This conditional is still inside the memory_map loop; you want it one
>> loop futher out, in the bar loop, don't you?
>
>
> Here what I intended to do is if one of all bars specific to one given
> device already conflicts with RDM, its not necessary to continue check other
> remaining bars of this device and other RDM regions, we just disable this
> device simply then check next device.

I know what you're trying to do; what I'm saying is I don't think it
does what you want it to do.

You have loops nested 3 deep:
1. for each dev
  2.  for each bar
    3. for each memory range

This conditional is in loop 3; you want it to be in loop 2.

(In fact, when you set is_conflict, you then break out of loop 3 back
into loop 2; so this code will never actually be run.)

 >> Also, if you declare is_conflict inside the devfn loop, rather than in
>> the main function, then you don't need this "is_conflict=false" here.
>>
>> It might also be more sensible to use a goto instead; but this is one
>
>
> This can work for me so it may be as follows:
>
>     for ( devfn = 0; devfn < 256; devfn++ )
>     {
>  check_next_device:
>         vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
>         device_id = pci_readw(devfn, PCI_DEVICE_ID);
>         if ( (vendor_id == 0xffff) && (device_id == 0xffff) )
>             continue;
>     ...
>                     if ( check_overlap(bar_data , bar_sz,
>                                    reserved_start, reserved_size) )
>                     {
>                         ...
>                         /* Jump next device. */
>                         devfn++;
>                         goto check_next_device;
>                     }

I'm not a fan of hard-coding the loop continuing condition like this;
if I were going to do a goto, I'd want to go to the end of the loop.

Anyway, the code is OK as it is; I'd rather spend time working on
something that's more of a blocker.

 -George

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-16 11:47       ` George Dunlap
@ 2015-07-16 13:12         ` Chen, Tiejun
  2015-07-16 14:29           ` George Dunlap
  0 siblings, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16 13:12 UTC (permalink / raw)
  To: George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

>> +    /*
>> +     * And then we also need to adjust highmem.
>> +     */
>> +    if ( add_high_mem )
>> +    {
>> +        for ( i = 0; i < nr; i++ )
>> +        {
>> +            if ( e820[i].type == E820_RAM &&
>> +                 e820[i].addr == (1ull << 32))
>> +            {
>> +                e820[i].size += add_high_mem;
>> +                add_high_mem = 0;
>> +                break;
>> +            }
>> +        }
>> +    }
>> +
>> +    /* Or this is just populated by hvmloader itself. */
>
> This should probably say something like:
>
> "If there was no highmem region, we need to create one."

Okay, "If there was no highmem entry, we need to create one."

>
>> +    if ( add_high_mem )
>> +    {
>> +        /*
>> +         * hvmloader should always update hvm_info->high_mem_pgend
>> +         * when it relocates RAM anywhere.
>> +         */
>> +        BUG_ON( !hvm_info->high_mem_pgend );
>> +
>>           e820[nr].addr = ((uint64_t)1 << 32);
>>           e820[nr].size =
>>               ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
>
> In theory add_high_mem and hvm_info->high_mem_pgend << PAGE_SHIFT -
> 4GiB is the same, but it seems like asking for trouble to assume so

No, its not true in the first if( add_high_mem ) conditional.

Before we enter hvmloader, there are two cases:

#1. hvm_info->high_mem_pgend == 0

So we wouldn't have a highmem entry in e820. But hvmloader may relocate 
RAM upward highmem (add_high_mem) to get sufficient mmio, so 
hvm_info->high_mem_pgend is expanded somewhere (4GiB + add_high_mem).

Then we would fall into the second if( add_high_mem ) conditional.

#2. hvm_info->high_mem_pgend != 0

We always walk into the first if( add_high_mem ) conditional. But here 
"add_high_mem" just represents that highmem section expanded by 
hvmloader, its really not the whole higmem:(hvm_info->high_mem_pgend << 
PAGE_SHIFT - 4GiB).

Thanks
Tiejun

> without checking.
>
> Perhaps in the first if( add_high_mem ) conditional, you can
> BUG_ON(add_high_mem != ((hvm_info->high_mem_pgend << PAGE_SHIFT) -
> (ull1 << 32))) ?
>
> Other than that, this looks good, thanks.
>
>   -George
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 13:02           ` George Dunlap
@ 2015-07-16 13:21             ` Chen, Tiejun
  2015-07-16 13:32               ` Jan Beulich
  0 siblings, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16 13:21 UTC (permalink / raw)
  To: George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

>> Here what I intended to do is if one of all bars specific to one given
>> device already conflicts with RDM, its not necessary to continue check other
>> remaining bars of this device and other RDM regions, we just disable this
>> device simply then check next device.
>
> I know what you're trying to do; what I'm saying is I don't think it
> does what you want it to do.
>
> You have loops nested 3 deep:
> 1. for each dev
>    2.  for each bar
>      3. for each memory range
>
> This conditional is in loop 3; you want it to be in loop 2.
>
> (In fact, when you set is_conflict, you then break out of loop 3 back
> into loop 2; so this code will never actually be run.)

Sorry I should make this clear last time.

I mean I already knew what you were saying is right at this point so I 
tried to use goto to fix this bug.

>
>   >> Also, if you declare is_conflict inside the devfn loop, rather than in
>>> the main function, then you don't need this "is_conflict=false" here.
>>>
>>> It might also be more sensible to use a goto instead; but this is one
>>
>>

[snip]

> I'm not a fan of hard-coding the loop continuing condition like this;
> if I were going to do a goto, I'd want to go to the end of the loop.
>

I guess something like this,

			...
                         pci_writew(devfn, PCI_COMMAND, ~cmd);
                         /* Jump next device. */
                         goto check_next_device;
                     }
                 }
             }
         }
  check_next_device:
     }
}

> Anyway, the code is OK as it is; I'd rather spend time working on
> something that's more of a blocker.
>

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 13:21             ` Chen, Tiejun
@ 2015-07-16 13:32               ` Jan Beulich
  2015-07-16 13:48                 ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: Jan Beulich @ 2015-07-16 13:32 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Keir Fraser

>>> On 16.07.15 at 15:21, <tiejun.chen@intel.com> wrote:
> I guess something like this,
> 
> 			...
>                          pci_writew(devfn, PCI_COMMAND, ~cmd);
>                          /* Jump next device. */
>                          goto check_next_device;
>                      }
>                  }
>              }
>          }
>   check_next_device:
>      }
> }

Except that this isn't valid C (no statement following the label). I can
accept goto-s for some error handling cases where the alternatives
might be considered even more ugly than using goto. But the way
this or your original proposal look, I'd rather not have goto-s used
like this.

Jan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 13:32               ` Jan Beulich
@ 2015-07-16 13:48                 ` Chen, Tiejun
  2015-07-16 14:54                   ` Jan Beulich
  0 siblings, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16 13:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Keir Fraser

  > Except that this isn't valid C (no statement following the label). I can
> accept goto-s for some error handling cases where the alternatives
> might be considered even more ugly than using goto. But the way
> this or your original proposal look, I'd rather not have goto-s used
> like this.
>

What about this?

+    bool is_conflict = false;

      for ( devfn = 0; devfn < 256; devfn++ )
      {
@@ -60,7 +61,7 @@ static void disable_conflicting_devices(void)
              continue;

          /* Check all bars */
-        for ( bar = 0; bar < 7; bar++ )
+        for ( bar = 0; bar < 7 && !is_conflict; bar++ )
          {
              bar_reg = PCI_BASE_ADDRESS_0 + 4*bar;
              if ( bar == 6 )
@@ -89,7 +90,7 @@ static void disable_conflicting_devices(void)
              bar_sz = pci_readl(devfn, bar_reg);
              bar_sz &= PCI_BASE_ADDRESS_MEM_MASK;

-            for ( i = 0; i < memory_map.nr_map ; i++ )
+            for ( i = 0; i < memory_map.nr_map && !is_conflict; i++ )
              {
                  if ( memory_map.map[i].type == E820_RESERVED )
                  {
@@ -105,13 +106,13 @@ static void disable_conflicting_devices(void)
                                 devfn>>3, devfn&7, bar_reg, bar_data);
                          cmd = pci_readw(devfn, PCI_COMMAND);
                          pci_writew(devfn, PCI_COMMAND, ~cmd);
-                        /* Jump next device. */
-                        goto check_next_device;
+                        /* So need to jump next device. */
+                        is_conflict = true;
                      }
                  }
              }
          }
- check_next_device:
+        is_conflict = false;
      }
  }

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-16 13:12         ` Chen, Tiejun
@ 2015-07-16 14:29           ` George Dunlap
  2015-07-16 15:04             ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: George Dunlap @ 2015-07-16 14:29 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On Thu, Jul 16, 2015 at 2:12 PM, Chen, Tiejun <tiejun.chen@intel.com> wrote:
>>> +    if ( add_high_mem )
>>> +    {
>>> +        /*
>>> +         * hvmloader should always update hvm_info->high_mem_pgend
>>> +         * when it relocates RAM anywhere.
>>> +         */
>>> +        BUG_ON( !hvm_info->high_mem_pgend );
>>> +
>>>           e820[nr].addr = ((uint64_t)1 << 32);
>>>           e820[nr].size =
>>>               ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) -
>>> e820[nr].addr;
>>
>>
>> In theory add_high_mem and hvm_info->high_mem_pgend << PAGE_SHIFT -
>> 4GiB is the same, but it seems like asking for trouble to assume so
>
>
> No, its not true in the first if( add_high_mem ) conditional.
>
> Before we enter hvmloader, there are two cases:
>
> #1. hvm_info->high_mem_pgend == 0
>
> So we wouldn't have a highmem entry in e820. But hvmloader may relocate RAM
> upward highmem (add_high_mem) to get sufficient mmio, so
> hvm_info->high_mem_pgend is expanded somewhere (4GiB + add_high_mem).
>
> Then we would fall into the second if( add_high_mem ) conditional.
>
> #2. hvm_info->high_mem_pgend != 0
>
> We always walk into the first if( add_high_mem ) conditional. But here
> "add_high_mem" just represents that highmem section expanded by hvmloader,
> its really not the whole higmem:(hvm_info->high_mem_pgend << PAGE_SHIFT -
> 4GiB).

Yes, sorry, add_high_mem will be the size of memory *relocated*, not
the actual end of it (unless, as you say, the original highmem region
didn't exist).

What I really meant was that either way, after adjusting the highmem
region in the e820, the end of that region should correspond to
hvm_info->high_mem_pgend.

What about something like this?
---
    /*
     * And then we also need to adjust highmem.
     */
    if ( add_high_mem )
    {
        /*
         * Modify the existing highmem region if it exists
         */
        for ( i = 0; i < nr; i++ )
        {
            if ( e820[i].type == E820_RAM &&
                 e820[i].addr == (1ull << 32))
            {
                e820[i].size += add_high_mem;
                break;
            }
        }

        /*
         * If we didn't find a highmem region, make one
         */
        if ( i == nr )
        {
            e820[nr].addr = ((uint64_t)1 << 32);
            e820[nr].size = e820[nr].addr + add_high_mem;
            e820[nr].type = E820_RAM;
            nr++;
        }

        /*
         * Either way, at this point i points to the entry containing
         * highmem.  Compare it to what's in hvm_info as a sanity
         * check.
         */
        BUG_ON(e820[i].addr+e820[i].size !=
               ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT));
    }

--

 -George

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 13:48                 ` Chen, Tiejun
@ 2015-07-16 14:54                   ` Jan Beulich
  2015-07-16 15:20                     ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: Jan Beulich @ 2015-07-16 14:54 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Keir Fraser

  >>> On 16.07.15 at 15:48, <tiejun.chen@intel.com> wrote:
>>  Except that this isn't valid C (no statement following the label). I can
>> accept goto-s for some error handling cases where the alternatives
>> might be considered even more ugly than using goto. But the way
>> this or your original proposal look, I'd rather not have goto-s used
>> like this.
>>
> 
> What about this?

Looks reasonable (but don't forget that I continue to be unconvinced
that the patch as a whole makes sense).

Jan

> +    bool is_conflict = false;
> 
>       for ( devfn = 0; devfn < 256; devfn++ )
>       {
> @@ -60,7 +61,7 @@ static void disable_conflicting_devices(void)
>               continue;
> 
>           /* Check all bars */
> -        for ( bar = 0; bar < 7; bar++ )
> +        for ( bar = 0; bar < 7 && !is_conflict; bar++ )
>           {
>               bar_reg = PCI_BASE_ADDRESS_0 + 4*bar;
>               if ( bar == 6 )
> @@ -89,7 +90,7 @@ static void disable_conflicting_devices(void)
>               bar_sz = pci_readl(devfn, bar_reg);
>               bar_sz &= PCI_BASE_ADDRESS_MEM_MASK;
> 
> -            for ( i = 0; i < memory_map.nr_map ; i++ )
> +            for ( i = 0; i < memory_map.nr_map && !is_conflict; i++ )
>               {
>                   if ( memory_map.map[i].type == E820_RESERVED )
>                   {
> @@ -105,13 +106,13 @@ static void disable_conflicting_devices(void)
>                                  devfn>>3, devfn&7, bar_reg, bar_data);
>                           cmd = pci_readw(devfn, PCI_COMMAND);
>                           pci_writew(devfn, PCI_COMMAND, ~cmd);
> -                        /* Jump next device. */
> -                        goto check_next_device;
> +                        /* So need to jump next device. */
> +                        is_conflict = true;
>                       }
>                   }
>               }
>           }
> - check_next_device:
> +        is_conflict = false;
>       }
>   }
> 
> Thanks
> Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-16 14:29           ` George Dunlap
@ 2015-07-16 15:04             ` Chen, Tiejun
  2015-07-16 15:16               ` George Dunlap
  0 siblings, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16 15:04 UTC (permalink / raw)
  To: George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

> Yes, sorry, add_high_mem will be the size of memory *relocated*, not
> the actual end of it (unless, as you say, the original highmem region
> didn't exist).
>
> What I really meant was that either way, after adjusting the highmem
> region in the e820, the end of that region should correspond to
> hvm_info->high_mem_pgend.
>
> What about something like this?
> ---
>      /*
>       * And then we also need to adjust highmem.
>       */
>      if ( add_high_mem )
>      {
>          /*
>           * Modify the existing highmem region if it exists
>           */
>          for ( i = 0; i < nr; i++ )
>          {
>              if ( e820[i].type == E820_RAM &&
>                   e820[i].addr == (1ull << 32))
>              {
>                  e820[i].size += add_high_mem;
>                  break;
>              }
>          }
>
>          /*
>           * If we didn't find a highmem region, make one
>           */
>          if ( i == nr )
>          {
>              e820[nr].addr = ((uint64_t)1 << 32);
>              e820[nr].size = e820[nr].addr + add_high_mem;
>              e820[nr].type = E820_RAM;
>              nr++;
>          }
>
>          /*
>           * Either way, at this point i points to the entry containing
>           * highmem.  Compare it to what's in hvm_info as a sanity
>           * check.
>           */
>          BUG_ON(e820[i].addr+e820[i].size !=
>                 ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT));
>      }
>

Looks really better.

I just introduce a little change based on yours, and I post this as a whole,

diff --git a/tools/firmware/hvmloader/e820.c 
b/tools/firmware/hvmloader/e820.c
index 7a414ab..8c9b01f 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820,
                       unsigned int lowmem_reserved_base,
                       unsigned int bios_image_base)
  {
-    unsigned int nr = 0;
+    unsigned int nr = 0, i, j;
+    uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+    uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << 
PAGE_SHIFT;
+    uint64_t add_high_mem = 0;

      if ( !lowmem_reserved_base )
              lowmem_reserved_base = 0xA0000;
@@ -149,13 +152,6 @@ int build_e820_table(struct e820entry *e820,
      e820[nr].type = E820_RESERVED;
      nr++;

-    /* Low RAM goes here. Reserve space for special pages. */
-    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-    e820[nr].addr = 0x100000;
-    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - 
e820[nr].addr;
-    e820[nr].type = E820_RAM;
-    nr++;
-
      /*
       * Explicitly reserve space for special pages.
       * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820,
          nr++;
      }

-
-    if ( hvm_info->high_mem_pgend )
+    /*
+     * Construct E820 table according to recorded memory map.
+     *
+     * The memory map created by toolstack may include,
+     *
+     * #1. Low memory region
+     *
+     * Low RAM starts at least from 1M to make sure all standard regions
+     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+     * have enough space.
+     *
+     * #2. Reserved regions if they exist
+     *
+     * #3. High memory region if it exists
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
      {
-        e820[nr].addr = ((uint64_t)1 << 32);
-        e820[nr].size =
-            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - 
e820[nr].addr;
-        e820[nr].type = E820_RAM;
+        e820[nr] = memory_map.map[i];
          nr++;
      }

+    /* Low RAM goes here. Reserve space for special pages. */
+    BUG_ON(low_mem_end < (2u << 20));
+
+    /*
+     * Its possible to relocate RAM to allocate sufficient MMIO previously
+     * so low_mem_pgend would be changed over there. And here memory_map[]
+     * records the original low/high memory, so if low_mem_end is less than
+     * the original we need to revise low/high memory range in e820.
+     */
+    for ( i = 0; i < nr; i++ )
+    {
+        uint64_t end = e820[i].addr + e820[i].size;
+        if ( e820[i].type == E820_RAM &&
+             low_mem_end > e820[i].addr && low_mem_end < end )
+        {
+            add_high_mem = end - low_mem_end;
+            e820[i].size = low_mem_end - e820[i].addr;
+        }
+    }
+
+    /*
+     * And then we also need to adjust highmem.
+     */
+    if ( add_high_mem )
+    {
+        /* Modify the existing highmem region if it exists. */
+        for ( i = 0; i < nr; i++ )
+        {
+            if ( e820[i].type == E820_RAM &&
+                 e820[i].addr == ((uint64_t)1 << 32))
+            {
+                e820[i].size += add_high_mem;
+                break;
+            }
+        }
+
+        /* If there was no highmem region, just create one. */
+        if ( i == nr )
+        {
+            e820[nr].addr = ((uint64_t)1 << 32);
+            e820[nr].size = high_mem_end  - e820[nr].addr;
+            e820[nr].type = E820_RAM;
+            nr++;
+        }
+
+        /* A sanity check if high memory is broken. */
+        BUG_ON( high_mem_end != e820[i].addr + e820[i].size);
+    }
+
+    /* Finally we need to sort all e820 entries. */
+    for ( j = 0; j < nr-1; j++ )
+    {
+        for ( i = j+1; i < nr; i++ )
+        {
+            if ( e820[j].addr > e820[i].addr )
+            {
+                struct e820entry tmp;
+                tmp = e820[j];
+                e820[j] = e820[i];
+                e820[i] = tmp;
+            }
+        }
+    }
+
      return nr;
  }

Thanks
Tiejun

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-16 15:04             ` Chen, Tiejun
@ 2015-07-16 15:16               ` George Dunlap
  2015-07-16 15:29                 ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: George Dunlap @ 2015-07-16 15:16 UTC (permalink / raw)
  To: Chen, Tiejun, George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On 07/16/2015 04:04 PM, Chen, Tiejun wrote:
>> Yes, sorry, add_high_mem will be the size of memory *relocated*, not
>> the actual end of it (unless, as you say, the original highmem region
>> didn't exist).
>>
>> What I really meant was that either way, after adjusting the highmem
>> region in the e820, the end of that region should correspond to
>> hvm_info->high_mem_pgend.
>>
>> What about something like this?
>> ---
>>      /*
>>       * And then we also need to adjust highmem.
>>       */
>>      if ( add_high_mem )
>>      {
>>          /*
>>           * Modify the existing highmem region if it exists
>>           */
>>          for ( i = 0; i < nr; i++ )
>>          {
>>              if ( e820[i].type == E820_RAM &&
>>                   e820[i].addr == (1ull << 32))
>>              {
>>                  e820[i].size += add_high_mem;
>>                  break;
>>              }
>>          }
>>
>>          /*
>>           * If we didn't find a highmem region, make one
>>           */
>>          if ( i == nr )
>>          {
>>              e820[nr].addr = ((uint64_t)1 << 32);
>>              e820[nr].size = e820[nr].addr + add_high_mem;
>>              e820[nr].type = E820_RAM;
>>              nr++;
>>          }
>>
>>          /*
>>           * Either way, at this point i points to the entry containing
>>           * highmem.  Compare it to what's in hvm_info as a sanity
>>           * check.
>>           */
>>          BUG_ON(e820[i].addr+e820[i].size !=
>>                 ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT));
>>      }
>>
> 
> Looks really better.
> 
> I just introduce a little change based on yours, and I post this as a
> whole,
> 
> diff --git a/tools/firmware/hvmloader/e820.c
> b/tools/firmware/hvmloader/e820.c
> index 7a414ab..8c9b01f 100644
> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820,
>                       unsigned int lowmem_reserved_base,
>                       unsigned int bios_image_base)
>  {
> -    unsigned int nr = 0;
> +    unsigned int nr = 0, i, j;
> +    uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
> +    uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend <<
> PAGE_SHIFT;
> +    uint64_t add_high_mem = 0;
> 
>      if ( !lowmem_reserved_base )
>              lowmem_reserved_base = 0xA0000;
> @@ -149,13 +152,6 @@ int build_e820_table(struct e820entry *e820,
>      e820[nr].type = E820_RESERVED;
>      nr++;
> 
> -    /* Low RAM goes here. Reserve space for special pages. */
> -    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
> -    e820[nr].addr = 0x100000;
> -    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) -
> e820[nr].addr;
> -    e820[nr].type = E820_RAM;
> -    nr++;
> -
>      /*
>       * Explicitly reserve space for special pages.
>       * This space starts at RESERVED_MEMBASE an extends to cover various
> @@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820,
>          nr++;
>      }
> 
> -
> -    if ( hvm_info->high_mem_pgend )
> +    /*
> +     * Construct E820 table according to recorded memory map.
> +     *
> +     * The memory map created by toolstack may include,
> +     *
> +     * #1. Low memory region
> +     *
> +     * Low RAM starts at least from 1M to make sure all standard regions
> +     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
> +     * have enough space.
> +     *
> +     * #2. Reserved regions if they exist
> +     *
> +     * #3. High memory region if it exists
> +     */
> +    for ( i = 0; i < memory_map.nr_map; i++ )
>      {
> -        e820[nr].addr = ((uint64_t)1 << 32);
> -        e820[nr].size =
> -            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) -
> e820[nr].addr;
> -        e820[nr].type = E820_RAM;
> +        e820[nr] = memory_map.map[i];
>          nr++;
>      }
> 
> +    /* Low RAM goes here. Reserve space for special pages. */
> +    BUG_ON(low_mem_end < (2u << 20));
> +
> +    /*
> +     * Its possible to relocate RAM to allocate sufficient MMIO previously
> +     * so low_mem_pgend would be changed over there. And here memory_map[]
> +     * records the original low/high memory, so if low_mem_end is less
> than
> +     * the original we need to revise low/high memory range in e820.
> +     */
> +    for ( i = 0; i < nr; i++ )
> +    {
> +        uint64_t end = e820[i].addr + e820[i].size;
> +        if ( e820[i].type == E820_RAM &&
> +             low_mem_end > e820[i].addr && low_mem_end < end )
> +        {
> +            add_high_mem = end - low_mem_end;
> +            e820[i].size = low_mem_end - e820[i].addr;
> +        }
> +    }
> +
> +    /*
> +     * And then we also need to adjust highmem.
> +     */
> +    if ( add_high_mem )
> +    {
> +        /* Modify the existing highmem region if it exists. */
> +        for ( i = 0; i < nr; i++ )
> +        {
> +            if ( e820[i].type == E820_RAM &&
> +                 e820[i].addr == ((uint64_t)1 << 32))
> +            {
> +                e820[i].size += add_high_mem;
> +                break;
> +            }
> +        }
> +
> +        /* If there was no highmem region, just create one. */
> +        if ( i == nr )
> +        {
> +            e820[nr].addr = ((uint64_t)1 << 32);
> +            e820[nr].size = high_mem_end  - e820[nr].addr;
> +            e820[nr].type = E820_RAM;
> +            nr++;
> +        }
> +
> +        /* A sanity check if high memory is broken. */
> +        BUG_ON( high_mem_end != e820[i].addr + e820[i].size);

The reason I wrote it the way I did was so that we would cross-check our
lowmem adjustments (via add_high_mem) with the value in hvm_info in
*both cases*.

In the code above, you'll get the sanity check if we modify an existing
e820 entry; but if we create a new entry, then we don't check to make
sure that the amount we removed from the lowmem entry equals the amount
we added to the highmem entry.

By all means, calculate high_mem_end so it's easier to read.  But then,
when creating a new region, set e820[nr].size = add_high_mem, so that
the BUG_ON() that follows actually checks something useful.

 -George

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 14:54                   ` Jan Beulich
@ 2015-07-16 15:20                     ` Chen, Tiejun
  2015-07-16 15:39                       ` George Dunlap
  0 siblings, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16 15:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Keir Fraser

>> What about this?
>
> Looks reasonable (but don't forget that I continue to be unconvinced
> that the patch as a whole makes sense).

Yes, I always keep this in my mind as I mentioned in patch #00. Any risk 
you're still concerning? Is it that case if guest OS force enabling 
these devices again? IMO, at this point there are two cases:

#1. Without passing through a RMRR device

Those emulated devices don't create 1:1 mapping so its safe, right?

#2. With passing through a RMRR device

This just probably cause these associated devices not to work well, but 
still don't bring any impact to other Domains, right? I mean this isn't 
going to worsen the preexisting situation.

If I'm wrong please correct me.

Thanks
Tiejun

>
> Jan
>
>> +    bool is_conflict = false;
>>
>>        for ( devfn = 0; devfn < 256; devfn++ )
>>        {
>> @@ -60,7 +61,7 @@ static void disable_conflicting_devices(void)
>>                continue;
>>
>>            /* Check all bars */
>> -        for ( bar = 0; bar < 7; bar++ )
>> +        for ( bar = 0; bar < 7 && !is_conflict; bar++ )
>>            {
>>                bar_reg = PCI_BASE_ADDRESS_0 + 4*bar;
>>                if ( bar == 6 )
>> @@ -89,7 +90,7 @@ static void disable_conflicting_devices(void)
>>                bar_sz = pci_readl(devfn, bar_reg);
>>                bar_sz &= PCI_BASE_ADDRESS_MEM_MASK;
>>
>> -            for ( i = 0; i < memory_map.nr_map ; i++ )
>> +            for ( i = 0; i < memory_map.nr_map && !is_conflict; i++ )
>>                {
>>                    if ( memory_map.map[i].type == E820_RESERVED )
>>                    {
>> @@ -105,13 +106,13 @@ static void disable_conflicting_devices(void)
>>                                   devfn>>3, devfn&7, bar_reg, bar_data);
>>                            cmd = pci_readw(devfn, PCI_COMMAND);
>>                            pci_writew(devfn, PCI_COMMAND, ~cmd);
>> -                        /* Jump next device. */
>> -                        goto check_next_device;
>> +                        /* So need to jump next device. */
>> +                        is_conflict = true;
>>                        }
>>                    }
>>                }
>>            }
>> - check_next_device:
>> +        is_conflict = false;
>>        }
>>    }
>>
>> Thanks
>> Tiejun
>
>
>
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-16 15:16               ` George Dunlap
@ 2015-07-16 15:29                 ` Chen, Tiejun
  2015-07-16 15:33                   ` George Dunlap
  0 siblings, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16 15:29 UTC (permalink / raw)
  To: George Dunlap, George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu



On 2015/7/16 23:16, George Dunlap wrote:
> On 07/16/2015 04:04 PM, Chen, Tiejun wrote:
>>> Yes, sorry, add_high_mem will be the size of memory *relocated*, not
>>> the actual end of it (unless, as you say, the original highmem region
>>> didn't exist).
>>>
>>> What I really meant was that either way, after adjusting the highmem
>>> region in the e820, the end of that region should correspond to
>>> hvm_info->high_mem_pgend.
>>>
>>> What about something like this?
>>> ---
>>>       /*
>>>        * And then we also need to adjust highmem.
>>>        */
>>>       if ( add_high_mem )
>>>       {
>>>           /*
>>>            * Modify the existing highmem region if it exists
>>>            */
>>>           for ( i = 0; i < nr; i++ )
>>>           {
>>>               if ( e820[i].type == E820_RAM &&
>>>                    e820[i].addr == (1ull << 32))
>>>               {
>>>                   e820[i].size += add_high_mem;
>>>                   break;
>>>               }
>>>           }
>>>
>>>           /*
>>>            * If we didn't find a highmem region, make one
>>>            */
>>>           if ( i == nr )
>>>           {
>>>               e820[nr].addr = ((uint64_t)1 << 32);
>>>               e820[nr].size = e820[nr].addr + add_high_mem;
>>>               e820[nr].type = E820_RAM;
>>>               nr++;
>>>           }
>>>
>>>           /*
>>>            * Either way, at this point i points to the entry containing
>>>            * highmem.  Compare it to what's in hvm_info as a sanity
>>>            * check.
>>>            */
>>>           BUG_ON(e820[i].addr+e820[i].size !=
>>>                  ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT));
>>>       }
>>>
>>
>> Looks really better.
>>
>> I just introduce a little change based on yours, and I post this as a
>> whole,
>>
>> diff --git a/tools/firmware/hvmloader/e820.c
>> b/tools/firmware/hvmloader/e820.c
>> index 7a414ab..8c9b01f 100644
>> --- a/tools/firmware/hvmloader/e820.c
>> +++ b/tools/firmware/hvmloader/e820.c
>> @@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820,
>>                        unsigned int lowmem_reserved_base,
>>                        unsigned int bios_image_base)
>>   {
>> -    unsigned int nr = 0;
>> +    unsigned int nr = 0, i, j;
>> +    uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
>> +    uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend <<
>> PAGE_SHIFT;
>> +    uint64_t add_high_mem = 0;
>>
>>       if ( !lowmem_reserved_base )
>>               lowmem_reserved_base = 0xA0000;
>> @@ -149,13 +152,6 @@ int build_e820_table(struct e820entry *e820,
>>       e820[nr].type = E820_RESERVED;
>>       nr++;
>>
>> -    /* Low RAM goes here. Reserve space for special pages. */
>> -    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
>> -    e820[nr].addr = 0x100000;
>> -    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) -
>> e820[nr].addr;
>> -    e820[nr].type = E820_RAM;
>> -    nr++;
>> -
>>       /*
>>        * Explicitly reserve space for special pages.
>>        * This space starts at RESERVED_MEMBASE an extends to cover various
>> @@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820,
>>           nr++;
>>       }
>>
>> -
>> -    if ( hvm_info->high_mem_pgend )
>> +    /*
>> +     * Construct E820 table according to recorded memory map.
>> +     *
>> +     * The memory map created by toolstack may include,
>> +     *
>> +     * #1. Low memory region
>> +     *
>> +     * Low RAM starts at least from 1M to make sure all standard regions
>> +     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
>> +     * have enough space.
>> +     *
>> +     * #2. Reserved regions if they exist
>> +     *
>> +     * #3. High memory region if it exists
>> +     */
>> +    for ( i = 0; i < memory_map.nr_map; i++ )
>>       {
>> -        e820[nr].addr = ((uint64_t)1 << 32);
>> -        e820[nr].size =
>> -            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) -
>> e820[nr].addr;
>> -        e820[nr].type = E820_RAM;
>> +        e820[nr] = memory_map.map[i];
>>           nr++;
>>       }
>>
>> +    /* Low RAM goes here. Reserve space for special pages. */
>> +    BUG_ON(low_mem_end < (2u << 20));
>> +
>> +    /*
>> +     * Its possible to relocate RAM to allocate sufficient MMIO previously
>> +     * so low_mem_pgend would be changed over there. And here memory_map[]
>> +     * records the original low/high memory, so if low_mem_end is less
>> than
>> +     * the original we need to revise low/high memory range in e820.
>> +     */
>> +    for ( i = 0; i < nr; i++ )
>> +    {
>> +        uint64_t end = e820[i].addr + e820[i].size;
>> +        if ( e820[i].type == E820_RAM &&
>> +             low_mem_end > e820[i].addr && low_mem_end < end )
>> +        {
>> +            add_high_mem = end - low_mem_end;
>> +            e820[i].size = low_mem_end - e820[i].addr;
>> +        }
>> +    }
>> +
>> +    /*
>> +     * And then we also need to adjust highmem.
>> +     */
>> +    if ( add_high_mem )
>> +    {
>> +        /* Modify the existing highmem region if it exists. */
>> +        for ( i = 0; i < nr; i++ )
>> +        {
>> +            if ( e820[i].type == E820_RAM &&
>> +                 e820[i].addr == ((uint64_t)1 << 32))
>> +            {
>> +                e820[i].size += add_high_mem;
>> +                break;
>> +            }
>> +        }
>> +
>> +        /* If there was no highmem region, just create one. */
>> +        if ( i == nr )
>> +        {
>> +            e820[nr].addr = ((uint64_t)1 << 32);
>> +            e820[nr].size = high_mem_end  - e820[nr].addr;
>> +            e820[nr].type = E820_RAM;
>> +            nr++;
>> +        }
>> +
>> +        /* A sanity check if high memory is broken. */
>> +        BUG_ON( high_mem_end != e820[i].addr + e820[i].size);
>
> The reason I wrote it the way I did was so that we would cross-check our
> lowmem adjustments (via add_high_mem) with the value in hvm_info in
> *both cases*.
>
> In the code above, you'll get the sanity check if we modify an existing
> e820 entry; but if we create a new entry, then we don't check to make
> sure that the amount we removed from the lowmem entry equals the amount
> we added to the highmem entry.

Are you saying the following two cases are not same?

uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT;
BUG_ON( high_mem_end != e820[i].addr + e820[i].size);
vs.
BUG_ON(e820[i].addr+e820[i].size != ((uint64_t)hvm_info->high_mem_pgend 
<< PAGE_SHIFT));

Why? Note hvm_info->high_mem_pgend don't change while build e820 table.

Honestly I didn't try to change that point but maybe I'm missing something?

Thanks
Tiejun

>
> By all means, calculate high_mem_end so it's easier to read.  But then,
> when creating a new region, set e820[nr].size = add_high_mem, so that
> the BUG_ON() that follows actually checks something useful.
>
>   -George
>
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-16 15:29                 ` Chen, Tiejun
@ 2015-07-16 15:33                   ` George Dunlap
  2015-07-16 15:42                     ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: George Dunlap @ 2015-07-16 15:33 UTC (permalink / raw)
  To: Chen, Tiejun, George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On 07/16/2015 04:29 PM, Chen, Tiejun wrote:
> 
> 
> On 2015/7/16 23:16, George Dunlap wrote:
>> On 07/16/2015 04:04 PM, Chen, Tiejun wrote:
>>>> Yes, sorry, add_high_mem will be the size of memory *relocated*, not
>>>> the actual end of it (unless, as you say, the original highmem region
>>>> didn't exist).
>>>>
>>>> What I really meant was that either way, after adjusting the highmem
>>>> region in the e820, the end of that region should correspond to
>>>> hvm_info->high_mem_pgend.
>>>>
>>>> What about something like this?
>>>> ---
>>>>       /*
>>>>        * And then we also need to adjust highmem.
>>>>        */
>>>>       if ( add_high_mem )
>>>>       {
>>>>           /*
>>>>            * Modify the existing highmem region if it exists
>>>>            */
>>>>           for ( i = 0; i < nr; i++ )
>>>>           {
>>>>               if ( e820[i].type == E820_RAM &&
>>>>                    e820[i].addr == (1ull << 32))
>>>>               {
>>>>                   e820[i].size += add_high_mem;
>>>>                   break;
>>>>               }
>>>>           }
>>>>
>>>>           /*
>>>>            * If we didn't find a highmem region, make one
>>>>            */
>>>>           if ( i == nr )
>>>>           {
>>>>               e820[nr].addr = ((uint64_t)1 << 32);
>>>>               e820[nr].size = e820[nr].addr + add_high_mem;
>>>>               e820[nr].type = E820_RAM;
>>>>               nr++;
>>>>           }
>>>>
>>>>           /*
>>>>            * Either way, at this point i points to the entry containing
>>>>            * highmem.  Compare it to what's in hvm_info as a sanity
>>>>            * check.
>>>>            */
>>>>           BUG_ON(e820[i].addr+e820[i].size !=
>>>>                  ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT));
>>>>       }
>>>>
>>>
>>> Looks really better.
>>>
>>> I just introduce a little change based on yours, and I post this as a
>>> whole,
>>>
>>> diff --git a/tools/firmware/hvmloader/e820.c
>>> b/tools/firmware/hvmloader/e820.c
>>> index 7a414ab..8c9b01f 100644
>>> --- a/tools/firmware/hvmloader/e820.c
>>> +++ b/tools/firmware/hvmloader/e820.c
>>> @@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820,
>>>                        unsigned int lowmem_reserved_base,
>>>                        unsigned int bios_image_base)
>>>   {
>>> -    unsigned int nr = 0;
>>> +    unsigned int nr = 0, i, j;
>>> +    uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
>>> +    uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend <<
>>> PAGE_SHIFT;
>>> +    uint64_t add_high_mem = 0;
>>>
>>>       if ( !lowmem_reserved_base )
>>>               lowmem_reserved_base = 0xA0000;
>>> @@ -149,13 +152,6 @@ int build_e820_table(struct e820entry *e820,
>>>       e820[nr].type = E820_RESERVED;
>>>       nr++;
>>>
>>> -    /* Low RAM goes here. Reserve space for special pages. */
>>> -    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
>>> -    e820[nr].addr = 0x100000;
>>> -    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) -
>>> e820[nr].addr;
>>> -    e820[nr].type = E820_RAM;
>>> -    nr++;
>>> -
>>>       /*
>>>        * Explicitly reserve space for special pages.
>>>        * This space starts at RESERVED_MEMBASE an extends to cover
>>> various
>>> @@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820,
>>>           nr++;
>>>       }
>>>
>>> -
>>> -    if ( hvm_info->high_mem_pgend )
>>> +    /*
>>> +     * Construct E820 table according to recorded memory map.
>>> +     *
>>> +     * The memory map created by toolstack may include,
>>> +     *
>>> +     * #1. Low memory region
>>> +     *
>>> +     * Low RAM starts at least from 1M to make sure all standard
>>> regions
>>> +     * of the PC memory map, like BIOS, VGA memory-mapped I/O and
>>> vgabios,
>>> +     * have enough space.
>>> +     *
>>> +     * #2. Reserved regions if they exist
>>> +     *
>>> +     * #3. High memory region if it exists
>>> +     */
>>> +    for ( i = 0; i < memory_map.nr_map; i++ )
>>>       {
>>> -        e820[nr].addr = ((uint64_t)1 << 32);
>>> -        e820[nr].size =
>>> -            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) -
>>> e820[nr].addr;
>>> -        e820[nr].type = E820_RAM;
>>> +        e820[nr] = memory_map.map[i];
>>>           nr++;
>>>       }
>>>
>>> +    /* Low RAM goes here. Reserve space for special pages. */
>>> +    BUG_ON(low_mem_end < (2u << 20));
>>> +
>>> +    /*
>>> +     * Its possible to relocate RAM to allocate sufficient MMIO
>>> previously
>>> +     * so low_mem_pgend would be changed over there. And here
>>> memory_map[]
>>> +     * records the original low/high memory, so if low_mem_end is less
>>> than
>>> +     * the original we need to revise low/high memory range in e820.
>>> +     */
>>> +    for ( i = 0; i < nr; i++ )
>>> +    {
>>> +        uint64_t end = e820[i].addr + e820[i].size;
>>> +        if ( e820[i].type == E820_RAM &&
>>> +             low_mem_end > e820[i].addr && low_mem_end < end )
>>> +        {
>>> +            add_high_mem = end - low_mem_end;
>>> +            e820[i].size = low_mem_end - e820[i].addr;
>>> +        }
>>> +    }
>>> +
>>> +    /*
>>> +     * And then we also need to adjust highmem.
>>> +     */
>>> +    if ( add_high_mem )
>>> +    {
>>> +        /* Modify the existing highmem region if it exists. */
>>> +        for ( i = 0; i < nr; i++ )
>>> +        {
>>> +            if ( e820[i].type == E820_RAM &&
>>> +                 e820[i].addr == ((uint64_t)1 << 32))
>>> +            {
>>> +                e820[i].size += add_high_mem;
>>> +                break;
>>> +            }
>>> +        }
>>> +
>>> +        /* If there was no highmem region, just create one. */
>>> +        if ( i == nr )
>>> +        {
>>> +            e820[nr].addr = ((uint64_t)1 << 32);
>>> +            e820[nr].size = high_mem_end  - e820[nr].addr;
>>> +            e820[nr].type = E820_RAM;
>>> +            nr++;
>>> +        }
>>> +
>>> +        /* A sanity check if high memory is broken. */
>>> +        BUG_ON( high_mem_end != e820[i].addr + e820[i].size);
>>
>> The reason I wrote it the way I did was so that we would cross-check our
>> lowmem adjustments (via add_high_mem) with the value in hvm_info in
>> *both cases*.
>>
>> In the code above, you'll get the sanity check if we modify an existing
>> e820 entry; but if we create a new entry, then we don't check to make
>> sure that the amount we removed from the lowmem entry equals the amount
>> we added to the highmem entry.
> 
> Are you saying the following two cases are not same?
> 
> uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT;
> BUG_ON( high_mem_end != e820[i].addr + e820[i].size);
> vs.
> BUG_ON(e820[i].addr+e820[i].size != ((uint64_t)hvm_info->high_mem_pgend
> << PAGE_SHIFT));
> 
> Why? Note hvm_info->high_mem_pgend don't change while build e820 table.
> 
> Honestly I didn't try to change that point but maybe I'm missing something?

Yes, you are missing something. :-)  I told you exactly what I wanted
changed and what I said could remain the same:

>> By all means, calculate high_mem_end so it's easier to read.  But then,
>> when creating a new region, set e820[nr].size = add_high_mem, so that
>> the BUG_ON() that follows actually checks something useful.

Just to be clear, I want the second if() statement to look like this:

>>> +        if ( i == nr )
>>> +        {
>>> +            e820[nr].addr = ((uint64_t)1 << 32);
>>> +            e820[nr].size = add_high_mem;
>>> +            e820[nr].type = E820_RAM;
>>> +            nr++;
>>> +        }

Think about why and maybe that will help you understand what I'm talking
about.

 -George

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 15:20                     ` Chen, Tiejun
@ 2015-07-16 15:39                       ` George Dunlap
  2015-07-16 16:08                         ` Chen, Tiejun
  2015-07-16 16:18                         ` George Dunlap
  0 siblings, 2 replies; 83+ messages in thread
From: George Dunlap @ 2015-07-16 15:39 UTC (permalink / raw)
  To: Chen, Tiejun, Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Keir Fraser

On 07/16/2015 04:20 PM, Chen, Tiejun wrote:
>>> What about this?
>>
>> Looks reasonable (but don't forget that I continue to be unconvinced
>> that the patch as a whole makes sense).
> 
> Yes, I always keep this in my mind as I mentioned in patch #00. Any risk
> you're still concerning? Is it that case if guest OS force enabling
> these devices again? IMO, at this point there are two cases:
> 
> #1. Without passing through a RMRR device
> 
> Those emulated devices don't create 1:1 mapping so its safe, right?
> 
> #2. With passing through a RMRR device
> 
> This just probably cause these associated devices not to work well, but
> still don't bring any impact to other Domains, right? I mean this isn't
> going to worsen the preexisting situation.
> 
> If I'm wrong please correct me.

But I think the issue is, without doing *something* about MMIO
collisions, the feature as a whole is sort of pointless.  You can
carefully specify rdm="strategy=host,reserved=strict", but you might
still get devices whose MMIO regions conflict with RMMRs, and there's
nothing you can really do about it.

And although I personally think it might be possible / reasonable to
check in a newly-written, partial MMIO collision avoidance patch, not
everyone might agree.  Even if I were to rewrite and post a patch
myself, they may argue that doing such a complicated re-design after the
feature freeze shouldn't be allowed.

 -George

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-16 15:33                   ` George Dunlap
@ 2015-07-16 15:42                     ` Chen, Tiejun
  0 siblings, 0 replies; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16 15:42 UTC (permalink / raw)
  To: George Dunlap, George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

>> Honestly I didn't try to change that point but maybe I'm missing something?
>
> Yes, you are missing something. :-)  I told you exactly what I wanted
> changed and what I said could remain the same:
>
>>> By all means, calculate high_mem_end so it's easier to read.  But then,
>>> when creating a new region, set e820[nr].size = add_high_mem, so that
>>> the BUG_ON() that follows actually checks something useful.
>
> Just to be clear, I want the second if() statement to look like this:
>
>>>> +        if ( i == nr )
>>>> +        {
>>>> +            e820[nr].addr = ((uint64_t)1 << 32);
>>>> +            e820[nr].size = add_high_mem;

Ahh, when you're replying this, I also see this difference and realize 
what you meant. Sorry to this inconvenience and I'll sync this line into 
my tree :)

Thanks
Tiejun

>>>> +            e820[nr].type = E820_RAM;
>>>> +            nr++;
>>>> +        }
>
> Think about why and maybe that will help you understand what I'm talking
> about.
>
>   -George
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 15:39                       ` George Dunlap
@ 2015-07-16 16:08                         ` Chen, Tiejun
  2015-07-16 16:40                           ` George Dunlap
  2015-07-16 16:18                         ` George Dunlap
  1 sibling, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16 16:08 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Keir Fraser

On 2015/7/16 23:39, George Dunlap wrote:
> On 07/16/2015 04:20 PM, Chen, Tiejun wrote:
>>>> What about this?
>>>
>>> Looks reasonable (but don't forget that I continue to be unconvinced
>>> that the patch as a whole makes sense).
>>
>> Yes, I always keep this in my mind as I mentioned in patch #00. Any risk
>> you're still concerning? Is it that case if guest OS force enabling
>> these devices again? IMO, at this point there are two cases:
>>
>> #1. Without passing through a RMRR device
>>
>> Those emulated devices don't create 1:1 mapping so its safe, right?
>>
>> #2. With passing through a RMRR device
>>
>> This just probably cause these associated devices not to work well, but
>> still don't bring any impact to other Domains, right? I mean this isn't
>> going to worsen the preexisting situation.
>>
>> If I'm wrong please correct me.
>
> But I think the issue is, without doing *something* about MMIO
> collisions, the feature as a whole is sort of pointless.  You can
> carefully specify rdm="strategy=host,reserved=strict", but you might

I got what your mean. But there's no a good approach to bridge between 
xl and hvmloader to follow this policy. Right now, maybe just one thing 
could be tried like this,

Under hvmloader circumstance,

"strict" -> Still set RDM as E820_RESERVED
"relaxed" -> Set RDM as a new internal E820 flag like E820_HAZARDOUS

Then in the case of MMIO collisions

E820_RESERVED -> BUG() -> Stop VM
E820_HAZARDOUS -> our warning messages + disable devices

I think this can make sure we always take consistent policy in each 
involved cycle.

Thanks
Tiejun

> still get devices whose MMIO regions conflict with RMMRs, and there's
> nothing you can really do about it.
>
> And although I personally think it might be possible / reasonable to
> check in a newly-written, partial MMIO collision avoidance patch, not
> everyone might agree.  Even if I were to rewrite and post a patch
> myself, they may argue that doing such a complicated re-design after the
> feature freeze shouldn't be allowed.
>
>   -George
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 15:39                       ` George Dunlap
  2015-07-16 16:08                         ` Chen, Tiejun
@ 2015-07-16 16:18                         ` George Dunlap
  2015-07-16 16:31                           ` George Dunlap
  2015-07-16 21:15                           ` Chen, Tiejun
  1 sibling, 2 replies; 83+ messages in thread
From: George Dunlap @ 2015-07-16 16:18 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Chen, Tiejun, Keir Fraser

[-- Attachment #1: Type: text/plain, Size: 5225 bytes --]

On Thu, Jul 16, 2015 at 4:39 PM, George Dunlap <george.dunlap@citrix.com> wrote:
> On 07/16/2015 04:20 PM, Chen, Tiejun wrote:
>>>> What about this?
>>>
>>> Looks reasonable (but don't forget that I continue to be unconvinced
>>> that the patch as a whole makes sense).
>>
>> Yes, I always keep this in my mind as I mentioned in patch #00. Any risk
>> you're still concerning? Is it that case if guest OS force enabling
>> these devices again? IMO, at this point there are two cases:
>>
>> #1. Without passing through a RMRR device
>>
>> Those emulated devices don't create 1:1 mapping so its safe, right?
>>
>> #2. With passing through a RMRR device
>>
>> This just probably cause these associated devices not to work well, but
>> still don't bring any impact to other Domains, right? I mean this isn't
>> going to worsen the preexisting situation.
>>
>> If I'm wrong please correct me.
>
> But I think the issue is, without doing *something* about MMIO
> collisions, the feature as a whole is sort of pointless.  You can
> carefully specify rdm="strategy=host,reserved=strict", but you might
> still get devices whose MMIO regions conflict with RMMRs, and there's
> nothing you can really do about it.
>
> And although I personally think it might be possible / reasonable to
> check in a newly-written, partial MMIO collision avoidance patch, not
> everyone might agree.  Even if I were to rewrite and post a patch
> myself, they may argue that doing such a complicated re-design after the
> feature freeze shouldn't be allowed.

What about something like this?

 -George

---
 [PATCH] hvmloader/pci: Try to avoid placing BARs in RMRRs

Try to avoid placing PCI BARs over RMRRs:

- If mmio_hole_size is not specified, and the existing MMIO range has
  RMRRs in it, and there is space to expand the hole in lowmem without
  moving more memory, then make the MMIO hole as large as possible.

- When placing RMRRs, find the next RMRR higher than the current base
  in the lowmem mmio hole.  If it overlaps, skip ahead of it and find
  the next one.

This certainly won't work in all cases, but it should work in a
significant number of cases.  Additionally, users should be able to
work around problems by setting mmio_hole_size larger in the guest
config.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
---
THIS WILL NOT COMPILE, as it needs check_overlap_all() to be implemented.

It's just a proof-of-concept for discussion.
---
 tools/firmware/hvmloader/pci.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..dcb8cd0 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,25 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;

+/* Find the lowest RMRR higher than base */
+int find_next_rmrr(uint32_t base)
+{
+    int next_rmrr=-1;
+    uing64_t min_base = (1ull << 32);
+
+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED
+             && memory_map.map[i].addr > base
+             && memory_map.map[i].addr < min_base)
+        {
+            next_rmrr = i;
+            min_base = memory_map.map[i].addr;
+        }
+    }
+    return next_rmrr;
+}
+
 void pci_setup(void)
 {
     uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -299,6 +318,15 @@ void pci_setup(void)
                     || (((pci_mem_start << 1) >> PAGE_SHIFT)
                         >= hvm_info->low_mem_pgend)) )
             pci_mem_start <<= 1;
+
+        /*
+         * Try to accomodate RMRRs in our MMIO region on a best-effort basis.
+         * If we have RMRRs in the range, then make pci_mem_start just after
+         * hvm_info->low_mem_pgend.
+         */
+        if ( pci_mem_start > (hvm_info->low_mem_pgend << PAGE_SHIFT) &&
+             check_overlap_all(pci_mem_start, pci_mem_end-pci_mem_start) )
+            pci_mem_start = (hvm_info->low_mem_pgend + 1 ) << PAGE_SHIFT);
     }

     if ( mmio_total > (pci_mem_end - pci_mem_start) )
@@ -352,6 +380,8 @@ void pci_setup(void)
     io_resource.base = 0xc000;
     io_resource.max = 0x10000;

+    next_rmrr = find_next_rmrr(pci_mem_start);
+
     /* Assign iomem and ioport resources in descending order of size. */
     for ( i = 0; i < nr_bars; i++ )
     {
@@ -407,6 +437,18 @@ void pci_setup(void)
         }

         base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+
+        /* If we're using mem_resource, check for RMRR conflicts */
+        while ( resource == &mem_resource &&
+                next_rmrr > 0 &&
+                check_overlap(base, bar_sz,
+                              memory_map.map[next_rmrr].addr,
+                              memory_map.map[next_rmrr].size)) {
+            base = memory_map.map[next_rmrr].addr +
memory_map.map[next_rmrr].size;
+            base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+            next_rmrr=find_next_rmrr(base);
+        }
+
         bar_data |= (uint32_t)base;
         bar_data_upper = (uint32_t)(base >> 32);
         base += bar_sz;

[-- Attachment #2: 0001-hvmloader-pci-Try-to-avoid-placing-BARs-in-RMRRs.patch --]
[-- Type: text/x-diff, Size: 3824 bytes --]

From f0c7abdf9a17db9e512fc4427c122afcf386704b Mon Sep 17 00:00:00 2001
From: Tiejun Chen <tiejun.chen@intel.com>
Date: Thu, 16 Jul 2015 14:52:52 +0800
Subject: [PATCH] hvmloader/pci: Try to avoid placing BARs in RMRRs

Try to avoid placing PCI BARs over RMRRs:

- If mmio_hole_size is not specified, and the existing MMIO range has
  RMRRs in it, and there is space to expand the hole in lowmem without
  moving more memory, then make the MMIO hole as large as possible.

- When placing RMRRs, find the next RMRR higher than the current base
  in the lowmem mmio hole.  If it overlaps, skip ahead of it and find
  the next one.

This certainly won't work in all cases, but it should work in a
significant number of cases.  Additionally, users should be able to
work around problems by setting mmio_hole_size larger in the guest
config.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
---
THIS WILL NOT COMPILE, as it needs check_overlap_all() to be implemented.

It's just a proof-of-concept for discussion.
---
 tools/firmware/hvmloader/pci.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..dcb8cd0 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,25 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+/* Find the lowest RMRR higher than base */
+int find_next_rmrr(uint32_t base)
+{
+    int next_rmrr=-1;
+    uing64_t min_base = (1ull << 32);
+
+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED 
+             && memory_map.map[i].addr > base
+             && memory_map.map[i].addr < min_base) 
+        {
+            next_rmrr = i;
+            min_base = memory_map.map[i].addr;
+        }
+    }
+    return next_rmrr;
+}
+
 void pci_setup(void)
 {
     uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -299,6 +318,15 @@ void pci_setup(void)
                     || (((pci_mem_start << 1) >> PAGE_SHIFT)
                         >= hvm_info->low_mem_pgend)) )
             pci_mem_start <<= 1;
+
+        /* 
+         * Try to accomodate RMRRs in our MMIO region on a best-effort basis.
+         * If we have RMRRs in the range, then make pci_mem_start just after
+         * hvm_info->low_mem_pgend.
+         */
+        if ( pci_mem_start > (hvm_info->low_mem_pgend << PAGE_SHIFT) &&
+             check_overlap_all(pci_mem_start, pci_mem_end-pci_mem_start) )
+            pci_mem_start = (hvm_info->low_mem_pgend + 1 ) << PAGE_SHIFT);
     }
 
     if ( mmio_total > (pci_mem_end - pci_mem_start) )
@@ -352,6 +380,8 @@ void pci_setup(void)
     io_resource.base = 0xc000;
     io_resource.max = 0x10000;
 
+    next_rmrr = find_next_rmrr(pci_mem_start);
+
     /* Assign iomem and ioport resources in descending order of size. */
     for ( i = 0; i < nr_bars; i++ )
     {
@@ -407,6 +437,18 @@ void pci_setup(void)
         }
 
         base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+
+        /* If we're using mem_resource, check for RMRR conflicts */
+        while ( resource == &mem_resource &&
+                next_rmrr > 0 &&
+                check_overlap(base, bar_sz, 
+                              memory_map.map[next_rmrr].addr,
+                              memory_map.map[next_rmrr].size)) {
+            base = memory_map.map[next_rmrr].addr + memory_map.map[next_rmrr].size;
+            base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+            next_rmrr=find_next_rmrr(base);
+        }
+
         bar_data |= (uint32_t)base;
         bar_data_upper = (uint32_t)(base >> 32);
         base += bar_sz;
-- 
1.9.1


[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 16:18                         ` George Dunlap
@ 2015-07-16 16:31                           ` George Dunlap
  2015-07-16 21:15                           ` Chen, Tiejun
  1 sibling, 0 replies; 83+ messages in thread
From: George Dunlap @ 2015-07-16 16:31 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Chen, Tiejun, Keir Fraser

On Thu, Jul 16, 2015 at 5:18 PM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Thu, Jul 16, 2015 at 4:39 PM, George Dunlap <george.dunlap@citrix.com> wrote:
>> On 07/16/2015 04:20 PM, Chen, Tiejun wrote:
>>>>> What about this?
>>>>
>>>> Looks reasonable (but don't forget that I continue to be unconvinced
>>>> that the patch as a whole makes sense).
>>>
>>> Yes, I always keep this in my mind as I mentioned in patch #00. Any risk
>>> you're still concerning? Is it that case if guest OS force enabling
>>> these devices again? IMO, at this point there are two cases:
>>>
>>> #1. Without passing through a RMRR device
>>>
>>> Those emulated devices don't create 1:1 mapping so its safe, right?
>>>
>>> #2. With passing through a RMRR device
>>>
>>> This just probably cause these associated devices not to work well, but
>>> still don't bring any impact to other Domains, right? I mean this isn't
>>> going to worsen the preexisting situation.
>>>
>>> If I'm wrong please correct me.
>>
>> But I think the issue is, without doing *something* about MMIO
>> collisions, the feature as a whole is sort of pointless.  You can
>> carefully specify rdm="strategy=host,reserved=strict", but you might
>> still get devices whose MMIO regions conflict with RMMRs, and there's
>> nothing you can really do about it.
>>
>> And although I personally think it might be possible / reasonable to
>> check in a newly-written, partial MMIO collision avoidance patch, not
>> everyone might agree.  Even if I were to rewrite and post a patch
>> myself, they may argue that doing such a complicated re-design after the
>> feature freeze shouldn't be allowed.
>
> What about something like this?
>
>  -George
>
> ---
>  [PATCH] hvmloader/pci: Try to avoid placing BARs in RMRRs
>
> Try to avoid placing PCI BARs over RMRRs:
>
> - If mmio_hole_size is not specified, and the existing MMIO range has
>   RMRRs in it, and there is space to expand the hole in lowmem without
>   moving more memory, then make the MMIO hole as large as possible.
>
> - When placing RMRRs, find the next RMRR higher than the current base
>   in the lowmem mmio hole.  If it overlaps, skip ahead of it and find
>   the next one.
>
> This certainly won't work in all cases, but it should work in a
> significant number of cases.  Additionally, users should be able to
> work around problems by setting mmio_hole_size larger in the guest
> config.
>
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> THIS WILL NOT COMPILE, as it needs check_overlap_all() to be implemented.
>
> It's just a proof-of-concept for discussion.
> ---
>  tools/firmware/hvmloader/pci.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
>
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 5ff87a7..dcb8cd0 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -38,6 +38,25 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
>  enum virtual_vga virtual_vga = VGA_none;
>  unsigned long igd_opregion_pgbase = 0;
>
> +/* Find the lowest RMRR higher than base */
> +int find_next_rmrr(uint32_t base)
> +{
> +    int next_rmrr=-1;
> +    uing64_t min_base = (1ull << 32);
> +
> +    for ( i = 0; i < memory_map.nr_map ; i++ )
> +    {
> +        if ( memory_map.map[i].type == E820_RESERVED
> +             && memory_map.map[i].addr > base
> +             && memory_map.map[i].addr < min_base)
> +        {
> +            next_rmrr = i;
> +            min_base = memory_map.map[i].addr;
> +        }
> +    }
> +    return next_rmrr;
> +}
> +
>  void pci_setup(void)
>  {
>      uint8_t is_64bar, using_64bar, bar64_relocate = 0;
> @@ -299,6 +318,15 @@ void pci_setup(void)
>                      || (((pci_mem_start << 1) >> PAGE_SHIFT)
>                          >= hvm_info->low_mem_pgend)) )
>              pci_mem_start <<= 1;
> +
> +        /*
> +         * Try to accomodate RMRRs in our MMIO region on a best-effort basis.
> +         * If we have RMRRs in the range, then make pci_mem_start just after
> +         * hvm_info->low_mem_pgend.
> +         */
> +        if ( pci_mem_start > (hvm_info->low_mem_pgend << PAGE_SHIFT) &&
> +             check_overlap_all(pci_mem_start, pci_mem_end-pci_mem_start) )
> +            pci_mem_start = (hvm_info->low_mem_pgend + 1 ) << PAGE_SHIFT);
>      }
>
>      if ( mmio_total > (pci_mem_end - pci_mem_start) )
> @@ -352,6 +380,8 @@ void pci_setup(void)
>      io_resource.base = 0xc000;
>      io_resource.max = 0x10000;
>
> +    next_rmrr = find_next_rmrr(pci_mem_start);
> +
>      /* Assign iomem and ioport resources in descending order of size. */
>      for ( i = 0; i < nr_bars; i++ )
>      {
> @@ -407,6 +437,18 @@ void pci_setup(void)
>          }
>
>          base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
> +
> +        /* If we're using mem_resource, check for RMRR conflicts */
> +        while ( resource == &mem_resource &&
> +                next_rmrr > 0 &&
> +                check_overlap(base, bar_sz,
> +                              memory_map.map[next_rmrr].addr,
> +                              memory_map.map[next_rmrr].size)) {
> +            base = memory_map.map[next_rmrr].addr +
> memory_map.map[next_rmrr].size;
> +            base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);

Sorry, this should obviously be

base = (base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);

I thought I'd changed it, but apparently I just skipped that step. :-)

 -George

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 16:08                         ` Chen, Tiejun
@ 2015-07-16 16:40                           ` George Dunlap
  2015-07-16 21:24                             ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: George Dunlap @ 2015-07-16 16:40 UTC (permalink / raw)
  To: Chen, Tiejun, Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Keir Fraser

On 07/16/2015 05:08 PM, Chen, Tiejun wrote:
> On 2015/7/16 23:39, George Dunlap wrote:
>> On 07/16/2015 04:20 PM, Chen, Tiejun wrote:
>>>>> What about this?
>>>>
>>>> Looks reasonable (but don't forget that I continue to be unconvinced
>>>> that the patch as a whole makes sense).
>>>
>>> Yes, I always keep this in my mind as I mentioned in patch #00. Any risk
>>> you're still concerning? Is it that case if guest OS force enabling
>>> these devices again? IMO, at this point there are two cases:
>>>
>>> #1. Without passing through a RMRR device
>>>
>>> Those emulated devices don't create 1:1 mapping so its safe, right?
>>>
>>> #2. With passing through a RMRR device
>>>
>>> This just probably cause these associated devices not to work well, but
>>> still don't bring any impact to other Domains, right? I mean this isn't
>>> going to worsen the preexisting situation.
>>>
>>> If I'm wrong please correct me.
>>
>> But I think the issue is, without doing *something* about MMIO
>> collisions, the feature as a whole is sort of pointless.  You can
>> carefully specify rdm="strategy=host,reserved=strict", but you might
> 
> I got what your mean. But there's no a good approach to bridge between
> xl and hvmloader to follow this policy. Right now, maybe just one thing
> could be tried like this,
> 
> Under hvmloader circumstance,
> 
> "strict" -> Still set RDM as E820_RESERVED
> "relaxed" -> Set RDM as a new internal E820 flag like E820_HAZARDOUS
> 
> Then in the case of MMIO collisions
> 
> E820_RESERVED -> BUG() -> Stop VM
> E820_HAZARDOUS -> our warning messages + disable devices
> 
> I think this can make sure we always take consistent policy in each
> involved cycle.

A better way to communicate between xl and hvmloader is to use xenstore,
as we do for allow_memory_reallocate.  But I have very little hope we
can hash out a suitable design for that by tomorrow.

 -George

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 16:18                         ` George Dunlap
  2015-07-16 16:31                           ` George Dunlap
@ 2015-07-16 21:15                           ` Chen, Tiejun
  2015-07-17  9:26                             ` George Dunlap
  1 sibling, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16 21:15 UTC (permalink / raw)
  To: George Dunlap, George Dunlap
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Keir Fraser

>
>           base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
> +
> +        /* If we're using mem_resource, check for RMRR conflicts */
> +        while ( resource == &mem_resource &&
> +                next_rmrr > 0 &&
> +                check_overlap(base, bar_sz,
> +                              memory_map.map[next_rmrr].addr,
> +                              memory_map.map[next_rmrr].size)) {
> +            base = memory_map.map[next_rmrr].addr +
> memory_map.map[next_rmrr].size;
> +            base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
> +            next_rmrr=find_next_rmrr(base);
> +        }
> +
>           bar_data |= (uint32_t)base;
>           bar_data_upper = (uint32_t)(base >> 32);
>           base += bar_sz;
>

Actually this chunk of codes are really similar as what we did in my 
previous revisions from RFC ~ v3. It's just trying to skip and then 
allocate, right? As Jan pointed out, there are two key problems:

#1. All skipping action probably cause a result of no sufficient MMIO to 
allocate all devices as before.

#2. Another is that alignment issue. When the original "base" change to 
align to rdm_end, some spaces are wasted. Especially, these spaces could 
be allocated to other smaller bars.

This is one key reason why I had new revision started from v4 to address 
these two points :)

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 16:40                           ` George Dunlap
@ 2015-07-16 21:24                             ` Chen, Tiejun
  0 siblings, 0 replies; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-16 21:24 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Keir Fraser

Jan and George,

That implementation to this problem in v7 is really not accepted? Yes, 
that isn't a best solution to make our original mechanism very well, but 
in high level I just think that should be a correct solution fixing this 
problem. According to recent discussion seems we have not a efficient 
way which can be put into 4.6. So instead, could your guys help me make 
that better gradually to reach our current requirement as possible?

Thanks
Tiejun

On 2015/7/17 0:40, George Dunlap wrote:
> On 07/16/2015 05:08 PM, Chen, Tiejun wrote:
>> On 2015/7/16 23:39, George Dunlap wrote:
>>> On 07/16/2015 04:20 PM, Chen, Tiejun wrote:
>>>>>> What about this?
>>>>>
>>>>> Looks reasonable (but don't forget that I continue to be unconvinced
>>>>> that the patch as a whole makes sense).
>>>>
>>>> Yes, I always keep this in my mind as I mentioned in patch #00. Any risk
>>>> you're still concerning? Is it that case if guest OS force enabling
>>>> these devices again? IMO, at this point there are two cases:
>>>>
>>>> #1. Without passing through a RMRR device
>>>>
>>>> Those emulated devices don't create 1:1 mapping so its safe, right?
>>>>
>>>> #2. With passing through a RMRR device
>>>>
>>>> This just probably cause these associated devices not to work well, but
>>>> still don't bring any impact to other Domains, right? I mean this isn't
>>>> going to worsen the preexisting situation.
>>>>
>>>> If I'm wrong please correct me.
>>>
>>> But I think the issue is, without doing *something* about MMIO
>>> collisions, the feature as a whole is sort of pointless.  You can
>>> carefully specify rdm="strategy=host,reserved=strict", but you might
>>
>> I got what your mean. But there's no a good approach to bridge between
>> xl and hvmloader to follow this policy. Right now, maybe just one thing
>> could be tried like this,
>>
>> Under hvmloader circumstance,
>>
>> "strict" -> Still set RDM as E820_RESERVED
>> "relaxed" -> Set RDM as a new internal E820 flag like E820_HAZARDOUS
>>
>> Then in the case of MMIO collisions
>>
>> E820_RESERVED -> BUG() -> Stop VM
>> E820_HAZARDOUS -> our warning messages + disable devices
>>
>> I think this can make sure we always take consistent policy in each
>> involved cycle.
>
> A better way to communicate between xl and hvmloader is to use xenstore,
> as we do for allow_memory_reallocate.  But I have very little hope we
> can hash out a suitable design for that by tomorrow.
>
>   -George
>
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [v9][PATCH 00/16] Fix RMRR
@ 2015-07-17  0:45 Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
                   ` (15 more replies)
  0 siblings, 16 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel

v9:

* Patch #3: xen/passthrough: extend hypercall to support rdm reservation policy
  Correct one check condition of XEN_DOMCTL_DEV_RDM_RELAXED

* Patch #5: hvmloader: get guest memory map into memory_map[]
  Correct the patch head description:
  [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END]
    -> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END);
  Merge two if{} as one if{};

* Patch #6: hvmloader/pci: disable all pci devices conflicting with rdm
  A little improvement to code implementation but again, its still argued
  about this solution. Myself prefer to take a look at v7 if possible.

* Patch #7: hvmloader/e820: construct guest e820 table
  Refine that chunk of codes to check/modify highmem

* Patch #15: xen/vtd: prevent from assign the device with shared rmrr
  Correct one indentation issue

v8:

* Patch #3: xen/passthrough: extend hypercall to support rdm reservation policy
  Force to pass "0"(strict) when add or move a device in hardware domain,
  and improve some associated code comments.

* Patch #5: hvmloader: get guest memory map into memory_map[]
  Actually we should check this range started from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  So correct this and sync the patch head description.

* Patch #6: hvmloader/pci: disable all pci devices conflicting
  We have a big change to this patch:

  Based on current discussion its hard to reshape the original mmio
  allocation mechanism but we haven't a good and simple way to in short term.
  So instead, we don't bring more complicated to intervene that process but
  still check any conflicts to disable all associated devices.

  I know this is still argumented but I'd like to discuss this based on this
  revision and thanks for your time.

* Patch #7: hvmloader/e820: construct guest e820 table
  define low_mem_end as uint32_t;
  Correct those two wrong loops, memory_map.nr_map -> nr
  when we're trying to revise low/high memory e820 entries;
  Improve code comments and the patch head description;
  Add one check if highmem is just populated by hvmloader itself

* Patch #11: tools/libxl: detect and avoid conflicts with RDM
  Introduce pfn_to_paddr(x) -> ((uint64_t)x << XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out current codes.

* Patch #13: libxl: construct e820 map with RDM information for HVM guest
  make that core construction function as arch-specific to make sure
  we don't break ARM at this point.

* Patch #15:  xen/vtd: prevent from assign the device with shared rmrr
  Merge two if{} as one if{};
  Add to print RMRR range info when stop assign a group device

* Some minimal code style changes

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

* Just sync with the fallout of renaming parameters above.

Note I also mask patch #10 Acked by Wei Liu, Ian Jackson and Ian
Campbell. ( If I'm wrong just let me know at this point. ) And
as we discussed I'd further improve something as next step after
this round of review.

v6:

* Inside patch #01, add a comments to the nr_entries field inside
  xen_reserved_device_memory_map. Note this is from Jan.

* Inside patch #10,  we need rename something to make our policy reasonable
  "type" -> "strategy"
  "none" -> "ignore"
  and based on our discussion, we won't expose "ignore" in xl level and just
  keep that as a default, and then sync docs and the patch head description

* Inside patch #10, we fix some code stypes and especially we refine
  libxl__xc_device_get_rdm()

* Inside patch #16, we need to sync those renames introduced by patch #10.

v5:

* Fold our original patch #2 and #3 as this new, and here
  introduce a new, clear_identity_p2m_entry, which can wrapper
  guest_physmap_remove_page(). And we use this to clean our
  identity mapping. 

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our policy flag, so
  now "0" means "strict" and "1" means "relaxed", and also make DT device
  ignore the flag field simply. And then correct all associated code
  comments.

* Just make sure the per-device plicy always override the global policy,
  and so cleanup some associated comments and the patch head description.

* Improve some descriptions in doc.

* Make all rdm variables specific to .hvm

* Inside patch #6, we're trying to rename that field, is_64bar, inside struct
  bars with flag, and then extend to also indicate if this bar is already
  allocated.

* Inside patch 11, Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(),
  and then replace malloc() with libxl__malloc(), and finally cleanup this fallout.
  libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out parameter.

* The original patch #13 is sent out separately since actually this is not related
  to RMRR.

v4:

* Change one condition inside patch #2, "xen/x86/p2m: introduce
  set_identity_p2m_entry",

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )

 to make sure we just catch our requirement.

* Inside patch #3, "xen/vtd: create RMRR mapping",
  Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. And drop
  iommu_map_page() since actually ept_set_entry() can do this
  internally.

* Inside patch #4, "xen/passthrough: extend hypercall to support rdm
  reservation policy", add code comments to describer why we fix to set a
  policy flag in some cases like adding a device to hwdomain, and removing
  a device from user domain. And fix one judging condition

  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

  Additionally, also add to range check the flag passed to make future
  extensions possible (and to avoid ambiguity on what out of range values
  would mean).

* Inside patch #6, "hvmloader: get guest memory map into memory_map[]", we
  move some codes related to e820 to that specific file, e820.c, and consolidate
  "printf()+BUG()" and "BUG_ON()", and also avoid another fixed width type for
  the parameter of get_mem_mapping_layout()

* Inside patch #7, "hvmloader/pci: skip reserved ranges"
  We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either of populating more RAM, or of expanding more highmem. But
  we should know just 64bit-bar can work with highmem, and as you mentioned we
  also should avoid expanding highmem as possible. So my implementation is to 
  allocate 32bit-bar and 64bit-bar orderly.

  1>. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2>.

  2>. The second allocation round to the remaining 32bit-bar

  We should can finish allocating all 32bit-bar in theory, then go to the third
  allocation round 3>.

  3>. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the not aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check if other
  smaller bars can be allocated if a mmio hole exists between resource->base and
  reserved device memory. If a hole exists between base and reserved device
  memory, lets go out simply to try allocate for next bar since all bars are in
  descending order of size. If not, we need to move resource->base to reserved_end
  just to reallocate this bar

* Inside of patch #8, "hvmloader/e820: construct guest e820 table", we need to
  adjust highmme if lowmem is changed such as hvmloader has to populate more
  RAM to allocate bars.

* Inside of patch #11, "tools: introduce some new parameters to set rdm policy",
  we don't define init_val for for libxl_rdm_reserve_type since its just zero,
  and grab those changes to xl/libxlu to as a final patch.

* Inside of patch #12, "passes rdm reservation policy", fix one typo,
  s/unkwon/unknown. And in command description, we should use "[]" to indicate 
  it's optional for that extended xl command, pci-attach.

* Patch #13 is separated from current patch #14 since this is specific to xc.

* Inside of patch #14, "tools/libxl: detect and avoid conflicts with RDM", and
  just unconditionally set *nr_entries to 0. And additionally, we grab to all
  stuffs to provide a parameter to set our predefined boundary dynamically to as
  a separated patch later

* Inside of patch #16, "tools/libxl: extend XENMEM_set_memory_map", we use
  goto style error handling, and instead of NOGC, we shoud use
  libxl__malloc(gc,XXX) to allocate local e820.

Overall, we refined several the patch head descriptions and code comments.

v3:

* Rearrange all patches orderly as Wei suggested
* Rebase on the latest tree
* Address some Wei's comments on tools side
* Two changes for runtime cycle
   patch #2,xen/x86/p2m: introduce set_identity_p2m_entry, on hypervisor side

  a>. Introduce paging_mode_translate()
  Otherwise, we'll see this error when boot Xen/Dom0

(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
....
(XEN) Xen call trace:
(XEN)    [<ffff82d0801f53db>] p2m_pt_get_entry+0x29/0x558
(XEN)    [<ffff82d0801f0b5c>] set_identity_p2m_entry+0xfc/0x1f0
(XEN)    [<ffff82d08014ebc8>] rmrr_identity_mapping+0x154/0x1ce
(XEN)    [<ffff82d0802abb46>] intel_iommu_hwdom_init+0x76/0x158
(XEN)    [<ffff82d0802ab169>] iommu_hwdom_init+0x179/0x188
(XEN)    [<ffff82d0802cc608>] construct_dom0+0x2fed/0x35d8
(XEN)    [<ffff82d0802bdaa0>] __start_xen+0x22d8/0x2381
(XEN)    [<ffff82d080100067>] __high_start+0x53/0x55
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702

Note I don't copy all info since I think the above is enough.

  b>. Actually we still need to use "mfn_x(mfn) == INVALID_MFN" to confirm
  we're getting an invalid mfn.

* Add patch #16 to handle those devices which share same RMRR.

v2:

* Instead of that fixed predefined rdm memory boundary, we'd like to
  introduce a parameter, "rdm_mem_boundary", to set this threshold value.

* Remove that existing USB hack.

* Make sure the MMIO regions all fit in the available resource window

* Rename our policy, "force/try" -> "strict/relaxed"

* Indeed, Wei and Jan gave me more and more comments to refine codes
  * Code style
  * Better and reasonable code implementation
  * Correct or improve code comments.

* A little bit to work well with ARM.

Open:

* We should fail assigning device which has a shared RMRR with
another device. We can only do group assignment when RMRR is shared
among devices.

We need more time to figure a good policy/way out because something
is not clear to me.

As you know all devices are owned by Dom0 firstly before we create any
DomU, right? Do we allow Dom0 still own a group device while assign another
device in the same group?

Really appreciate any comments to policy.


v1:

RMRR is an acronym for Reserved Memory Region Reporting, expected to
be used for legacy usages (such as USB, UMA Graphics, etc.) requiring
reserved memory. Special treatment is required in system software to
setup those reserved regions in IOMMU translation structures, otherwise
passing through a device with RMRR reported may not work correctly.

This patch set tries to enhance existing Xen RMRR implementation to fix
various reported and theoretical problems. Most noteworthy changes are
to setup identity mapping in p2m layer and handle possible conflicts between
reported regions and gfn space. Initial proposal can be found at:
    http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00524.html
and after a long discussion a summarized agreement is here:
    http://lists.xen.org/archives/html/xen-devel/2015-01/msg01580.html

Below is a key summary of this patch set according to agreed proposal:

1. Use RDM (Reserved Device Memory) name in user space as a general 
description instead of using ACPI RMRR name directly.

2. Introduce configuration parameters to allow user control both per-device 
and global RDM resources along with desired policies upon a detected conflict.

3. Introduce a new hypercall to query global and per-device RDM resources.

4. Extend libxl to be a central place to manage RDM resources and handle 
potential conflicts between reserved regions and gfn space. One simplification
goal is made to keep existing lowmem / mmio / highmem layout which is
passed around various function blocks. So a reasonable assumption
is made, that conflicts falling into below areas are not re-arranged otherwise
it will result in a more scattered layout:
    a) in highmem region (>4G)
    b) in lowmem region, and below a predefined boundary (default 2G)
  a) is a new assumption not discussed before. From VT-d spec this is 
possible but no such observation in real-world. So we can make this
reasonable assumption until there's real usage on it.

5. Extend XENMEM_set_memory_map usable for HVM guest, and then have
libxl to use that hypercall to carry RDM information to hvmloader. There
is one difference from original discussion. Previously we discussed to
introduce a new E820 type specifically for RDM entries. After more thought
we think it's OK to just tag them as E820_reserved. Actually hvmloader
doesn't need to know whether the reserved entries come from RDM or
from other purposes. 

6. Then in hvmloader the change is generic for XENMEM_memory_map
change. Given a predefined memory layout, hvmloader should avoid
allocating all reserved entries for other usages (opregion, mmio, etc.)

7. Extend existing device passthrough hypercall to carry conflict handling
policy.

8. Setup identity map in p2m layer for RMRRs reported for the given
device. And conflicts are handled according to specified policy in hypercall.

Current patch set contains core enhancements calling for comments.
There are still several tasks not implemented now. We'll include them
in final version after RFC is agreed:

- remove existing USB hack
- detect and fail assigning device which has a shared RMRR with another device
- add a config parameter to configure that memory boundary flexibly
- In the case of hotplug we also need to figure out a way to fix that policy
  conflict between the per-pci policy and the global policy but firstly we think
  we'd better collect some good or correct ideas to step next in RFC. 

So here I made this as RFC to collect your any comments.

----------------------------------------------------------------
Jan Beulich (1):
      xen: introduce XENMEM_reserved_device_memory_map

Tiejun Chen (15):
      xen/vtd: create RMRR mapping
      xen/passthrough: extend hypercall to support rdm reservation policy
      xen: enable XENMEM_memory_map in hvm
      hvmloader: get guest memory map into memory_map[]
      hvmloader/pci: skip reserved ranges
      hvmloader/e820: construct guest e820 table
      tools/libxc: Expose new hypercall xc_reserved_device_memory_map
      tools: extend xc_assign_device() to support rdm reservation policy
      tools: introduce some new parameters to set rdm policy
      tools/libxl: detect and avoid conflicts with RDM
      tools: introduce a new parameter to set a predefined rdm boundary
      libxl: construct e820 map with RDM information for HVM guest
      xen/vtd: enable USB device assignment
      xen/vtd: prevent from assign the device with shared rmrr
      tools: parse to enable new rdm policy parameters

Jan Beulich (1):
      xen: introduce XENMEM_reserved_device_memory_map

 docs/man/xl.cfg.pod.5                       | 103 ++++++++
 docs/misc/vtd.txt                           |  24 ++
 tools/firmware/hvmloader/e820.c             | 131 +++++++++-
 tools/firmware/hvmloader/e820.h             |   7 +
 tools/firmware/hvmloader/hvmloader.c        |   2 +
 tools/firmware/hvmloader/pci.c              |  81 ++++++
 tools/firmware/hvmloader/util.c             |  26 ++
 tools/firmware/hvmloader/util.h             |  12 +
 tools/libxc/include/xenctrl.h               |  11 +-
 tools/libxc/xc_domain.c                     |  45 +++-
 tools/libxl/libxl.h                         |   6 +
 tools/libxl/libxl_arch.h                    |   7 +
 tools/libxl/libxl_arm.c                     |   8 +
 tools/libxl/libxl_create.c                  |  13 +-
 tools/libxl/libxl_dm.c                      | 273 ++++++++++++++++++++
 tools/libxl/libxl_dom.c                     |  16 +-
 tools/libxl/libxl_internal.h                |  13 +-
 tools/libxl/libxl_pci.c                     |  12 +-
 tools/libxl/libxl_types.idl                 |  26 ++
 tools/libxl/libxl_x86.c                     |  83 ++++++
 tools/libxl/libxlu_pci.c                    |  92 ++++++-
 tools/libxl/libxlutil.h                     |   4 +
 tools/libxl/xl_cmdimpl.c                    |  16 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c         |  16 +-
 tools/python/xen/lowlevel/xc/xc.c           |  30 ++-
 xen/arch/x86/hvm/hvm.c                      |   2 -
 xen/arch/x86/mm.c                           |   6 -
 xen/arch/x86/mm/p2m.c                       |  43 ++-
 xen/common/compat/memory.c                  |  66 +++++
 xen/common/memory.c                         |  64 +++++
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   3 +-
 xen/drivers/passthrough/arm/smmu.c          |   2 +-
 xen/drivers/passthrough/device_tree.c       |   3 +-
 xen/drivers/passthrough/iommu.c             |  10 +
 xen/drivers/passthrough/pci.c               |  15 +-
 xen/drivers/passthrough/vtd/dmar.c          |  32 +++
 xen/drivers/passthrough/vtd/dmar.h          |   1 -
 xen/drivers/passthrough/vtd/extern.h        |   1 +
 xen/drivers/passthrough/vtd/iommu.c         |  82 ++++--
 xen/drivers/passthrough/vtd/utils.c         |   7 -
 xen/include/asm-x86/p2m.h                   |  13 +-
 xen/include/public/domctl.h                 |   3 +
 xen/include/public/memory.h                 |  37 ++-
 xen/include/xen/iommu.h                     |  12 +-
 xen/include/xen/pci.h                       |   2 +
 xen/include/xlat.lst                        |   3 +-
 46 files changed, 1376 insertions(+), 88 deletions(-)

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [v9][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Jan Beulich

From: Jan Beulich <jbeulich@suse.com>

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich <jbeulich@suse.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v7 ~ v9:

* Nothing is changed.

v6:

* Add a comments to the nr_entries field inside xen_reserved_device_memory_map

v5 ~ v4:

* Nothing is changed.

 xen/common/compat/memory.c           | 66 ++++++++++++++++++++++++++++++++++++
 xen/common/memory.c                  | 64 ++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/iommu.c      | 10 ++++++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +++++++++++++++++
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h          | 37 +++++++++++++++++++-
 xen/include/xen/iommu.h              | 10 ++++++
 xen/include/xen/pci.h                |  2 ++
 xen/include/xlat.lst                 |  3 +-
 10 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct compat_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+    struct compat_reserved_device_memory rdm = {
+        .start_pfn = start, .nr_pages = nr
+    };
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+                return -ERANGE;
+
+            if ( __copy_to_compat_offset(grdm->map.buffer,
+                                         grdm->used_entries,
+                                         &rdm,
+                                         1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
     int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
             break;
         }
 
+#ifdef HAS_PASSTHROUGH
+        case XENMEM_reserved_device_memory_map:
+        {
+            struct get_reserved_device_memory grdm;
+
+            if ( copy_from_guest(&grdm.map, compat, 1) ||
+                 !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+                return -EFAULT;
+
+            grdm.used_entries = 0;
+            rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                                  &grdm);
+
+            if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+                rc = -ENOBUFS;
+
+            grdm.map.nr_entries = grdm.used_entries;
+            if ( grdm.map.nr_entries )
+            {
+                if ( __copy_to_guest(compat, &grdm.map, 1) )
+                    rc = -EFAULT;
+            }
+
+            return rc;
+        }
+#endif
+
         default:
             return compat_arch_memory_op(cmd, compat);
         }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
     return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct xen_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            struct xen_reserved_device_memory rdm = {
+                .start_pfn = start, .nr_pages = nr
+            };
+
+            if ( __copy_to_guest_offset(grdm->map.buffer,
+                                        grdm->used_entries,
+                                        &rdm,
+                                        1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -1162,6 +1199,33 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+#ifdef HAS_PASSTHROUGH
+    case XENMEM_reserved_device_memory_map:
+    {
+        struct get_reserved_device_memory grdm;
+
+        if ( copy_from_guest(&grdm.map, arg, 1) ||
+             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+            return -EFAULT;
+
+        grdm.used_entries = 0;
+        rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                              &grdm);
+
+        if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+            rc = -ENOBUFS;
+
+        grdm.map.nr_entries = grdm.used_entries;
+        if ( grdm.map.nr_entries )
+        {
+            if ( __copy_to_guest(arg, &grdm.map, 1) )
+                rc = -EFAULT;
+        }
+
+        break;
+    }
+#endif
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 06cb38f..0b2ef52 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -375,6 +375,16 @@ void iommu_crash_shutdown(void)
     iommu_enabled = iommu_intremap = 0;
 }
 
+int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+
+    if ( !iommu_enabled || !ops->get_reserved_device_memory )
+        return 0;
+
+    return ops->get_reserved_device_memory(func, ctxt);
+}
+
 bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
 {
     const struct hvm_iommu *hd = domain_hvm_iommu(d);
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..a730de5 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -893,3 +893,35 @@ int platform_supports_x2apic(void)
     unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
     return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
 }
+
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
+    int rc = 0;
+    unsigned int i;
+    u16 bdf;
+
+    for_each_rmrr_device ( rmrr, bdf, i )
+    {
+        if ( rmrr != rmrr_cur )
+        {
+            rc = func(PFN_DOWN(rmrr->base_address),
+                      PFN_UP(rmrr->end_address) -
+                        PFN_DOWN(rmrr->base_address),
+                      PCI_SBDF(rmrr->segment, bdf),
+                      ctxt);
+
+            if ( unlikely(rc < 0) )
+                return rc;
+
+            if ( !rc )
+                continue;
+
+            /* Just go next. */
+            if ( rc == 1 )
+                rmrr_cur = rmrr;
+        }
+    }
+
+    return 0;
+}
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index 5524dba..f9ee9b0 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -75,6 +75,7 @@ int domain_context_mapping_one(struct domain *domain, struct iommu *iommu,
                                u8 bus, u8 devfn, const struct pci_dev *);
 int domain_context_unmap_one(struct domain *domain, struct iommu *iommu,
                              u8 bus, u8 devfn);
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
 
 unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg);
 void io_apic_write_remap_rte(unsigned int apic,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 48820ea..44ed23d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2491,6 +2491,7 @@ const struct iommu_ops intel_iommu_ops = {
     .crash_shutdown = vtd_crash_shutdown,
     .iotlb_flush = intel_iommu_iotlb_flush,
     .iotlb_flush_all = intel_iommu_iotlb_flush_all,
+    .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
     .dump_p2m_table = vtd_dump_p2m_table,
 };
 
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 832559a..ac7d3da 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -573,7 +573,42 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 27 */
+/*
+ * With some legacy devices, certain guest-physical addresses cannot safely
+ * be used for other purposes, e.g. to map guest RAM.  This hypercall
+ * enumerates those regions so the toolstack can avoid using them.
+ */
+#define XENMEM_reserved_device_memory_map   27
+struct xen_reserved_device_memory {
+    xen_pfn_t start_pfn;
+    xen_ulong_t nr_pages;
+};
+typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
+
+struct xen_reserved_device_memory_map {
+    /* IN */
+    /* Currently just one bit to indicate checkng all Reserved Device Memory. */
+#define PCI_DEV_RDM_ALL   0x1
+    uint32_t        flag;
+    /* IN */
+    uint16_t        seg;
+    uint8_t         bus;
+    uint8_t         devfn;
+    /*
+     * IN/OUT
+     *
+     * Gets set to the required number of entries when too low,
+     * signaled by error code -ERANGE.
+     */
+    unsigned int    nr_entries;
+    /* OUT */
+    XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
+};
+typedef struct xen_reserved_device_memory_map xen_reserved_device_memory_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_map_t);
+
+/* Next available subop number is 28 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b30bf41..e2f584d 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -126,6 +126,14 @@ int iommu_do_dt_domctl(struct xen_domctl *, struct domain *,
 
 struct page_info;
 
+/*
+ * Any non-zero value returned from callbacks of this type will cause the
+ * function the callback was handed to terminate its iteration. Assigning
+ * meaning of these non-zero values is left to the top level caller /
+ * callback pair.
+ */
+typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
+
 struct iommu_ops {
     int (*init)(struct domain *d);
     void (*hwdom_init)(struct domain *d);
@@ -157,12 +165,14 @@ struct iommu_ops {
     void (*crash_shutdown)(void);
     void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
     void (*iotlb_flush_all)(struct domain *d);
+    int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
     void (*dump_p2m_table)(struct domain *d);
 };
 
 void iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
+int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
 void iommu_share_p2m_table(struct domain *d);
 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..d176e8b 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
+#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))
 
 struct pci_dev_info {
     bool_t is_extfn;
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9c9fd9a..dd23559 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -61,9 +61,10 @@
 !	memory_exchange			memory.h
 !	memory_map			memory.h
 !	memory_reservation		memory.h
-?	mem_access_op		memory.h
+?	mem_access_op			memory.h
 !	pod_target			memory.h
 !	remove_from_physmap		memory.h
+!	reserved_device_memory_map	memory.h
 ?	vmemrange			memory.h
 !	vnuma_topology_info		memory.h
 ?	physdev_eoi			physdev.h
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 02/16] xen/vtd: create RMRR mapping
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Andrew Cooper, Tim Deegan, Jan Beulich,
	Yang Zhang

RMRR reserved regions must be setup in the pfn space with an identity
mapping to reported mfn. However existing code has problem to setup
correct mapping when VT-d shares EPT page table, so lead to problem
when assigning devices (e.g GPU) with RMRR reported. So instead, this
patch aims to setup identity mapping in p2m layer, regardless of
whether EPT is shared or not. And we still keep creating VT-d table.

And we also need to introduce a pair of helper to create/clear this
sort of identity mapping as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If space
is already occupied by desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define macro to wrapper guest_physmap_remove_page() with
a returning value as necessary.

CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v6 ~ v9:

* Nothing is changed.

v5:

* Fold our original patch #2 and #3 as this new

* Introduce a new, clear_identity_p2m_entry, which can wrapper
  guest_physmap_remove_page(). And we use this to clean our
  identity mapping. 

v4:

* Change that orginal condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mapping as we expected.

* To have

  if ( !paging_mode_translate(p2m->domain) )
    return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c               | 40 +++++++++++++++++++++++++++++++++++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h           | 13 +++++++++---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
                          p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                           unsigned long mfn, unsigned int page_order)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
     gfn_lock(p2m, gfn, page_order);
-    p2m_remove_page(p2m, gfn, mfn, page_order);
+    rc = p2m_remove_page(p2m, gfn, mfn, page_order);
     gfn_unlock(p2m, gfn, page_order);
+    return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
     return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma)
+{
+    p2m_type_t p2mt;
+    p2m_access_t a;
+    mfn_t mfn;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int ret;
+
+    if ( !paging_mode_translate(p2m->domain) )
+        return 0;
+
+    gfn_lock(p2m, gfn, 0);
+
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+    if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+                            p2m_mmio_direct, p2ma);
+    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+        ret = 0;
+    else
+    {
+        ret = -EBUSY;
+        printk(XENLOG_G_WARNING
+               "Cannot setup identity map d%d:%lx,"
+               " gfn already mapped to %lx.\n",
+               d->domain_id, gfn, mfn_x(mfn));
+    }
+
+    gfn_unlock(p2m, gfn, 0);
+    return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
             while ( base_pfn < end_pfn )
             {
-                if ( intel_iommu_unmap_page(d, base_pfn) )
+                if ( clear_identity_p2m_entry(d, base_pfn, 0) )
                     ret = -ENXIO;
                 base_pfn++;
             }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-                                       IOMMUF_readable|IOMMUF_writable);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
 
         if ( err )
             return err;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index b49c09b..190a286 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -503,9 +503,9 @@ static inline int guest_physmap_add_page(struct domain *d,
 }
 
 /* Remove a page from a domain's p2m table */
-void guest_physmap_remove_page(struct domain *d,
-                               unsigned long gfn,
-                               unsigned long mfn, unsigned int page_order);
+int guest_physmap_remove_page(struct domain *d,
+                              unsigned long gfn,
+                              unsigned long mfn, unsigned int page_order);
 
 /* Set a p2m range as populate-on-demand */
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
@@ -543,6 +543,13 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
                        p2m_access_t access);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
+/* Set identity addresses in the p2m table (for pass-through) */
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma);
+
+#define clear_identity_p2m_entry(d, gfn, page_order) \
+                        guest_physmap_remove_page(d, gfn, gfn, page_order)
+
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
                     unsigned long gpfn, domid_t foreign_domid);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  6:48   ` Jan Beulich
  2015-07-20  1:12   ` Tian, Kevin
  2015-07-17  0:45 ` [v9][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
                   ` (12 subsequent siblings)
  15 siblings, 2 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Andrew Cooper, Tim Deegan,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Yang Zhang,
	Stefano Stabellini, Ian Campbell

This patch extends the existing hypercall to support rdm reservation policy.
We return error or just throw out a warning message depending on whether
the policy is "strict" or "relaxed" when reserving RDM regions in pfn space.
Note in some special cases, e.g. add a device to hwdomain, and remove a
device from user domain, 'relaxed' is fine enough since this is always safe
to hwdomain.

CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
---
v9:

* Correct one check condition of XEN_DOMCTL_DEV_RDM_RELAXED

v8:

* Force to pass "0"(strict) when add or move a device in hardware domain,
  and improve some associated code comments.

v6 ~ v7:

* Nothing is changed.

v5:

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
  "0" means "strict" and "1" means "relaxed".

* So make DT device ignore the flag field

* Improve the code comments

v4:

* Add code comments to describer why we fix to set a policy flag in some
  cases like adding a device to hwdomain, and removing a device from user domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one judging condition
  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add to range check the flag passed to make future extensions possible
  (and to avoid ambiguity on what out of range values would mean).

 xen/arch/x86/mm/p2m.c                       |  7 ++++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c          |  2 +-
 xen/drivers/passthrough/device_tree.c       |  3 ++-
 xen/drivers/passthrough/pci.c               | 15 ++++++++----
 xen/drivers/passthrough/vtd/iommu.c         | 37 ++++++++++++++++++++++-------
 xen/include/asm-x86/p2m.h                   |  2 +-
 xen/include/public/domctl.h                 |  3 +++
 xen/include/xen/iommu.h                     |  2 +-
 9 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 99a26ca..47785dc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma)
+                           p2m_access_t p2ma, unsigned int flag)
 {
     p2m_type_t p2mt;
     p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
         ret = 0;
     else
     {
-        ret = -EBUSY;
+        if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
+            ret = 0;
+        else
+            ret = -EBUSY;
         printk(XENLOG_G_WARNING
                "Cannot setup identity map d%d:%lx,"
                " gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-                                   struct pci_dev *pdev)
+                                   struct pci_dev *pdev,
+                                   u32 flag)
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
     int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-			       struct device *dev)
+			       struct device *dev, u32 flag)
 {
 	struct iommu_domain *domain;
 	struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 5d3842a..7ff79f8 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
             goto fail;
     }
 
-    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+    /* The flag field doesn't matter to DT device. */
+    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0);
 
     if ( rc )
         goto fail;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e30be43..c7bbf6e 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1335,7 +1335,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
     return pdev ? 0 : -EBUSY;
 }
 
-static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
+static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
     struct pci_dev *pdev;
@@ -1371,7 +1371,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 
     pdev->fault.count = 0;
 
-    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev))) )
+    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag)) )
         goto done;
 
     for ( ; pdev->phantom_stride; rc = 0 )
@@ -1379,7 +1379,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
         devfn += pdev->phantom_stride;
         if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
             break;
-        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev));
+        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
         if ( rc )
             printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed (%d)\n",
                    d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
@@ -1496,6 +1496,7 @@ int iommu_do_pci_domctl(
 {
     u16 seg;
     u8 bus, devfn;
+    u32 flag;
     int ret = 0;
     uint32_t machine_sbdf;
 
@@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
         seg = machine_sbdf >> 16;
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
+        flag = domctl->u.assign_device.flag;
+        if ( flag & ~XEN_DOMCTL_DEV_RDM_RELAXED )
+        {
+            ret = -EINVAL;
+            break;
+        }
 
         ret = device_assigned(seg, bus, devfn) ?:
-              assign_device(d, seg, bus, devfn);
+              assign_device(d, seg, bus, devfn, flag);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(__HYPERVISOR_domctl,
                                                 "h", u_domctl);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 8415958..b5d658e 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1807,7 +1807,8 @@ static void iommu_set_pgd(struct domain *d)
 }
 
 static int rmrr_identity_mapping(struct domain *d, bool_t map,
-                                 const struct acpi_rmrr_unit *rmrr)
+                                 const struct acpi_rmrr_unit *rmrr,
+                                 u32 flag)
 {
     unsigned long base_pfn = rmrr->base_address >> PAGE_SHIFT_4K;
     unsigned long end_pfn = PAGE_ALIGN_4K(rmrr->end_address) >> PAGE_SHIFT_4K;
@@ -1855,7 +1856,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
 
         if ( err )
             return err;
@@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
              PCI_BUS(bdf) == pdev->bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
+            /*
+             * iommu_add_device() is only called for the hardware
+             * domain (see xen/drivers/passthrough/pci.c:pci_add_device()).
+             * Since RMRRs are always reserved in the e820 map for the hardware
+             * domain, there shouldn't be a conflict.
+             */
+            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr, 0);
             if ( ret )
                 dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
                         pdev->domain->domain_id);
@@ -1939,7 +1946,11 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
              PCI_DEVFN2(bdf) != devfn )
             continue;
 
-        rmrr_identity_mapping(pdev->domain, 0, rmrr);
+        /*
+         * Any flag is nothing to clear these mappings but here
+         * its always safe and strict to set 0.
+         */
+        rmrr_identity_mapping(pdev->domain, 0, rmrr, 0);
     }
 
     return domain_context_unmap(pdev->domain, devfn, pdev);
@@ -2098,7 +2109,13 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
     spin_lock(&pcidevs_lock);
     for_each_rmrr_device ( rmrr, bdf, i )
     {
-        ret = rmrr_identity_mapping(d, 1, rmrr);
+        /*
+         * Here means we're add a device to the hardware domain.
+         * Since RMRRs are always reserved in the e820 map for the hardware
+         * domain, there shouldn't be a conflict. So its always safe and
+         * strict to set 0.
+         */
+        ret = rmrr_identity_mapping(d, 1, rmrr, 0);
         if ( ret )
             dprintk(XENLOG_ERR VTDPREFIX,
                      "IOMMU: mapping reserved region failed\n");
@@ -2241,7 +2258,11 @@ static int reassign_device_ownership(
                  PCI_BUS(bdf) == pdev->bus &&
                  PCI_DEVFN2(bdf) == devfn )
             {
-                ret = rmrr_identity_mapping(source, 0, rmrr);
+                /*
+                 * Any RMRR flag is always ignored when remove a device,
+                 * but its always safe and strict to set 0.
+                 */
+                ret = rmrr_identity_mapping(source, 0, rmrr, 0);
                 if ( ret != -ENOENT )
                     return ret;
             }
@@ -2265,7 +2286,7 @@ static int reassign_device_ownership(
 }
 
 static int intel_iommu_assign_device(
-    struct domain *d, u8 devfn, struct pci_dev *pdev)
+    struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag)
 {
     struct acpi_rmrr_unit *rmrr;
     int ret = 0, i;
@@ -2294,7 +2315,7 @@ static int intel_iommu_assign_device(
              PCI_BUS(bdf) == bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(d, 1, rmrr);
+            ret = rmrr_identity_mapping(d, 1, rmrr, flag);
             if ( ret )
             {
                 reassign_device_ownership(d, hardware_domain, devfn, pdev);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 190a286..68da0a9 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -545,7 +545,7 @@ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
 /* Set identity addresses in the p2m table (for pass-through) */
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma);
+                           p2m_access_t p2ma, unsigned int flag);
 
 #define clear_identity_p2m_entry(d, gfn, page_order) \
                         guest_physmap_remove_page(d, gfn, gfn, page_order)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index bc45ea5..bca25c9 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -478,6 +478,9 @@ struct xen_domctl_assign_device {
             XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
         } dt;
     } u;
+    /* IN */
+#define XEN_DOMCTL_DEV_RDM_RELAXED      1
+    uint32_t  flag;   /* flag of assigned device */
 };
 typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index e2f584d..02b2b02 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -140,7 +140,7 @@ struct iommu_ops {
     int (*add_device)(u8 devfn, device_t *dev);
     int (*enable_device)(device_t *dev);
     int (*remove_device)(u8 devfn, device_t *dev);
-    int (*assign_device)(struct domain *, u8 devfn, device_t *dev);
+    int (*assign_device)(struct domain *, u8 devfn, device_t *dev, u32 flag);
     int (*reassign_device)(struct domain *s, struct domain *t,
                            u8 devfn, device_t *dev);
 #ifdef HAS_PCI
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 04/16] xen: enable XENMEM_memory_map in hvm
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (2 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

This patch enables XENMEM_memory_map in hvm. So hvmloader can
use it to setup the e820 mappings.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
---
v5 ~ v9:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c      | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }
 
-        if ( is_hvm_domain(d) )
-        {
-            rcu_unlock_domain(d);
-            return -EPERM;
-        }
-
         e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
         if ( e820 == NULL )
         {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 05/16] hvmloader: get guest memory map into memory_map[]
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (3 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm Tiejun Chen
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

Now we get this map layout by call XENMEM_memory_map then
save them into one global variable memory_map[]. It should
include lowmem range, rdm range and highmem range. Note
rdm range and highmem range may not exist in some cases.

And here we need to check if any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END).
This range is used to allocate memory in hvmloder level, and
we would lead hvmloader failed in case of conflict since its
another rare possibility in real world.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v9:

* Correct [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END]
    -> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END) in
  the patch head description;
  Merge two if{} as one if{};

v8:

* Actually we should check this range started from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  So correct this and sync the patch head description.

v5 ~ v7:

* Nothing is changed.

v4:

* Move some codes related to e820 to that specific file, e820.c.

* Consolidate "printf()+BUG()" and "BUG_ON()"

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c      | 32 ++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/e820.h      |  7 +++++++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c      | 26 ++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h      | 12 ++++++++++++
 5 files changed, 79 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..7a414ab 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,38 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+    unsigned int nr_entries = E820MAX, i;
+    int rc;
+    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START;
+    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+    if ( rc || !nr_entries )
+    {
+        printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+        BUG();
+    }
+
+    memory_map.nr_map = nr_entries;
+
+    for ( i = 0; i < nr_entries; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED &&
+             check_overlap(alloc_addr, alloc_size,
+                           memory_map.map[i].addr, memory_map.map[i].size) )
+        {
+            printf("Fail to setup memory map due to conflict");
+            printf(" on dynamic reserved memory range.\n");
+            BUG();
+        }
+    }
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
     uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
     uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX	128
+
+struct e820map {
+    unsigned int nr_map;
+    struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
     init_hypercalls();
 
+    memory_map_setup();
+
     xenbus_setup();
 
     bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size)
+{
+    return (start + size > reserved_start) &&
+            (start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
     asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
     *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+    int rc;
+    struct xen_memory_map memmap = {
+        .nr_entries = *max_entries
+    };
+
+    set_xen_guest_handle(memmap.buffer, entries);
+
+    rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+    *max_entries = memmap.nr_entries;
+
+    return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
     static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index f99c0f19..1100a3b 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10 @@
 #include <stdarg.h>
 #include <stdint.h>
 #include <stddef.h>
+#include <stdbool.h>
 #include <xen/xen.h>
 #include <xen/hvm/hvm_info_table.h>
+#include "e820.h"
 
 #define __STR(...) #__VA_ARGS__
 #define STR(...) __STR(__VA_ARGS__)
@@ -222,6 +224,9 @@ int hvm_param_set(uint32_t index, uint64_t value);
 /* Setup PCI bus */
 void pci_setup(void);
 
+/* Setup memory map  */
+void memory_map_setup(void);
+
 /* Prepare the 32bit BIOS */
 uint32_t rombios_highbios_setup(void);
 
@@ -249,6 +254,13 @@ void perform_tests(void);
 
 extern char _start[], _end[];
 
+int get_mem_mapping_layout(struct e820entry entries[],
+                           unsigned int *max_entries);
+
+extern struct e820map memory_map;
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size);
+
 #endif /* __HVMLOADER_UTIL_H__ */
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (4 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17 13:59   ` Jan Beulich
  2015-07-17  0:45 ` [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

When allocating mmio address for PCI bars, mmio may overlap with
reserved regions. Currently we just want to disable these associate
devices simply to avoid conflicts but we will reshape current mmio
allocation mechanism to fix this completely.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v9:

* A little improvement to code implementation but again, its still argued about
  this solution.

v8:

* Based on current discussion its hard to reshape the original mmio
  allocation mechanism but we haven't a good and simple way to in short term.
  So instead, we don't bring more complicated to intervene that process but
  still check any conflicts to disable all associated devices.

v6 ~ v7:

* Nothing is changed.

v5:

* Rename that field, is_64bar, inside struct bars with flag, and
  then extend to also indicate if this bar is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either of populating more RAM, or of expanding more highmem. But
  we should know just 64bit-bar can work with highmem, and as you mentioned we
  also should avoid expanding highmem as possible. So my implementation is to 
  allocate 32bit-bar and 64bit-bar orderly.

  1>. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2>.

  2>. The second allocation round to the remaining 32bit-bar

  We should can finish allocating all 32bit-bar in theory, then go to the third
  allocation round 3>.

  3>. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the not aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check if other
  smaller bars can be allocated if a mmio hole exists between resource->base and
  reserved device memory. If a hole exists between base and reserved device
  memory, lets go out simply to try allocate for next bar since all bars are in
  descending order of size. If not, we need to move resource->base to reserved_end
  just to reallocate this bar.

 tools/firmware/hvmloader/pci.c | 81 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..15ed9b2 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,84 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+/*
+ * We should check if all valid bars conflict with RDM.
+ *
+ * Here we just need to check mmio bars in the case of non-highmem
+ * since the hypervisor can make sure RDM doesn't involve highmem.
+ */
+static void disable_conflicting_devices(void)
+{
+    uint8_t is_64bar;
+    uint32_t devfn, bar_reg, cmd, bar_data;
+    uint16_t vendor_id, device_id;
+    unsigned int bar, i;
+    uint64_t bar_sz;
+    bool is_conflict = false;
+
+    for ( devfn = 0; devfn < 256; devfn++ )
+    {
+        vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
+        device_id = pci_readw(devfn, PCI_DEVICE_ID);
+        if ( (vendor_id == 0xffff) && (device_id == 0xffff) )
+            continue;
+
+        /* Check all bars */
+        for ( bar = 0; bar < 7 && !is_conflict; bar++ )
+        {
+            bar_reg = PCI_BASE_ADDRESS_0 + 4*bar;
+            if ( bar == 6 )
+                bar_reg = PCI_ROM_ADDRESS;
+
+            bar_data = pci_readl(devfn, bar_reg);
+            bar_data &= PCI_BASE_ADDRESS_MEM_MASK;
+            if ( !bar_data )
+                continue;
+
+            if ( bar_reg != PCI_ROM_ADDRESS )
+                is_64bar = !!((bar_data & (PCI_BASE_ADDRESS_SPACE |
+                             PCI_BASE_ADDRESS_MEM_TYPE_MASK)) ==
+                             (PCI_BASE_ADDRESS_SPACE_MEMORY |
+                             PCI_BASE_ADDRESS_MEM_TYPE_64));
+
+            /* Until here we never conflict high memory. */
+            if ( is_64bar && pci_readl(devfn, bar_reg + 4) )
+                continue;
+
+            /* Just check mmio bars. */
+            if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
+                  PCI_BASE_ADDRESS_SPACE_IO) )
+                continue;
+
+            bar_sz = pci_readl(devfn, bar_reg);
+            bar_sz &= PCI_BASE_ADDRESS_MEM_MASK;
+
+            for ( i = 0; i < memory_map.nr_map && !is_conflict; i++ )
+            {
+                if ( memory_map.map[i].type == E820_RESERVED )
+                {
+                    uint64_t reserved_start, reserved_size;
+                    reserved_start = memory_map.map[i].addr;
+                    reserved_size = memory_map.map[i].size;
+                    if ( check_overlap(bar_data , bar_sz,
+                                   reserved_start, reserved_size) )
+                    {
+                        /* Now disable the memory or I/O mapping. */
+                        printf("pci dev %02x:%x bar %02x : 0x%08x : conflicts "
+                               "reserved resource so disable this device.!\n",
+                               devfn>>3, devfn&7, bar_reg, bar_data);
+                        cmd = pci_readw(devfn, PCI_COMMAND);
+                        pci_writew(devfn, PCI_COMMAND, ~cmd);
+                        /* So need to jump next device. */
+                        is_conflict = true;
+                    }
+                }
+            }
+        }
+        is_conflict = false;
+    }
+}
+
 void pci_setup(void)
 {
     uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -462,6 +540,9 @@ void pci_setup(void)
         cmd |= PCI_COMMAND_IO;
         pci_writew(vga_devfn, PCI_COMMAND, cmd);
     }
+
+    /* If pci bars conflict with RDM we need to disable this pci device. */
+    disable_conflicting_devices();
 }
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (5 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  7:40   ` Jan Beulich
  2015-07-17  0:45 ` [v9][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
  hypervisor map
* Add in the hypervisor regions
* Adjust the lowmem and highmem regions if we've had to relocate
  memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v9:

* Refine that chunk of codes to check/modify highmem

v8:

* define low_mem_end as uint32_t

* Correct those two wrong loops, memory_map.nr_map -> nr
  when we're trying to revise low/high memory e820 entries.

* Improve code comments and the patch head description

* Add one check if highmem is just populated by hvmloader itself

v5 ~ v7:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.
 
 tools/firmware/hvmloader/e820.c | 99 +++++++++++++++++++++++++++++++++++------
 1 file changed, 85 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 7a414ab..49d420a 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820,
                      unsigned int lowmem_reserved_base,
                      unsigned int bios_image_base)
 {
-    unsigned int nr = 0;
+    unsigned int nr = 0, i, j;
+    uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+    uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT;
+    uint64_t add_high_mem = 0;
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
@@ -149,13 +152,6 @@ int build_e820_table(struct e820entry *e820,
     e820[nr].type = E820_RESERVED;
     nr++;
 
-    /* Low RAM goes here. Reserve space for special pages. */
-    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-    e820[nr].addr = 0x100000;
-    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-    e820[nr].type = E820_RAM;
-    nr++;
-
     /*
      * Explicitly reserve space for special pages.
      * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820,
         nr++;
     }
 
-
-    if ( hvm_info->high_mem_pgend )
+    /*
+     * Construct E820 table according to recorded memory map.
+     *
+     * The memory map created by toolstack may include,
+     *
+     * #1. Low memory region
+     *
+     * Low RAM starts at least from 1M to make sure all standard regions
+     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+     * have enough space.
+     *
+     * #2. Reserved regions if they exist
+     *
+     * #3. High memory region if it exists
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
     {
-        e820[nr].addr = ((uint64_t)1 << 32);
-        e820[nr].size =
-            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-        e820[nr].type = E820_RAM;
+        e820[nr] = memory_map.map[i];
         nr++;
     }
 
+    /* Low RAM goes here. Reserve space for special pages. */
+    BUG_ON(low_mem_end < (2u << 20));
+
+    /*
+     * Its possible to relocate RAM to allocate sufficient MMIO previously
+     * so low_mem_pgend would be changed over there. And here memory_map[]
+     * records the original low/high memory, so if low_mem_end is less than
+     * the original we need to revise low/high memory range in e820.
+     */
+    for ( i = 0; i < nr; i++ )
+    {
+        uint64_t end = e820[i].addr + e820[i].size;
+        if ( e820[i].type == E820_RAM &&
+             low_mem_end > e820[i].addr && low_mem_end < end )
+        {
+            add_high_mem = end - low_mem_end;
+            e820[i].size = low_mem_end - e820[i].addr;
+        }
+    }
+
+    /*
+     * And then we also need to adjust highmem.
+     */
+    if ( add_high_mem )
+    {
+        /* Modify the existing highmem region if it exists. */
+        for ( i = 0; i < nr; i++ )
+        {
+            if ( e820[i].type == E820_RAM &&
+                 e820[i].addr == ((uint64_t)1 << 32))
+            {
+                e820[i].size += add_high_mem;
+                break;
+            }
+        }
+
+        /* If there was no highmem region, just create one. */
+        if ( i == nr )
+        {
+            e820[nr].addr = ((uint64_t)1 << 32);
+            e820[nr].size = add_high_mem;
+            e820[nr].type = E820_RAM;
+            nr++;
+        }
+
+        /* A sanity check if high memory is broken. */
+        BUG_ON( high_mem_end != e820[i].addr + e820[i].size);
+    }
+
+    /* Finally we need to sort all e820 entries. */
+    for ( j = 0; j < nr-1; j++ )
+    {
+        for ( i = j+1; i < nr; i++ )
+        {
+            if ( e820[j].addr > e820[i].addr )
+            {
+                struct e820entry tmp;
+                tmp = e820[j];
+                e820[j] = e820[i];
+                e820[i] = tmp;
+            }
+        }
+    }
+
     return nr;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (6 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

We will introduce the hypercall xc_reserved_device_memory_map
approach to libxc. This helps us get rdm entry info according to
different parameters. If flag == PCI_DEV_RDM_ALL, all entries
should be exposed. Or we just expose that rdm entry specific to
a SBDF.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4 ~ v9:

* Nothing is changed.

 tools/libxc/include/xenctrl.h |  8 ++++++++
 tools/libxc/xc_domain.c       | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..9160623 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..0951291 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
     return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries)
+{
+    int rc;
+    struct xen_reserved_device_memory_map xrdmmap = {
+        .flag = flag,
+        .seg = seg,
+        .bus = bus,
+        .devfn = devfn,
+        .nr_entries = *max_entries
+    };
+    DECLARE_HYPERCALL_BOUNCE(entries,
+                             sizeof(struct xen_reserved_device_memory) *
+                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, entries) )
+        return -1;
+
+    set_xen_guest_handle(xrdmmap.buffer, entries);
+
+    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+                      &xrdmmap, sizeof(xrdmmap));
+
+    xc_hypercall_bounce_post(xch, entries);
+
+    *max_entries = xrdmmap.nr_entries;
+
+    return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (7 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, David Scott, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also bring some fallout to python usage of xc_assign_device().

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: David Scott <dave.scott@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v6 ~ v9:

* Nothing is changed.

v5:

* Fix the flag field as "0" to DT device

v4:

* In the patch head description, I add to explain why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h       |  3 ++-
 tools/libxc/xc_domain.c             |  9 ++++++++-
 tools/libxl/libxl_pci.c             |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 ++++++++++++----
 tools/python/xen/lowlevel/xc/xc.c   | 30 ++++++++++++++++++++----------
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
-                     uint32_t machine_sbdf);
+                     uint32_t machine_sbdf,
+                     uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
     xc_interface *xch,
     uint32_t domid,
-    uint32_t machine_sbdf)
+    uint32_t machine_sbdf,
+    uint32_t flag)
 {
     DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
     domctl.domain = domid;
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
     domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+    domctl.u.assign_device.flag = flag;
 
     return do_domctl(xch, &domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
     domctl.u.assign_device.u.dt.size = size;
+    /*
+     * DT doesn't own any RDM so actually DT has nothing to do
+     * for any flag and here just fix that as 0.
+     */
+    domctl.u.assign_device.flag = 0;
     set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
     rc = do_domctl(xch, &domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
     FILE *f;
     unsigned long long start, end, flags, size;
     int irq, i, rc, hvm = 0;
+    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
     if (type == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
-        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
             return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
 	CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+    XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+                                            value rflag)
 {
-	CAMLparam3(xch, domid, desc);
+	CAMLparam4(xch, domid, desc, rflag);
 	int ret;
 	int domain, bus, dev, func;
-	uint32_t sbdf;
+	uint32_t sbdf, flag;
 
 	domain = Int_val(Field(desc, 0));
 	bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
 	func = Int_val(Field(desc, 3));
 	sbdf = encode_sbdf(domain, bus, dev, func);
 
-	ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+	ret = Int_val(Field(rflag, 0));
+	flag = domain_assign_device_rdm_flag_table[ret];
+
+	ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
 	if (ret < 0)
 		failwith_xc(_H(xch));
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index c77e15b..a4928c6 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -592,7 +592,8 @@ static int token_value(char *token)
     return strtol(token, NULL, 16);
 }
 
-static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
+static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
+                    int *flag)
 {
     char *token;
 
@@ -607,8 +608,17 @@ static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
     *dev  = token_value(token);
     token = strchr(token, ',') + 1;
     *func  = token_value(token);
-    token = strchr(token, ',');
-    *str = token ? token + 1 : NULL;
+    token = strchr(token, ',') + 1;
+    if ( token ) {
+        *flag = token_value(token);
+        *str = token + 1;
+    }
+    else
+    {
+        /* O means we take "strict" as our default policy. */
+        *flag = 0;
+        *str = NULL;
+    }
 
     return 1;
 }
@@ -620,14 +630,14 @@ static PyObject *pyxc_test_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
@@ -653,21 +663,21 @@ static PyObject *pyxc_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
         sbdf |= (dev & 0x1f) << 3;
         sbdf |= (func & 0x7);
 
-        if ( xc_assign_device(self->xc_handle, dom, sbdf) != 0 )
+        if ( xc_assign_device(self->xc_handle, dom, sbdf, flag) != 0 )
         {
             if (errno == ENOSYS)
                 sbdf = -1;
@@ -686,14 +696,14 @@ static PyObject *pyxc_deassign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 10/16] tools: introduce some new parameters to set rdm policy
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (8 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch introduces user configurable parameters to specify RDM
resource and according policies,

Global RDM parameter:
    rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

Global RDM parameter, "strategy", allows user to specify reserved regions
explicitly, Currently, using 'host' to include all reserved regions reported
on this platform which is good to handle hotplug scenario. In the future
this parameter may be further extended to allow specifying random regions,
e.g. even those belonging to another platform as a preparation for live
migration with passthrough devices. By default this isn't set so we don't
check all rdms. Instead, we just check rdm specific to a given device if
you're assigning this kind of device. Note this option is not recommended
unless you can make sure any conflict does exist.

'strict/relaxed' policy decides how to handle conflict when reserving RDM
regions in pfn space. If conflict exists, 'strict' means an immediate error
so VM can't keep running, while 'relaxed' allows moving forward with a
warning message thrown out.

Default per-device RDM policy is same as default global RDM policy as being
'relaxed'. And the per-device policy would override the global policy like
others.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v9:

* Nothing is changed.

v8:

* One minimal code style change

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

v6:

* Some rename to make our policy reasonable
  "type" -> "strategy"
  "none" -> "ignore"
* Don't expose "ignore" in xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device plicy always override the global policy,
  and so cleanup some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since its just zero
* Grab those changes to xl/libxlu to as a final patch

 docs/man/xl.cfg.pod.5        | 81 ++++++++++++++++++++++++++++++++++++++++++++
 docs/misc/vtd.txt            | 24 +++++++++++++
 tools/libxl/libxl_create.c   |  7 ++++
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c      |  9 +++++
 tools/libxl/libxl_types.idl  | 18 ++++++++++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..6c55a8b 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item B<rdm="RDM_RESERVATION_STRING">
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<strategy="STRING">
+
+Currently there is only one valid type:
+
+"host" means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using "host" includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure any conflict does exist.
+
+For example, you're trying to set "memory = 2800" to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad800000 end_address afffffff
+
+In this conflict case,
+
+#1. If B<strategy> is set to "host", for example,
+
+rdm = "strategy=host,policy=strict" or rdm = "strategy=host,policy=relaxed"
+
+It means all conflicts will be handled according to the policy
+introduced by B<policy> as described below.
+
+#2. If B<strategy> is not set at all, but
+
+pci = [ 'sbdf_A, rdm_policy=xxxxx' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by B<rdm_policy="STRING"> as described inside pci options.
+
+=item B<policy="STRING">
+
+Specifies how to deal with conflicts when reserving reserved device
+memory in guest address space.
+
+When that conflict is unsolved,
+
+"strict" means VM can't be created, or the associated device can't be
+attached in the case of hotplug.
+
+"relaxed" allows VM to be created but may cause VM to crash if
+pass-through device accesses RDM. For exampl,e Windows IGD GFX driver
+always accessed RDM regions so it leads to VM crash.
+
+Note this may be overridden by rdm_policy option in PCI device configuration.
+
+=back
+
+=back
+
 =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
 
 Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
@@ -717,6 +790,14 @@ dom0 without confirmation.  Please use with care.
 D0-D3hot power management states for the PCI device. False (0) by
 default.
 
+=item B<rdm_policy="STRING">
+
+(HVM/x86 only) This is same as policy option inside the rdm option but
+just specific to a given device. Therefore the default is "relaxed" as
+same as policy option as well.
+
+Note this would override global B<rdm> option.
+
 =back
 
 =back
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..88b2102 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,30 @@ in the config file:
 To override for a specific device:
 	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
 
+RDM, 'reserved device memory', for PCI Device Passthrough
+---------------------------------------------------------
+
+There are some devices the BIOS controls, for e.g. USB devices to perform
+PS2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence BIOS uses RMRR to specify these regions along with
+devices that need to access these regions. OS is expected to setup
+identity mappings for these regions for these devices to access these regions.
+
+While creating a VM we should reserve them in advance, and avoid any conflicts.
+So we introduce user configurable parameters to specify RDM resource and
+according policies,
+
+To enable this globally, add "rdm" in the config file:
+
+    rdm = "strategy=host, policy=relaxed"   (default policy is "relaxed")
+
+Or just for a specific device:
+
+    pci = [ '01:00.0,rdm_policy=relaxed', '03:00.0,rdm_policy=strict' ]
+
+For all the options available to RDM, see xl.cfg(5).
+
 
 Caveat on Conventional PCI Device Passthrough
 ---------------------------------------------
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f366a09..f75d4f1 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -105,6 +105,12 @@ static int sched_params_valid(libxl__gc *gc,
     return 1;
 }
 
+void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
+{
+    if (b_info->u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+        b_info->u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+}
+
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info)
 {
@@ -384,6 +390,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 
         libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
 
+        libxl__rdm_setdefault(gc, b_info);
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d52589e..d397143 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1154,6 +1154,8 @@ _hidden int libxl__device_vtpm_setdefault(libxl__gc *gc, libxl_device_vtpm *vtpm
 _hidden int libxl__device_vfb_setdefault(libxl__gc *gc, libxl_device_vfb *vfb);
 _hidden int libxl__device_vkb_setdefault(libxl__gc *gc, libxl_device_vkb *vkb);
 _hidden int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci);
+_hidden void libxl__rdm_setdefault(libxl__gc *gc,
+                                   libxl_domain_build_info *b_info);
 
 _hidden const char *libxl__device_nic_devname(libxl__gc *gc,
                                               uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 632c15e..1ebdce7 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -988,6 +988,12 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
+        if (pcidev->rdm_policy == LIBXL_RDM_RESERVE_POLICY_STRICT) {
+            flag &= ~XEN_DOMCTL_DEV_RDM_RELAXED;
+        } else if (pcidev->rdm_policy != LIBXL_RDM_RESERVE_POLICY_RELAXED) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm check flag.");
+            return ERROR_FAIL;
+        }
         rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
@@ -1040,6 +1046,9 @@ static int libxl__device_pci_reset(libxl__gc *gc, unsigned int domain, unsigned
 
 int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
 {
+    /* We'd like to force reserve rdm specific to a device by default.*/
+    if (pci->rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+        pci->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
     return 0;
 }
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e1632fa..47dd83a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -76,6 +76,17 @@ libxl_domain_type = Enumeration("domain_type", [
     (2, "PV"),
     ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
 
+libxl_rdm_reserve_strategy = Enumeration("rdm_reserve_strategy", [
+    (0, "ignore"),
+    (1, "host"),
+    ])
+
+libxl_rdm_reserve_policy = Enumeration("rdm_reserve_policy", [
+    (-1, "invalid"),
+    (0, "strict"),
+    (1, "relaxed"),
+    ], init_val = "LIBXL_RDM_RESERVE_POLICY_INVALID")
+
 libxl_channel_connection = Enumeration("channel_connection", [
     (0, "UNKNOWN"),
     (1, "PTY"),
@@ -369,6 +380,11 @@ libxl_vnode_info = Struct("vnode_info", [
     ("vcpus", libxl_bitmap), # vcpus in this node
     ])
 
+libxl_rdm_reserve = Struct("rdm_reserve", [
+    ("strategy",    libxl_rdm_reserve_strategy),
+    ("policy",      libxl_rdm_reserve_policy),
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -467,6 +483,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        # See libxl_ms_vm_genid_generate()
                                        ("ms_vm_genid",      libxl_ms_vm_genid),
                                        ("serial_list",      libxl_string_list),
+                                       ("rdm", libxl_rdm_reserve),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
@@ -542,6 +559,7 @@ libxl_device_pci = Struct("device_pci", [
     ("power_mgmt", bool),
     ("permissive", bool),
     ("seize", bool),
+    ("rdm_policy",      libxl_rdm_reserve_policy),
     ])
 
 libxl_device_dtdev = Struct("device_dtdev", [
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (9 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

While building a VM, HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields to construct guest
e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RDM.

RMRR can reside in address space beyond 4G theoretically, but we never
see this in real world. So in order to avoid breaking highmem layout
we don't solve highmem conflict. Note this means highmem rmrr could still
be supported if no conflict.

But in the case of lowmem, RMRR probably scatter the whole RAM space.
Especially multiple RMRR entries would worsen this to lead a complicated
memory layout. And then its hard to extend hvm_info_table{} to work
hvmloader out. So here we're trying to figure out a simple solution to
avoid breaking existing layout. So when a conflict occurs,

    #1. Above a predefined boundary (2G)
        - move lowmem_end below reserved region to solve conflict;

    #2. Below a predefined boundary (2G)
        - Check strict/relaxed policy.
        "strict" policy leads to fail libxl. Note when both policies
        are specified on a given region, 'strict' is always preferred.
        "relaxed" policy issue a warning message and also mask this entry INVALID
        to indicate we shouldn't expose this entry to hvmloader.

Note later we need to provide a parameter to set that predefined boundary
dynamically.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
v9:

* Nothing is changed.

v8:

* Introduce pfn_to_paddr(x) -> ((uint64_t)x << XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out current codes.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* fix some code stypes
* Refine libxl__xc_device_get_rdm()

v5:

* A little change to make sure the per-device policy always override the global
  policy and correct its associated code comments.
* Fix one typo in the patch head description
* Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace
  malloc() with libxl__malloc(), and finally cleanup this fallout.
* libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out parameter.

v4:

* Consistent to use term "RDM".
* Unconditionally set *nr_entries to 0
* Grab to all sutffs to provide a parameter to set our predefined boundary
  dynamically to as a separated patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c       | 273 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom.c      |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 307 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f75d4f1..c8a32d5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -459,7 +459,7 @@ int libxl__domain_build(libxl__gc *gc,
 
     switch (info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        ret = libxl__build_hvm(gc, domid, info, state);
+        ret = libxl__build_hvm(gc, domid, d_config, state);
         if (ret)
             goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 317a8eb..692258b 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,279 @@ const char *libxl__domain_device_model(libxl__gc *gc,
     return dm;
 }
 
+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+                         uint32_t flag,
+                         uint16_t seg,
+                         uint8_t bus,
+                         uint8_t devfn,
+                         unsigned int *nr_entries,
+                         struct xen_reserved_device_memory **xrdm)
+{
+    int rc = 0, r;
+
+    /*
+     * We really can't presume how many entries we can get in advance.
+     */
+    *nr_entries = 0;
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      NULL, nr_entries);
+    assert(r <= 0);
+    /* "0" means we have no any rdm entry. */
+    if (!r) goto out;
+
+    if (errno != ENOBUFS) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    *xrdm = libxl__malloc(gc,
+                          *nr_entries * sizeof(xen_reserved_device_memory_t));
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      *xrdm, nr_entries);
+    if (r)
+        rc = ERROR_FAIL;
+
+ out:
+    if (rc) {
+        *nr_entries = 0;
+        *xrdm = NULL;
+        LOG(ERROR, "Could not get reserved device memory maps.\n");
+    }
+    return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+                         uint64_t rdm_start, uint64_t rdm_size)
+{
+    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
+}
+
+#define pfn_to_paddr(x) ((uint64_t)x << XC_PAGE_SHIFT)
+static void
+set_rdm_entries(libxl__gc *gc, libxl_domain_config *d_config,
+                uint64_t rdm_start, uint64_t rdm_size, int rdm_policy,
+                unsigned int nr_entries)
+{
+    assert(nr_entries);
+
+    d_config->num_rdms = nr_entries;
+    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                            d_config->num_rdms * sizeof(libxl_device_rdm));
+
+    d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
+    d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
+    d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to user preferred policy.
+ *
+ * RDM can reside in address space beyond 4G theoretically, but we never
+ * see this in real world. So in order to avoid breaking highmem layout
+ * we don't solve highmem conflict. Note this means highmem rmrr could still
+ * be supported if no conflict.
+ *
+ * But in the case of lowmem, RDM probably scatter the whole RAM space.
+ * Especially multiple RDM entries would worsen this to lead a complicated
+ * memory layout. And then its hard to extend hvm_info_table{} to work
+ * hvmloader out. So here we're trying to figure out a simple solution to
+ * avoid breaking existing layout. So when a conflict occurs,
+ *
+ * #1. Above a predefined boundary (default 2G)
+ * - Move lowmem_end below reserved region to solve conflict;
+ *
+ * #2. Below a predefined boundary (default 2G)
+ * - Check strict/relaxed policy.
+ * "strict" policy leads to fail libxl.
+ * "relaxed" policy issue a warning message and also mask this entry
+ * INVALID to indicate we shouldn't expose this entry to hvmloader.
+ * Note when both policies are specified on a given region, the per-device
+ * policy should override the global policy.
+ */
+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                       libxl_domain_config *d_config,
+                                       uint64_t rdm_mem_boundary,
+                                       struct xc_hvm_build_args *args)
+{
+    int i, j, conflict, rc;
+    struct xen_reserved_device_memory *xrdm = NULL;
+    uint32_t strategy = d_config->b_info.u.hvm.rdm.strategy;
+    uint16_t seg;
+    uint8_t bus, devfn;
+    uint64_t rdm_start, rdm_size;
+    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);
+
+    /* Might not expose rdm. */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE &&
+        !d_config->num_pcidevs)
+        return 0;
+
+    /* Query all RDM entries in this platform */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
+        unsigned int nr_entries;
+
+        /* Collect all rdm info if exist. */
+        rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+                                      0, 0, 0, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        if (!nr_entries)
+            return 0;
+
+        assert(xrdm);
+
+        for (i = 0; i < nr_entries; i++)
+            set_rdm_entries(gc, d_config,
+                            pfn_to_paddr(xrdm[i].start_pfn),
+                            pfn_to_paddr(xrdm[i].nr_pages),
+                            d_config->b_info.u.hvm.rdm.policy,
+                            i+1);
+    } else {
+        d_config->num_rdms = 0;
+    }
+
+    /* Query RDM entries per-device */
+    for (i = 0; i < d_config->num_pcidevs; i++) {
+        unsigned int nr_entries;
+        bool new = true;
+
+        seg = d_config->pcidevs[i].domain;
+        bus = d_config->pcidevs[i].bus;
+        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
+        nr_entries = 0;
+        rc = libxl__xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
+                                      seg, bus, devfn, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        /* No RDM to associated with this device. */
+        if (!nr_entries)
+            continue;
+
+        assert(xrdm);
+
+        /*
+         * Need to check whether this entry is already saved in the array.
+         * This could come from two cases:
+         *
+         *   - user may configure to get all RDMs in this platform, which
+         *   is already queried before this point
+         *   - or two assigned devices may share one RDM entry
+         *
+         * Different policies may be configured on the same RDM due to above
+         * two cases. But we don't allow to assign such a group devies right
+         * now so it doesn't come true in our case.
+         */
+        for (j = 0; j < d_config->num_rdms; j++) {
+            if (d_config->rdms[j].start == pfn_to_paddr(xrdm[0].start_pfn))
+            {
+                /*
+                 * So the per-device policy always override the global policy
+                 * in this case.
+                 */
+                d_config->rdms[j].policy = d_config->pcidevs[i].rdm_policy;
+                new = false;
+                break;
+            }
+        }
+
+        if (new) {
+            d_config->num_rdms++;
+            set_rdm_entries(gc, d_config,
+                            pfn_to_paddr(xrdm[0].start_pfn),
+                            pfn_to_paddr(xrdm[0].nr_pages),
+                            d_config->pcidevs[i].rdm_policy,
+                            d_config->num_rdms);
+        }
+    }
+
+    /*
+     * Next step is to check and avoid potential conflict between RDM entries
+     * and guest RAM. To avoid intrusive impact to existing memory layout
+     * {lowmem, mmio, highmem} which is passed around various function blocks,
+     * below conflicts are not handled which are rare and handling them would
+     * lead to a more scattered layout:
+     *  - RDM  in highmem area (>4G)
+     *  - RDM lower than a defined memory boundary (e.g. 2G)
+     * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem
+     * end below reserved region to solve conflict.
+     *
+     * If a conflict is detected on a given RDM entry, an error will be
+     * returned if 'strict' policy is specified. Instead, if 'relaxed' policy
+     * specified, this conflict is treated just as a warning, but we mark this
+     * RDM entry as INVALID to indicate that this entry shouldn't be exposed
+     * to hvmloader.
+     *
+     * Firstly we should check the case of rdm < 4G because we may need to
+     * expand highmem_end.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        conflict = overlaps_rdm(0, args->lowmem_end, rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        /* Just check if RDM > our memory boundary. */
+        if (rdm_start > rdm_mem_boundary) {
+            /*
+             * We will move downwards lowmem_end so we have to expand
+             * highmem_end.
+             */
+            highmem_end += (args->lowmem_end - rdm_start);
+            /* Now move downwards lowmem_end. */
+            args->lowmem_end = rdm_start;
+        }
+    }
+
+    /* Sync highmem_end. */
+    args->highmem_end = highmem_end;
+
+    /*
+     * Finally we can take same policy to check lowmem(< 2G) and
+     * highmem adjusted above.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        /* Does this entry conflict with lowmem? */
+        conflict = overlaps_rdm(0, args->lowmem_end,
+                                rdm_start, rdm_size);
+        /* Does this entry conflict with highmem? */
+        conflict |= overlaps_rdm((1ULL<<32),
+                                 args->highmem_end - (1ULL<<32),
+                                 rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        if (d_config->rdms[i].policy == LIBXL_RDM_RESERVE_POLICY_STRICT) {
+            LOG(ERROR, "RDM conflict at 0x%lx.\n", d_config->rdms[i].start);
+            goto out;
+        } else {
+            LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
+                      d_config->rdms[i].start);
+
+            /*
+             * Then mask this INVALID to indicate we shouldn't expose this
+             * to hvmloader.
+             */
+            d_config->rdms[i].policy = LIBXL_RDM_RESERVE_POLICY_INVALID;
+        }
+    }
+
+    return 0;
+
+ out:
+    return ERROR_FAIL;
+}
+
 const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
 {
     const libxl_vnc_info *vnc = NULL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index bdc0465..80fa17d 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -914,13 +914,20 @@ out:
 }
 
 int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     struct xc_hvm_build_args args = {};
     int ret, rc = ERROR_FAIL;
     uint64_t mmio_start, lowmem_end, highmem_end;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    /*
+     * Currently we fix this as 2G to guarantee how to handle
+     * our rdm policy. But we'll provide a parameter to set
+     * this dynamically.
+     */
+    uint64_t rdm_mem_boundary = 0x80000000;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -958,6 +965,14 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.highmem_end = highmem_end;
     args.mmio_start = mmio_start;
 
+    rc = libxl__domain_device_construct_rdm(gc, d_config,
+                                            rdm_mem_boundary,
+                                            &args);
+    if (rc) {
+        LOG(ERROR, "checking reserved device memory failed");
+        goto out;
+    }
+
     if (info->num_vnuma_nodes != 0) {
         int i;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d397143..b4d8419 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1057,7 +1057,7 @@ _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid,
 _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info, libxl__domain_build_state *state);
 _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state);
 
 _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
@@ -1565,6 +1565,15 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
         int nr_channels, libxl_device_channel *channels);
 
 /*
+ * This function will fix reserved device memory conflict
+ * according to user's configuration.
+ */
+_hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                   libxl_domain_config *d_config,
+                                   uint64_t rdm_mem_guard,
+                                   struct xc_hvm_build_args *args);
+
+/*
  * This function will cause the whole libxl process to hang
  * if the device model does not respond.  It is deprecated.
  *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 47dd83a..a3ad8d1 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -562,6 +562,12 @@ libxl_device_pci = Struct("device_pci", [
     ("rdm_policy",      libxl_rdm_reserve_policy),
     ])
 
+libxl_device_rdm = Struct("device_rdm", [
+    ("start", uint64),
+    ("size", uint64),
+    ("policy", libxl_rdm_reserve_policy),
+    ])
+
 libxl_device_dtdev = Struct("device_dtdev", [
     ("path", string),
     ])
@@ -592,6 +598,7 @@ libxl_domain_config = Struct("domain_config", [
     ("disks", Array(libxl_device_disk, "num_disks")),
     ("nics", Array(libxl_device_nic, "num_nics")),
     ("pcidevs", Array(libxl_device_pci, "num_pcidevs")),
+    ("rdms", Array(libxl_device_rdm, "num_rdms")),
     ("dtdevs", Array(libxl_device_dtdev, "num_dtdevs")),
     ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
     ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (10 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Previously we always fix that predefined boundary as 2G to handle
conflict between memory and rdm, but now this predefined boundar
can be changes with the parameter "rdm_mem_boundary" in .cfg file.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v8 ~ v9:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable "rdm_mem_boundary_memkb" specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5       | 22 ++++++++++++++++++++++
 tools/libxl/libxl.h         |  6 ++++++
 tools/libxl/libxl_create.c  |  4 ++++
 tools/libxl/libxl_dom.c     |  8 +-------
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  3 +++
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 6c55a8b..23068ec 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -867,6 +867,28 @@ More information about Xen gfx_passthru feature is available
 on the XenVGAPassthrough L<http://wiki.xen.org/wiki/XenVGAPassthrough>
 wiki page.
 
+=item B<rdm_mem_boundary=MBYTES>
+
+Number of megabytes to set a boundary for checking rdm conflict.
+
+When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
+Especially multiple RDM entries would worsen this to lead a complicated
+memory layout. So here we're trying to figure out a simple solution to
+avoid breaking existing layout. So when a conflict occurs,
+
+    #1. Above a predefined boundary
+        - move lowmem_end below reserved region to solve conflict;
+
+    #2. Below a predefined boundary
+        - Check strict/relaxed policy.
+        "strict" policy leads to fail libxl. Note when both policies
+        are specified on a given region, 'strict' is always preferred.
+        "relaxed" policy issue a warning message and also mask this
+        entry INVALID to indicate we shouldn't expose this entry to
+        hvmloader.
+
+Here the default is 2G.
+
 =item B<dtdev=[ "DTDEV_PATH", "DTDEV_PATH", ... ]>
 
 Specifies the host device tree nodes to passthrough to this guest. Each
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..6f157c9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
     uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c8a32d5..3de86a6 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
 {
     if (b_info->u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
         b_info->u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+
+    if (b_info->u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+        b_info->u.hvm.rdm_mem_boundary_memkb =
+                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 80fa17d..e41d54a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     int ret, rc = ERROR_FAIL;
     uint64_t mmio_start, lowmem_end, highmem_end;
     libxl_domain_build_info *const info = &d_config->b_info;
-    /*
-     * Currently we fix this as 2G to guarantee how to handle
-     * our rdm policy. But we'll provide a parameter to set
-     * this dynamically.
-     */
-    uint64_t rdm_mem_boundary = 0x80000000;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.mmio_start = mmio_start;
 
     rc = libxl__domain_device_construct_rdm(gc, d_config,
-                                            rdm_mem_boundary,
+                                            info->u.hvm.rdm_mem_boundary_memkb*1024,
                                             &args);
     if (rc) {
         LOG(ERROR, "checking reserved device memory failed");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a3ad8d1..4eb4f8a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -484,6 +484,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("ms_vm_genid",      libxl_ms_vm_genid),
                                        ("serial_list",      libxl_string_list),
                                        ("rdm", libxl_rdm_reserve),
+                                       ("rdm_mem_boundary_memkb", MemKB),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c858068..dfb50d6 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1519,6 +1519,9 @@ static void parse_config_data(const char *config_source,
                     exit(1);
             }
         }
+
+        if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
+            b_info->u.hvm.rdm_mem_boundary_memkb = l * 1024;
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (11 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist, and hvmloader would need this info
later.

Note this guest e820 table would be same as before if the
platform has no any RDM or we disable RDM (by default).

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v8 ~ v9:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable "rdm_mem_boundary_memkb" specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.


 tools/libxl/libxl_arch.h |  7 ++++
 tools/libxl/libxl_arm.c  |  8 +++++
 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_x86.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 103 insertions(+)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d04871c..939178a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -49,4 +49,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
 _hidden
 int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
 
+/* arch specific to construct memory mapping function */
+_hidden
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args);
+
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index f09c860..1526467 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -926,6 +926,14 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
     return xc_domain_bind_pt_spi_irq(CTX->xch, domid, irq, irq);
 }
 
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args)
+{
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e41d54a..a8c6aa9 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }
 
+    if (libxl__arch_domain_construct_memmap(gc, d_config, domid, &args)) {
+        LOG(ERROR, "setting domain memory map failed");
+        goto out;
+    }
+
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..66b3d7f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args)
+{
+    int rc = 0;
+    unsigned int nr = 0, i;
+    /* We always own at least one lowmem entry. */
+    unsigned int e820_entries = 1;
+    struct e820entry *e820 = NULL;
+    uint64_t highmem_size =
+                    args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+    /* Add all rdm entries. */
+    for (i = 0; i < d_config->num_rdms; i++)
+        if (d_config->rdms[i].policy != LIBXL_RDM_RESERVE_POLICY_INVALID)
+            e820_entries++;
+
+
+    /* If we should have a highmem range. */
+    if (highmem_size)
+        e820_entries++;
+
+    if (e820_entries >= E820MAX) {
+        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+        rc = ERROR_INVAL;
+        goto out;
+    }
+
+    e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+    /* Low memory */
+    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].type = E820_RAM;
+    nr++;
+
+    /* RDM mapping */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        if (d_config->rdms[i].policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+            continue;
+
+        e820[nr].addr = d_config->rdms[i].start;
+        e820[nr].size = d_config->rdms[i].size;
+        e820[nr].type = E820_RESERVED;
+        nr++;
+    }
+
+    /* High memory */
+    if (highmem_size) {
+        e820[nr].addr = ((uint64_t)1 << 32);
+        e820[nr].size = highmem_size;
+        e820[nr].type = E820_RAM;
+    }
+
+    if (xc_domain_set_memory_map(CTX->xch, domid, e820, e820_entries) != 0) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+out:
+    return rc;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-basic-offset: 4
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 14/16] xen/vtd: enable USB device assignment
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (12 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian

USB RMRR may conflict with guest BIOS region. In such case, identity
mapping setup is simply skipped in previous implementation. Now we
can handle this scenario cleanly with new policy mechanism so previous
hack code can be removed now.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v5 ~ v9:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
 xen/drivers/passthrough/vtd/utils.c |  7 -------
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {                                                \
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index b5d658e..c8b0455 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2243,11 +2243,9 @@ static int reassign_device_ownership(
     /*
      * If the device belongs to the hardware domain, and it has RMRR, don't
      * remove it from the hardware domain, because BIOS may use RMRR at
-     * booting time. Also account for the special casing of USB below (in
-     * intel_iommu_assign_device()).
+     * booting time.
      */
-    if ( !is_hardware_domain(source) &&
-         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+    if ( !is_hardware_domain(source) )
     {
         const struct acpi_rmrr_unit *rmrr;
         u16 bdf;
@@ -2300,13 +2298,8 @@ static int intel_iommu_assign_device(
     if ( ret )
         return ret;
 
-    /* FIXME: Because USB RMRR conflicts with guest bios region,
-     * ignore USB RMRR temporarily.
-     */
     seg = pdev->seg;
     bus = pdev->bus;
-    if ( is_usb_device(seg, bus, pdev->devfn) )
-        return 0;
 
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include "extern.h"
 #include <asm/io_apic.h>
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-    u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                                PCI_CLASS_DEVICE);
-    return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (13 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  2015-07-17  0:45 ` [v9][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian

Currently we're intending to cover this kind of devices
with shared RMRR simply since the case of shared RMRR is
a rare case according to our previous experiences. But
late we can group these devices which shared rmrr, and
then allow all devices within a group to be assigned to
same domain.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v9:

* Correct one indentation issue

v8:

* Merge two if{} as one if{}

* Add to print RMRR range info when stop assign a group device

v5 ~ v7:

* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c8b0455..770e484 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2294,13 +2294,37 @@ static int intel_iommu_assign_device(
     if ( list_empty(&acpi_drhd_units) )
         return -ENODEV;
 
+    seg = pdev->seg;
+    bus = pdev->bus;
+    /*
+     * In rare cases one given rmrr is shared by multiple devices but
+     * obviously this would put the security of a system at risk. So
+     * we should prevent from this sort of device assignment.
+     *
+     * TODO: in the future we can introduce group device assignment
+     * interface to make sure devices sharing RMRR are assigned to the
+     * same domain together.
+     */
+    for_each_rmrr_device( rmrr, bdf, i )
+    {
+        if ( rmrr->segment == seg &&
+             PCI_BUS(bdf) == bus &&
+             PCI_DEVFN2(bdf) == devfn &&
+             rmrr->scope.devices_cnt > 1 )
+        {
+            printk(XENLOG_G_ERR VTDPREFIX
+                   " cannot assign %04x:%02x:%02x.%u"
+                   " with shared RMRR at %"PRIx64" for Dom%d.\n",
+                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                   rmrr->base_address, d->domain_id);
+            return -EPERM;
+        }
+    }
+
     ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
     if ( ret )
         return ret;
 
-    seg = pdev->seg;
-    bus = pdev->bus;
-
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
     {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [v9][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (14 preceding siblings ...)
  2015-07-17  0:45 ` [v9][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
@ 2015-07-17  0:45 ` Tiejun Chen
  15 siblings, 0 replies; 83+ messages in thread
From: Tiejun Chen @ 2015-07-17  0:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch parses to enable user configurable parameters to specify
RDM resource and according policies which are defined previously,

Global RDM parameter:
    rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

Default per-device RDM policy is same as default global RDM policy as being
'relaxed'. And the per-device policy would override the global policy like
others.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v9:

* Nothing is changed.

v8:

* Clean some codes style issues.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Just sync those renames introduced by patch #10.

v5:

* Need a rebase after we make all rdm variables specific to .hvm.
* Like other pci option, the per-device policy always follows
  the global policy by default.

v4:

* Separated from current patch #11 to parse/enable our rdm policy parameters
  since its make a lot sense and these stuffs are specific to xl/libxlu.

 tools/libxl/libxlu_pci.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++-
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 13 +++++++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..026413b 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE      9
+#define STATE_RDM_STRATEGY      10
+#define STATE_RESERVE_POLICY    11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str)
 {
     unsigned state = STATE_DOMAIN;
@@ -143,7 +146,18 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str
                     pcidev->permissive = atoi(tok);
                 }else if ( !strcmp(optkey, "seize") ) {
                     pcidev->seize = atoi(tok);
-                }else{
+                } else if (!strcmp(optkey, "rdm_policy")) {
+                    if (!strcmp(tok, "strict")) {
+                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+                    } else if (!strcmp(tok, "relaxed")) {
+                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+                    } else {
+                        XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
+                                          " policy: 'strict' or 'relaxed'.",
+                                     tok);
+                        goto parse_error;
+                    }
+                } else {
                     XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
                 }
                 tok = ptr + 1;
@@ -167,6 +181,82 @@ parse_error:
     return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+    unsigned state = STATE_TYPE;
+    char *buf2, *tok, *ptr, *end;
+
+    if (NULL == (buf2 = ptr = strdup(str)))
+        return ERROR_NOMEM;
+
+    for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+        switch(state) {
+        case STATE_TYPE:
+            if (*ptr == '=') {
+                state = STATE_RDM_STRATEGY;
+                *ptr = '\0';
+                if (strcmp(tok, "strategy")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RDM_STRATEGY:
+            if (*ptr == '\0' || *ptr == ',') {
+                state = STATE_RESERVE_POLICY;
+                *ptr = '\0';
+                if (!strcmp(tok, "host")) {
+                    rdm->strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM strategy option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RESERVE_POLICY:
+            if (*ptr == '=') {
+                state = STATE_OPTIONS_V;
+                *ptr = '\0';
+                if (strcmp(tok, "policy")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_OPTIONS_V:
+            if (*ptr == ',' || *ptr == '\0') {
+                state = STATE_TERMINAL;
+                *ptr = '\0';
+                if (!strcmp(tok, "strict")) {
+                    rdm->policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+                } else if (!strcmp(tok, "relaxed")) {
+                    rdm->policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property policy value: %s",
+                                 tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+        default:
+            break;
+        }
+    }
+
+    free(buf2);
+
+    if (tok != ptr || state != STATE_TERMINAL)
+        goto parse_error;
+
+    return 0;
+
+parse_error:
+    return ERROR_INVAL;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 989605a..e81b644 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -106,6 +106,10 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs, const char *const *specs,
  */
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str);
 
+/*
+ * RDM parsing
+ */
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str);
 
 /*
  * Vif rate parsing.
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index dfb50d6..38d6c53 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1923,6 +1923,14 @@ skip_vfb:
         xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
     }
 
+    if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
+        libxl_rdm_reserve rdm;
+        if (!xlu_rdm_parse(config, &rdm, buf)) {
+            b_info->u.hvm.rdm.strategy = rdm.strategy;
+            b_info->u.hvm.rdm.policy = rdm.policy;
+        }
+    }
+
     if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
         d_config->num_pcidevs = 0;
         d_config->pcidevs = NULL;
@@ -1937,6 +1945,11 @@ skip_vfb:
             pcidev->power_mgmt = pci_power_mgmt;
             pcidev->permissive = pci_permissive;
             pcidev->seize = pci_seize;
+            /*
+             * Like other pci option, the per-device policy always follows
+             * the global policy by default.
+             */
+            pcidev->rdm_policy = b_info->u.hvm.rdm.policy;
             if (!xlu_pci_parse_bdf(config, pcidev, buf))
                 d_config->num_pcidevs++;
         }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-17  0:45 ` [v9][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-07-17  6:48   ` Jan Beulich
  2015-07-20  1:12   ` Tian, Kevin
  1 sibling, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2015-07-17  6:48 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Stefano Stabellini, Suravee Suthikulpanit, Yang Zhang,
	Aravind Gopalakrishnan

>>> On 17.07.15 at 02:45, <tiejun.chen@intel.com> wrote:
> This patch extends the existing hypercall to support rdm reservation policy.
> We return error or just throw out a warning message depending on whether
> the policy is "strict" or "relaxed" when reserving RDM regions in pfn space.
> Note in some special cases, e.g. add a device to hwdomain, and remove a
> device from user domain, 'relaxed' is fine enough since this is always safe
> to hwdomain.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17  0:45 ` [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-07-17  7:40   ` Jan Beulich
  2015-07-17  9:09     ` Chen, Tiejun
  2015-07-17  9:27     ` Chen, Tiejun
  0 siblings, 2 replies; 83+ messages in thread
From: Jan Beulich @ 2015-07-17  7:40 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 17.07.15 at 02:45, <tiejun.chen@intel.com> wrote:
> Now use the hypervisor-supplied memory map to build our final e820 table:
> * Add regions for BIOS ranges and other special mappings not in the
>   hypervisor map
> * Add in the hypervisor regions

... hypervisor supplied regions?

> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820,
>                       unsigned int lowmem_reserved_base,
>                       unsigned int bios_image_base)
>  {
> -    unsigned int nr = 0;
> +    unsigned int nr = 0, i, j;
> +    uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
> +    uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT;
> +    uint64_t add_high_mem = 0;

Just like previously said for low_mem_end - why not uint32_t?

> @@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820,
>          nr++;
>      }
>  
> -
> -    if ( hvm_info->high_mem_pgend )
> +    /*
> +     * Construct E820 table according to recorded memory map.
> +     *
> +     * The memory map created by toolstack may include,
> +     *
> +     * #1. Low memory region
> +     *
> +     * Low RAM starts at least from 1M to make sure all standard regions
> +     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
> +     * have enough space.
> +     *
> +     * #2. Reserved regions if they exist
> +     *
> +     * #3. High memory region if it exists
> +     */
> +    for ( i = 0; i < memory_map.nr_map; i++ )
>      {
> -        e820[nr].addr = ((uint64_t)1 << 32);
> -        e820[nr].size =
> -            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
> -        e820[nr].type = E820_RAM;
> +        e820[nr] = memory_map.map[i];
>          nr++;
>      }
>  
> +    /* Low RAM goes here. Reserve space for special pages. */
> +    BUG_ON(low_mem_end < (2u << 20));
> +
> +    /*
> +     * Its possible to relocate RAM to allocate sufficient MMIO previously

DYM "We may have relocated RAM ..."?

> +     * so low_mem_pgend would be changed over there. And here memory_map[]
> +     * records the original low/high memory, so if low_mem_end is less than
> +     * the original we need to revise low/high memory range in e820.
> +     */
> +    for ( i = 0; i < nr; i++ )
> +    {
> +        uint64_t end = e820[i].addr + e820[i].size;
> +        if ( e820[i].type == E820_RAM &&

Blank line between declarations and statements please.

> +             low_mem_end > e820[i].addr && low_mem_end < end )
> +        {
> +            add_high_mem = end - low_mem_end;
> +            e820[i].size = low_mem_end - e820[i].addr;
> +        }
> +    }

The way it's written I take it that you assume there to be exactly
one region that the adjustment needs to be done for. Iirc this is
correct with the current model, but why would you continue the
loop then afterwards? Placing a "break" in the if()'s body would
document the fact that only one such region should exist, and
would eliminate questions as to whether add_high_mem shouldn't
be updated (+=) instead of simply being assigned a new value.

And then of course there's the question of whether "nr" is really
the right upper loop bound here: Just prior to this you added
the hypervisor supplied entries - why would you need to iterate
over them here? I.e. I'd see this better be moved ahead of that
other code.

> +    /*
> +     * And then we also need to adjust highmem.
> +     */

I'm sure I gave this comment before: This is a single line comment, so
its style should be that of a single line comment.

> +    if ( add_high_mem )
> +    {
> +        /* Modify the existing highmem region if it exists. */
> +        for ( i = 0; i < nr; i++ )
> +        {
> +            if ( e820[i].type == E820_RAM &&
> +                 e820[i].addr == ((uint64_t)1 << 32))
> +            {
> +                e820[i].size += add_high_mem;
> +                break;
> +            }
> +        }
> +
> +        /* If there was no highmem region, just create one. */
> +        if ( i == nr )
> +        {
> +            e820[nr].addr = ((uint64_t)1 << 32);
> +            e820[nr].size = add_high_mem;
> +            e820[nr].type = E820_RAM;
> +            nr++;
> +        }
> +
> +        /* A sanity check if high memory is broken. */
> +        BUG_ON( high_mem_end != e820[i].addr + e820[i].size);
> +    }

Remind me again please - what prevents the highmem region from
colliding with hypervisor supplied entries?

Also, what if the resulting region exceeds the addressable range
(guest's view of CPUID[80000008].EAX[0:7])?

> +    /* Finally we need to sort all e820 entries. */
> +    for ( j = 0; j < nr-1; j++ )
> +    {
> +        for ( i = j+1; i < nr; i++ )
> +        {
> +            if ( e820[j].addr > e820[i].addr )
> +            {
> +                struct e820entry tmp;
> +                tmp = e820[j];

Please make this the initializer of tmp and add the once again missing
blank line.

Jan

> +                e820[j] = e820[i];
> +                e820[i] = tmp;
> +            }
> +        }
> +    }
> +
>      return nr;
>  }
>  
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17  7:40   ` Jan Beulich
@ 2015-07-17  9:09     ` Chen, Tiejun
  2015-07-17 10:50       ` Jan Beulich
  2015-07-17  9:27     ` Chen, Tiejun
  1 sibling, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-17  9:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

> The way it's written I take it that you assume there to be exactly
> one region that the adjustment needs to be done for. Iirc this is
> correct with the current model, but why would you continue the
> loop then afterwards? Placing a "break" in the if()'s body would
> document the fact that only one such region should exist, and
> would eliminate questions as to whether add_high_mem shouldn't
> be updated (+=) instead of simply being assigned a new value.

Yes, "break" should be added here.

>
> And then of course there's the question of whether "nr" is really
> the right upper loop bound here: Just prior to this you added
> the hypervisor supplied entries - why would you need to iterate
> over them here? I.e. I'd see this better be moved ahead of that
> other code.
>

Sounds you mean I should sync low/high memory in memory_map.map[] 
beforehand and then fulfill e820 like this,

diff --git a/tools/firmware/hvmloader/e820.c 
b/tools/firmware/hvmloader/e820.c
index 7a414ab..b0aa48d 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,11 @@ int build_e820_table(struct e820entry *e820,
                       unsigned int lowmem_reserved_base,
                       unsigned int bios_image_base)
  {
-    unsigned int nr = 0;
+    unsigned int nr = 0, i, j;
+    uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+    uint32_t add_high_mem = 0;
+    uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << 
PAGE_SHIFT;
+    uint64_t map_start, map_size, map_end;

      if ( !lowmem_reserved_base )
              lowmem_reserved_base = 0xA0000;
@@ -149,13 +153,6 @@ int build_e820_table(struct e820entry *e820,
      e820[nr].type = E820_RESERVED;
      nr++;

-    /* Low RAM goes here. Reserve space for special pages. */
-    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-    e820[nr].addr = 0x100000;
-    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - 
e820[nr].addr;
-    e820[nr].type = E820_RAM;
-    nr++;
-
      /*
       * Explicitly reserve space for special pages.
       * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +188,101 @@ int build_e820_table(struct e820entry *e820,
          nr++;
      }

+    /* Low RAM goes here. Reserve space for special pages. */
+    BUG_ON(low_mem_end < (2u << 20));

-    if ( hvm_info->high_mem_pgend )
+    /*
+     * Construct E820 table according to recorded memory map.
+     *
+     * The memory map created by toolstack may include,
+     *
+     * #1. Low memory region
+     *
+     * Low RAM starts at least from 1M to make sure all standard regions
+     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+     * have enough space.
+     *
+     * #2. Reserved regions if they exist
+     *
+     * #3. High memory region if it exists
+     *
+     * Note we just have one low memory entry and one high mmeory entry if
+     * exists.
+     *
+     * But we may have relocated RAM to allocate sufficient MMIO previously
+     * so low_mem_pgend would be changed over there. And here memory_map[]
+     * records the original low/high memory, so if low_mem_end is less than
+     * the original we need to revise low/high memory range firstly.
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
      {
-        e820[nr].addr = ((uint64_t)1 << 32);
-        e820[nr].size =
-            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - 
e820[nr].addr;
-        e820[nr].type = E820_RAM;
+        map_start = memory_map.map[i].addr;
+        map_size = memory_map.map[i].size;
+        map_end = map_start + map_end;
+
+        /* If we need to adjust lowmem. */
+        if ( memory_map.map[i].type == E820_RAM &&
+             low_mem_end > map_start && low_mem_end < map_end )
+        {
+            add_high_mem = map_end - low_mem_end;
+            memory_map.map[i].size = low_mem_end - map_start;
+            break;
+        }
+    }
+
+    /* If we need to adjust highmem. */
+    if ( add_high_mem )
+    {
+        /* Modify the existing highmem region if it exists. */
+        for ( i = 0; i < memory_map.nr_map; i++ )
+        {
+            map_start = memory_map.map[i].addr;
+            map_size = memory_map.map[i].size;
+            map_end = map_start + map_end;
+
+            if ( memory_map.map[i].type == E820_RAM &&
+                 map_start == ((uint64_t)1 << 32))
+            {
+                memory_map.map[i].size += add_high_mem;
+                break;
+            }
+        }
+
+        /* If there was no highmem region, just create one. */
+        if ( i == memory_map.nr_map )
+        {
+            memory_map.map[i].addr = ((uint64_t)1 << 32);
+            memory_map.map[i].size = add_high_mem;
+            memory_map.map[i].type = E820_RAM;
+        }
+
+        /* A sanity check if high memory is broken. */
+        BUG_ON( high_mem_end !=
+                memory_map.map[i].addr + memory_map.map[i].size);
+    }
+
+    /* Now fulfill e820. */
+    for ( i = 0; i < memory_map.nr_map; i++ )
+    {
+        e820[nr] = memory_map.map[i];
          nr++;
      }

+    /* Finally we need to sort all e820 entries. */
+    for ( j = 0; j < nr-1; j++ )
+    {
+        for ( i = j+1; i < nr; i++ )
+        {
+            if ( e820[j].addr > e820[i].addr )
+            {
+                struct e820entry tmp = e820[j];
+
+                e820[j] = e820[i];
+                e820[i] = tmp;
+            }
+        }
+    }
+
      return nr;
  }


Thanks
Tiejun

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-16 21:15                           ` Chen, Tiejun
@ 2015-07-17  9:26                             ` George Dunlap
  2015-07-17 10:55                               ` Jan Beulich
  0 siblings, 1 reply; 83+ messages in thread
From: George Dunlap @ 2015-07-17  9:26 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, George Dunlap, xen-devel, Jan Beulich, Keir Fraser

On Thu, Jul 16, 2015 at 10:15 PM, Chen, Tiejun <tiejun.chen@intel.com> wrote:
>>
>>           base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
>> +
>> +        /* If we're using mem_resource, check for RMRR conflicts */
>> +        while ( resource == &mem_resource &&
>> +                next_rmrr > 0 &&
>> +                check_overlap(base, bar_sz,
>> +                              memory_map.map[next_rmrr].addr,
>> +                              memory_map.map[next_rmrr].size)) {
>> +            base = memory_map.map[next_rmrr].addr +
>> memory_map.map[next_rmrr].size;
>> +            base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz -
>> 1);
>> +            next_rmrr=find_next_rmrr(base);
>> +        }
>> +
>>           bar_data |= (uint32_t)base;
>>           bar_data_upper = (uint32_t)(base >> 32);
>>           base += bar_sz;
>>
>
> Actually this chunk of codes are really similar as what we did in my
> previous revisions from RFC ~ v3. It's just trying to skip and then
> allocate, right? As Jan pointed out, there are two key problems:
>
> #1. All skipping action probably cause a result of no sufficient MMIO to
> allocate all devices as before.
>
> #2. Another is that alignment issue. When the original "base" change to
> align to rdm_end, some spaces are wasted. Especially, these spaces could be
> allocated to other smaller bars.

Just to be pedantic: #2 is really just an extension of #1 -- i.e., it
doesn't matter if space is "wasted" if all the MMIO regions still fit;
the only reason #2 matters is that it makes #1 worse.

In any case, I know it's not perfect -- the point was to get something
that 1) was relatively simple to implement 2) worked out-of-the-box
for many cases, and 3) had a work-around which the user could use in
other cases.

Given that if we run out of MMIO space, all that happens is that some
devices will not really work, I think this solution is really no worse
than the "disable devices on conflict" solution; and it's better,
because you can actually work around it by increasing the MMIO hole
size.  But I'll leave it to  Jan and others to determine which (if
any) would be suitable to check in at this point.

 -George

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17  7:40   ` Jan Beulich
  2015-07-17  9:09     ` Chen, Tiejun
@ 2015-07-17  9:27     ` Chen, Tiejun
  2015-07-17 10:53       ` Jan Beulich
  1 sibling, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-17  9:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

> Remind me again please - what prevents the highmem region from
> colliding with hypervisor supplied entries?
>
> Also, what if the resulting region exceeds the addressable range
> (guest's view of CPUID[80000008].EAX[0:7])?

Any idea to this? I think this issue also exists previously.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17  9:09     ` Chen, Tiejun
@ 2015-07-17 10:50       ` Jan Beulich
  2015-07-17 15:22         ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: Jan Beulich @ 2015-07-17 10:50 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 17.07.15 at 11:09, <tiejun.chen@intel.com> wrote:
>> And then of course there's the question of whether "nr" is really
>> the right upper loop bound here: Just prior to this you added
>> the hypervisor supplied entries - why would you need to iterate
>> over them here? I.e. I'd see this better be moved ahead of that
>> other code.
>>
> 
> Sounds you mean I should sync low/high memory in memory_map.map[] 
> beforehand and then fulfill e820 like this,

Why would you want/need to sync into memory_map.map[]?
That's certainly not what I suggested.

Jan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17  9:27     ` Chen, Tiejun
@ 2015-07-17 10:53       ` Jan Beulich
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2015-07-17 10:53 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 17.07.15 at 11:27, <tiejun.chen@intel.com> wrote:
>>  Remind me again please - what prevents the highmem region from
>> colliding with hypervisor supplied entries?
>>
>> Also, what if the resulting region exceeds the addressable range
>> (guest's view of CPUID[80000008].EAX[0:7])?
> 
> Any idea to this? I think this issue also exists previously.

Oh, right, so I guess leave this as is for now.

Jan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-17  9:26                             ` George Dunlap
@ 2015-07-17 10:55                               ` Jan Beulich
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2015-07-17 10:55 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, George Dunlap, xen-devel, Tiejun Chen, Keir Fraser

>>> On 17.07.15 at 11:26, <George.Dunlap@eu.citrix.com> wrote:
> Given that if we run out of MMIO space, all that happens is that some
> devices will not really work, I think this solution is really no worse
> than the "disable devices on conflict" solution; and it's better,
> because you can actually work around it by increasing the MMIO hole
> size.

I agree.

Jan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-17  0:45 ` [v9][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm Tiejun Chen
@ 2015-07-17 13:59   ` Jan Beulich
  2015-07-17 14:24     ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: Jan Beulich @ 2015-07-17 13:59 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 17.07.15 at 02:45, <tiejun.chen@intel.com> wrote:
> When allocating mmio address for PCI bars, mmio may overlap with
> reserved regions. Currently we just want to disable these associate
> devices simply to avoid conflicts but we will reshape current mmio
> allocation mechanism to fix this completely.
> 
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v9:
> 
> * A little improvement to code implementation but again, its still argued about
>   this solution.

And as said in reply to George's reply to v8 - the alternative he
proposed is still better than this one, and would therefore have
better chances of me agreeing to take what is there instead of
pushing for a proper solution.

Jan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
  2015-07-17 13:59   ` Jan Beulich
@ 2015-07-17 14:24     ` Chen, Tiejun
  0 siblings, 0 replies; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-17 14:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>> ---
>> v9:
>>
>> * A little improvement to code implementation but again, its still argued about
>>    this solution.
>
> And as said in reply to George's reply to v8 - the alternative he
> proposed is still better than this one, and would therefore have
> better chances of me agreeing to take what is there instead of
> pushing for a proper solution.
>

Looks I just need to pick up George' patch as our solution at least to 
eliminate all arguments in current cycle.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17 10:50       ` Jan Beulich
@ 2015-07-17 15:22         ` Chen, Tiejun
  2015-07-17 15:31           ` Jan Beulich
  0 siblings, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-17 15:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

On 2015/7/17 18:50, Jan Beulich wrote:
>>>> On 17.07.15 at 11:09, <tiejun.chen@intel.com> wrote:
>>> And then of course there's the question of whether "nr" is really
>>> the right upper loop bound here: Just prior to this you added
>>> the hypervisor supplied entries - why would you need to iterate
>>> over them here? I.e. I'd see this better be moved ahead of that
>>> other code.
>>>
>>
>> Sounds you mean I should sync low/high memory in memory_map.map[]
>> beforehand and then fulfill e820 like this,
>
> Why would you want/need to sync into memory_map.map[]?

But actually I just felt this make our process clear.

> That's certainly not what I suggested.
>

Do you mean I should check low/high mem before we add the hypervisor 
supplied entries like this?

+    for ( i = nr-1; i > memory_map.nr_map; i-- )
+    {
+        uint64_t end = e820[i].addr + e820[i].size;
+
+        if ( e820[i].type == E820_RAM &&
+             low_mem_end > e820[i].addr && low_mem_end < end )
+        {
+            add_high_mem = end - low_mem_end;
+            e820[i].size = low_mem_end - e820[i].addr;
+            break;
+        }
+    }
+
+    /* And then we also need to adjust highmem. */
+    if ( add_high_mem )
+    {
+        /* Modify the existing highmem region if it exists. */
+        for ( i = nr-1 ; i > memory_map.nr_map; i-- )
+        {
+            if ( e820[i].type == E820_RAM &&
+                 e820[i].addr == ((uint64_t)1 << 32))
+            {
+                e820[i].size += add_high_mem;
+                break;
+            }
+        }
+
+        /* If there was no highmem region, just create one. */
+        if ( i == memory_map.nr_map )
+        {
+            e820[nr].addr = ((uint64_t)1 << 32);
+            e820[nr].size = add_high_mem;
+            e820[nr].type = E820_RAM;
+            i = nr;
+            nr++;
+        }
+
+        /* A sanity check if high memory is broken. */
+        BUG_ON( high_mem_end != e820[i].addr + e820[i].size);
+    }

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17 15:22         ` Chen, Tiejun
@ 2015-07-17 15:31           ` Jan Beulich
  2015-07-17 15:54             ` Chen, Tiejun
  0 siblings, 1 reply; 83+ messages in thread
From: Jan Beulich @ 2015-07-17 15:31 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 17.07.15 at 17:22, <tiejun.chen@intel.com> wrote:
> Do you mean I should check low/high mem before we add the hypervisor 
> supplied entries

Yes.

> like this?

Not exactly:

> +    for ( i = nr-1; i > memory_map.nr_map; i-- )

Before you add memory_map.nr_map, you should be able to iterate
from 0 to (not inclusive) nr. At least as far as I recall the original
patch.

Jan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17 15:31           ` Jan Beulich
@ 2015-07-17 15:54             ` Chen, Tiejun
  2015-07-17 16:06               ` Jan Beulich
  0 siblings, 1 reply; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-17 15:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>> +    for ( i = nr-1; i > memory_map.nr_map; i-- )
>
> Before you add memory_map.nr_map, you should be able to iterate
> from 0 to (not inclusive) nr. At least as far as I recall the original
> patch.
>

Sorry, I really don't understand what you want.

Before we add memory_map.nr_map, e820[0, nr) don't include low/high 
memory, right? So sounds you want me to

for ( i = 0 i < memory_map.nr_map; i++ )
{
     if we need to adjust low memory, we just set final low e820 entry;
     if we need to adjust high memory, we just set final high e820 entry;
}

Right? But its impossible to do this since we can't assume 
memory_map.map[low memory] is always prior to memory_map.map[high memory].

If I still follow your way, please don't mind to show a pseudocode help 
me understand what you want.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17 15:54             ` Chen, Tiejun
@ 2015-07-17 16:06               ` Jan Beulich
  2015-07-17 16:10                 ` Chen, Tiejun
  2015-07-18 12:35                 ` George Dunlap
  0 siblings, 2 replies; 83+ messages in thread
From: Jan Beulich @ 2015-07-17 16:06 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 17.07.15 at 17:54, <tiejun.chen@intel.com> wrote:
>> > +    for ( i = nr-1; i > memory_map.nr_map; i-- )
>>
>> Before you add memory_map.nr_map, you should be able to iterate
>> from 0 to (not inclusive) nr. At least as far as I recall the original
>> patch.
>>
> 
> Sorry, I really don't understand what you want.
> 
> Before we add memory_map.nr_map, e820[0, nr) don't include low/high 
> memory, right?

Why? memory_map is representing the reserved areas only, isn't it?
If that's not the case, then of course everything is fine.

Jan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17 16:06               ` Jan Beulich
@ 2015-07-17 16:10                 ` Chen, Tiejun
  2015-07-18 12:35                 ` George Dunlap
  1 sibling, 0 replies; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-17 16:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser



On 2015/7/18 0:06, Jan Beulich wrote:
>>>> On 17.07.15 at 17:54, <tiejun.chen@intel.com> wrote:
>>>> +    for ( i = nr-1; i > memory_map.nr_map; i-- )
>>>
>>> Before you add memory_map.nr_map, you should be able to iterate
>>> from 0 to (not inclusive) nr. At least as far as I recall the original
>>> patch.
>>>
>>
>> Sorry, I really don't understand what you want.
>>
>> Before we add memory_map.nr_map, e820[0, nr) don't include low/high
>> memory, right?
>
> Why? memory_map is representing the reserved areas only, isn't it?

+     * The memory map created by toolstack may include,
+     *
+     * #1. Low memory region
+     *
+     * Low RAM starts at least from 1M to make sure all standard regions
+     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+     * have enough space.
+     *
+     * #2. Reserved regions if they exist
+     *
+     * #3. High memory region if it exists

Am I wrong with your expectation?

Thanks
Tiejun

> If that's not the case, then of course everything is fine.
>
> Jan
>
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-17 16:06               ` Jan Beulich
  2015-07-17 16:10                 ` Chen, Tiejun
@ 2015-07-18 12:35                 ` George Dunlap
  2015-07-20  6:19                   ` Chen, Tiejun
  1 sibling, 1 reply; 83+ messages in thread
From: George Dunlap @ 2015-07-18 12:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Tiejun Chen, Keir Fraser

On Fri, Jul 17, 2015 at 5:06 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 17.07.15 at 17:54, <tiejun.chen@intel.com> wrote:
>>> > +    for ( i = nr-1; i > memory_map.nr_map; i-- )
>>>
>>> Before you add memory_map.nr_map, you should be able to iterate
>>> from 0 to (not inclusive) nr. At least as far as I recall the original
>>> patch.
>>>
>>
>> Sorry, I really don't understand what you want.
>>
>> Before we add memory_map.nr_map, e820[0, nr) don't include low/high
>> memory, right?
>
> Why? memory_map is representing the reserved areas only, isn't it?
> If that's not the case, then of course everything is fine.

I'm pretty sure the memory map we get here is an extension of the
original PV-only get_e820 hypercall, which *does* include both the
lowmem and highmem regions.

In any case, it's pretty clear from the patched code that Tiejun is
removing the old code which created the lowmem and highmem regions and
is not replacing it.  Where do you think the highmem region he's
looking for was coming from?

 -George

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-17  0:45 ` [v9][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
  2015-07-17  6:48   ` Jan Beulich
@ 2015-07-20  1:12   ` Tian, Kevin
  1 sibling, 0 replies; 83+ messages in thread
From: Tian, Kevin @ 2015-07-20  1:12 UTC (permalink / raw)
  To: Chen, Tiejun, xen-devel
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Tim Deegan,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Zhang, Yang Z,
	Stefano Stabellini, Ian Campbell

> From: Chen, Tiejun
> Sent: Friday, July 17, 2015 8:45 AM
> 
> This patch extends the existing hypercall to support rdm reservation policy.
> We return error or just throw out a warning message depending on whether
> the policy is "strict" or "relaxed" when reserving RDM regions in pfn space.
> Note in some special cases, e.g. add a device to hwdomain, and remove a
> device from user domain, 'relaxed' is fine enough since this is always safe
> to hwdomain.
> 
> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@citrix.com>
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

for vtd part, Acked-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-18 12:35                 ` George Dunlap
@ 2015-07-20  6:19                   ` Chen, Tiejun
  0 siblings, 0 replies; 83+ messages in thread
From: Chen, Tiejun @ 2015-07-20  6:19 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>>> Before you add memory_map.nr_map, you should be able to iterate
>>>> from 0 to (not inclusive) nr. At least as far as I recall the original
>>>> patch.
>>>>
>>>
>>> Sorry, I really don't understand what you want.
>>>
>>> Before we add memory_map.nr_map, e820[0, nr) don't include low/high
>>> memory, right?
>>
>> Why? memory_map is representing the reserved areas only, isn't it?
>> If that's not the case, then of course everything is fine.
>
> I'm pretty sure the memory map we get here is an extension of the
> original PV-only get_e820 hypercall, which *does* include both the
> lowmem and highmem regions.
>
> In any case, it's pretty clear from the patched code that Tiejun is
> removing the old code which created the lowmem and highmem regions and
> is not replacing it.  Where do you think the highmem region he's
> looking for was coming from?
>

On second thoughts, I prefer to check/sync memory_map.map[] before copy 
them into e820 since ultimately this can make sure hvm_info, 
memory_map.map[] and e820 are on the same page.

Anyway, I would send out v10 to address others so please further post 
your comments over there.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [v8][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest [and 1 more messages]
  2015-07-16  6:52     ` [v8][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
@ 2015-07-22 13:55       ` Ian Jackson
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Jackson @ 2015-07-22 13:55 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Tiejun Chen writes ("[v8][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest"):
> Here we'll construct a basic guest e820 table via
> XENMEM_set_memory_map. This table includes lowmem, highmem
> and RDMs if they exist, and hvmloader would need this info
> later.
> 
> Note this guest e820 table would be same as before if the
> platform has no any RDM or we disable RDM (by default).
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v8:
> 
> * make that core construction function as arch-specific to make sure
>   we don't break ARM at this point.

But:

Tiejun Chen writes ("[v9][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest"):
...
> ---
> v8 ~ v9:
> 
> * Nothing is changed.


In fact the message for v8 is correct.  The message for v9 and later
is wrong.

Posting patches with "Nothing is changed", where substantial changes
have been made, is poor.


Also, you should have removed Wei's ack.  I will do that.

Ian.

^ permalink raw reply	[flat|nested] 83+ messages in thread

end of thread, other threads:[~2015-07-22 13:55 UTC | newest]

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-17  0:45 [v9][PATCH 00/16] Fix RMRR Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
2015-07-17  6:48   ` Jan Beulich
2015-07-20  1:12   ` Tian, Kevin
2015-07-17  0:45 ` [v9][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm Tiejun Chen
2015-07-17 13:59   ` Jan Beulich
2015-07-17 14:24     ` Chen, Tiejun
2015-07-17  0:45 ` [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
2015-07-17  7:40   ` Jan Beulich
2015-07-17  9:09     ` Chen, Tiejun
2015-07-17 10:50       ` Jan Beulich
2015-07-17 15:22         ` Chen, Tiejun
2015-07-17 15:31           ` Jan Beulich
2015-07-17 15:54             ` Chen, Tiejun
2015-07-17 16:06               ` Jan Beulich
2015-07-17 16:10                 ` Chen, Tiejun
2015-07-18 12:35                 ` George Dunlap
2015-07-20  6:19                   ` Chen, Tiejun
2015-07-17  9:27     ` Chen, Tiejun
2015-07-17 10:53       ` Jan Beulich
2015-07-17  0:45 ` [v9][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
2015-07-16  6:52   ` [v8][PATCH 00/16] Fix RMRR Tiejun Chen
2015-07-16  6:52     ` [v8][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
2015-07-16  6:52     ` [v8][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
2015-07-16  6:52     ` [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
2015-07-16  7:40       ` Jan Beulich
2015-07-16  7:48         ` Chen, Tiejun
2015-07-16  7:58           ` Jan Beulich
2015-07-16 11:09       ` George Dunlap
2015-07-16  6:52     ` [v8][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
2015-07-16  6:52     ` [v8][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
2015-07-16  9:18       ` Jan Beulich
2015-07-16 11:15       ` George Dunlap
2015-07-16  6:52     ` [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm Tiejun Chen
2015-07-16 11:32       ` George Dunlap
2015-07-16 11:52         ` Chen, Tiejun
2015-07-16 13:02           ` George Dunlap
2015-07-16 13:21             ` Chen, Tiejun
2015-07-16 13:32               ` Jan Beulich
2015-07-16 13:48                 ` Chen, Tiejun
2015-07-16 14:54                   ` Jan Beulich
2015-07-16 15:20                     ` Chen, Tiejun
2015-07-16 15:39                       ` George Dunlap
2015-07-16 16:08                         ` Chen, Tiejun
2015-07-16 16:40                           ` George Dunlap
2015-07-16 21:24                             ` Chen, Tiejun
2015-07-16 16:18                         ` George Dunlap
2015-07-16 16:31                           ` George Dunlap
2015-07-16 21:15                           ` Chen, Tiejun
2015-07-17  9:26                             ` George Dunlap
2015-07-17 10:55                               ` Jan Beulich
2015-07-16  6:52     ` [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
2015-07-16 11:47       ` George Dunlap
2015-07-16 13:12         ` Chen, Tiejun
2015-07-16 14:29           ` George Dunlap
2015-07-16 15:04             ` Chen, Tiejun
2015-07-16 15:16               ` George Dunlap
2015-07-16 15:29                 ` Chen, Tiejun
2015-07-16 15:33                   ` George Dunlap
2015-07-16 15:42                     ` Chen, Tiejun
2015-07-16  6:52     ` [v8][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
2015-07-16  6:52     ` [v8][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
2015-07-16  6:52     ` [v8][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
2015-07-16  6:52     ` [v8][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
2015-07-16  6:52     ` [v8][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
2015-07-16  6:52     ` [v8][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
2015-07-22 13:55       ` [v8][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest [and 1 more messages] Ian Jackson
2015-07-16  6:53     ` [v8][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
2015-07-16  6:53     ` [v8][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
2015-07-16  7:42       ` Jan Beulich
2015-07-16  6:53     ` [v8][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
2015-07-17  0:45 ` [v9][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).