xen-devel.lists.xenproject.org archive mirror
* [v7][PATCH 00/16] Fix RMRR
@ 2015-07-09  5:33 Tiejun Chen
  2015-07-09  5:33 ` [v7][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
                   ` (16 more replies)
  0 siblings, 17 replies; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:33 UTC (permalink / raw)
  To: xen-devel

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

* Just sync with the fallout of renaming parameters above.

Note I also mark patch #10 as Acked by Wei Liu, Ian Jackson and Ian
Campbell. (If I'm wrong, just let me know at this point.) And
as we discussed, I'd further improve some things as a next step after
this round of review.

v6:

* Inside patch #01, add a comment to the nr_entries field inside
  xen_reserved_device_memory_map. Note this is from Jan.

* Inside patch #10, we need to rename some things to make our policy reasonable:
  "type" -> "strategy"
  "none" -> "ignore"
  Based on our discussion, we won't expose "ignore" at the xl level and just
  keep it as a default, and then sync the docs and the patch head description.

* Inside patch #10, we fix some code style issues and in particular refine
  libxl__xc_device_get_rdm()

* Inside patch #16, we need to sync those renames introduced by patch #10.

v5:

* Fold our original patches #2 and #3 into this new one, and
  introduce a new helper, clear_identity_p2m_entry, which wraps
  guest_physmap_remove_page(). We use this to clear our
  identity mapping.

* Just leave one bit, XEN_DOMCTL_DEV_RDM_RELAXED, as our policy flag, so
  now "0" means "strict" and "1" means "relaxed", and also make DT devices
  simply ignore the flag field. Then correct all associated code
  comments.

* Make sure the per-device policy always overrides the global policy,
  and clean up some associated comments and the patch head description.

* Improve some descriptions in doc.

* Make all rdm variables specific to .hvm

* Inside patch #6, rename the field is_64bar inside struct bars to flag,
  and extend it to also indicate whether the bar is already
  allocated.

* Inside patch #11, rename xc_device_get_rdm() to libxl__xc_device_get_rdm(),
  replace malloc() with libxl__malloc(), and clean up the fallout.
  libxl__xc_device_get_rdm() should return a proper libxl error code, ERROR_FAIL.
  The allocated RDM entries are instead returned through an out parameter.

* The original patch #13 is sent out separately since actually this is not related
  to RMRR.
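
For illustration only, a minimal sketch of how the single policy bit and the
per-device override described for v5 could be modelled. The enum, its "unset"
state and both helper names are hypothetical and are not taken from the
patches; only XEN_DOMCTL_DEV_RDM_RELAXED and its 0/1 meaning come from the
notes above.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bit named in the v5 notes: 0 means "strict", 1 means "relaxed". */
#define XEN_DOMCTL_DEV_RDM_RELAXED 1u

/* Hypothetical tri-state: a device may leave its own policy unset. */
enum rdm_policy {
    RDM_POLICY_UNSET   = -1,
    RDM_POLICY_STRICT  = 0,
    RDM_POLICY_RELAXED = 1
};

/* The per-device policy, when given, always overrides the global one. */
static enum rdm_policy effective_policy(enum rdm_policy global,
                                        enum rdm_policy per_device)
{
    assert(global != RDM_POLICY_UNSET);
    return per_device != RDM_POLICY_UNSET ? per_device : global;
}

/* Translate the resolved policy into the one-bit flag value. */
static uint32_t policy_to_flag(enum rdm_policy policy)
{
    return policy == RDM_POLICY_RELAXED ? XEN_DOMCTL_DEV_RDM_RELAXED : 0;
}
```

Resolving the policy first and converting to the flag last keeps the override
rule in one place, whatever the real toolstack structure looks like.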

v4:

* Change one condition inside patch #2, "xen/x86/p2m: introduce
  set_identity_p2m_entry",

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )

 to make sure we just catch our requirement.

* Inside patch #3, "xen/vtd: create RMRR mapping",
  Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. And drop
  iommu_map_page() since actually ept_set_entry() can do this
  internally.

* Inside patch #4, "xen/passthrough: extend hypercall to support rdm
  reservation policy", add code comments to describe why we set a fixed
  policy flag in some cases, like adding a device to hwdomain and removing
  a device from a user domain. And fix one condition:

  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

  Additionally, range check the flag passed to make future
  extensions possible (and to avoid ambiguity on what out of range values
  would mean).

* Inside patch #6, "hvmloader: get guest memory map into memory_map[]", we
  move some code related to e820 to its own file, e820.c, consolidate
  "printf()+BUG()" and "BUG_ON()", and also avoid another fixed width type for
  the parameter of get_mem_mapping_layout()

* Inside patch #7, "hvmloader/pci: skip reserved ranges"
  We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure the MMIO region fits all pci devices as before

  #2.2 Accommodate reserved memory regions that are not aligned

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either populate more RAM or expand highmem. But only
  64bit-bars can work with highmem, and as you mentioned we should also
  avoid expanding highmem where possible. So my implementation
  allocates 32bit-bars and 64bit-bars in order.

  1>. The first allocation round, just for 32bit-bars

  If we can finish allocating all 32bit-bars, we just go on to allocate 64bit-bars
  with all remaining resources, including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource and go
  to the second allocation round 2>.

  2>. The second allocation round, for the remaining 32bit-bars

  In theory we should be able to finish allocating all 32bit-bars, then go to the
  third allocation round 3>.

  3>. The third allocation round, for 64bit-bars

  We first try to allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bars. This process
  should be the same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate reserved memory regions that are not aligned:

  We should skip all reserved device memory, but we also need to check whether
  other smaller bars can be allocated if an mmio hole exists between
  resource->base and reserved device memory. If such a hole exists, we simply
  move on and try to allocate the next bar, since all bars are in
  descending order of size. If not, we need to move resource->base to reserved_end
  and reallocate this bar.

* Inside of patch #8, "hvmloader/e820: construct guest e820 table", we need to
  adjust highmem if lowmem is changed, such as when hvmloader has to populate
  more RAM to allocate bars.

* Inside of patch #11, "tools: introduce some new parameters to set rdm policy",
  we don't define init_val for libxl_rdm_reserve_type since it's just zero,
  and move those changes to xl/libxlu into a final patch.

* Inside of patch #12, "passes rdm reservation policy", fix one typo,
  s/unkwon/unknown. And in command description, we should use "[]" to indicate 
  it's optional for that extended xl command, pci-attach.

* Patch #13 is separated from current patch #14 since this is specific to xc.

* Inside of patch #14, "tools/libxl: detect and avoid conflicts with RDM",
  just unconditionally set *nr_entries to 0. Additionally, we move everything
  needed to provide a parameter for setting our predefined boundary dynamically
  into a separate patch later.

* Inside of patch #16, "tools/libxl: extend XENMEM_set_memory_map", we use
  goto style error handling, and instead of NOGC, we should use
  libxl__malloc(gc,XXX) to allocate the local e820.

Overall, we refined several patch head descriptions and code comments.
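
To make the reserved-range skipping described for patch #7 a bit more
concrete, here is a simplified sketch. It is not the hvmloader code: it only
shows a first-fit placement of one bar that moves the candidate base past any
overlapping reserved range and re-aligns it. The rdm_range type and the
place_bar() and align_up() helpers are hypothetical, and the sketch assumes
all addresses are far enough below 2^64 that the alignment cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open range [start, end) of reserved device memory (hypothetical type). */
struct rdm_range {
    uint64_t start;
    uint64_t end;
};

/* Round addr up to a multiple of size; size must be a power of two. */
static uint64_t align_up(uint64_t addr, uint64_t size)
{
    return (addr + size - 1) & ~(size - 1);
}

/*
 * Find the lowest size-aligned address >= base so that [addr, addr + size)
 * stays within [base, limit) and overlaps no reserved range.
 * Returns 0 and stores the address in *out on success, -1 if nothing fits.
 */
static int place_bar(uint64_t base, uint64_t limit, uint64_t size,
                     const struct rdm_range *rdm, size_t nr_rdm,
                     uint64_t *out)
{
    assert(size != 0 && (size & (size - 1)) == 0);

    base = align_up(base, size);
    for ( ; ; )
    {
        size_t i;

        if (base > limit || size > limit - base)
            return -1;

        for (i = 0; i < nr_rdm; i++)
            if (base < rdm[i].end && rdm[i].start < base + size)
                break;

        if (i == nr_rdm)
        {
            *out = base;
            return 0;
        }

        /* Conflict: restart just past the reserved range, re-aligned. */
        base = align_up(rdm[i].end, size);
    }
}
```

With a reserved page at 0xf0100000, a 2M bar starting from 0xf0000000 ends up
at 0xf0200000, while a 4K bar still fits in the hole at 0xf0000000. That hole
is the reason the description above tries smaller bars before moving
resource->base.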

v3:

* Rearrange all patches orderly as Wei suggested
* Rebase on the latest tree
* Address some of Wei's comments on tools side
* Two changes for runtime cycle
   patch #2,xen/x86/p2m: introduce set_identity_p2m_entry, on hypervisor side

  a>. Introduce paging_mode_translate()
  Otherwise, we'll see this error when booting Xen/Dom0

(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
....
(XEN) Xen call trace:
(XEN)    [<ffff82d0801f53db>] p2m_pt_get_entry+0x29/0x558
(XEN)    [<ffff82d0801f0b5c>] set_identity_p2m_entry+0xfc/0x1f0
(XEN)    [<ffff82d08014ebc8>] rmrr_identity_mapping+0x154/0x1ce
(XEN)    [<ffff82d0802abb46>] intel_iommu_hwdom_init+0x76/0x158
(XEN)    [<ffff82d0802ab169>] iommu_hwdom_init+0x179/0x188
(XEN)    [<ffff82d0802cc608>] construct_dom0+0x2fed/0x35d8
(XEN)    [<ffff82d0802bdaa0>] __start_xen+0x22d8/0x2381
(XEN)    [<ffff82d080100067>] __high_start+0x53/0x55
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702

Note I don't copy all info since I think the above is enough.

  b>. Actually we still need to use "mfn_x(mfn) == INVALID_MFN" to confirm
  we're getting an invalid mfn.

* Add patch #16 to handle those devices which share same RMRR.

v2:

* Instead of that fixed predefined rdm memory boundary, we'd like to
  introduce a parameter, "rdm_mem_boundary", to set this threshold value.

* Remove that existing USB hack.

* Make sure the MMIO regions all fit in the available resource window

* Rename our policy, "force/try" -> "strict/relaxed"

* Wei and Jan gave me many more comments to refine the code:
  * Code style
  * Better and reasonable code implementation
  * Correct or improve code comments.

* Small changes to work well with ARM.

Open:

* We should fail assigning device which has a shared RMRR with
another device. We can only do group assignment when RMRR is shared
among devices.

We need more time to figure out a good policy because something
is not clear to me.

As you know, all devices are owned by Dom0 before we create any
DomU, right? Do we allow Dom0 to still own one device in a group while
another device in the same group is assigned?

I'd really appreciate any comments on the policy.


v1:

RMRR is an acronym for Reserved Memory Region Reporting, expected to
be used for legacy usages (such as USB, UMA Graphics, etc.) requiring
reserved memory. Special treatment is required in system software to
setup those reserved regions in IOMMU translation structures, otherwise
passing through a device with RMRR reported may not work correctly.

This patch set tries to enhance the existing Xen RMRR implementation to fix
various reported and theoretical problems. The most noteworthy changes are
to set up identity mapping in the p2m layer and to handle possible conflicts
between reported regions and gfn space. The initial proposal can be found at:
    http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00524.html
and after a long discussion a summarized agreement is here:
    http://lists.xen.org/archives/html/xen-devel/2015-01/msg01580.html

Below is a key summary of this patch set according to agreed proposal:

1. Use RDM (Reserved Device Memory) name in user space as a general 
description instead of using ACPI RMRR name directly.

2. Introduce configuration parameters to allow users to control both per-device
and global RDM resources, along with the desired policies upon a detected conflict.

3. Introduce a new hypercall to query global and per-device RDM resources.

4. Extend libxl to be a central place to manage RDM resources and handle
potential conflicts between reserved regions and gfn space. One simplification
goal is to keep the existing lowmem / mmio / highmem layout, which is
passed around various function blocks. So a reasonable assumption
is made that conflicts falling into the areas below are not re-arranged, since
otherwise it would result in a more scattered layout:
    a) in highmem region (>4G)
    b) in lowmem region, and below a predefined boundary (default 2G)
  a) is a new assumption not discussed before. According to the VT-d spec this
is possible, but it has not been observed in the real world. So we can make this
reasonable assumption until there's real usage of it.

5. Extend XENMEM_set_memory_map to be usable for HVM guests, and then have
libxl use that hypercall to carry RDM information to hvmloader. There
is one difference from the original discussion. Previously we discussed
introducing a new E820 type specifically for RDM entries. After more thought
we think it's OK to just tag them as E820_reserved. Actually hvmloader
doesn't need to know whether the reserved entries come from RDM or
from other purposes.

6. Then in hvmloader the change is generic for the XENMEM_memory_map
change. Given a predefined memory layout, hvmloader should avoid
allocating any reserved entries for other usages (opregion, mmio, etc.)

7. Extend existing device passthrough hypercall to carry conflict handling
policy.

8. Set up identity map in p2m layer for RMRRs reported for the given
device. Conflicts are handled according to the policy specified in the hypercall.
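
As a rough illustration of the assumption in point 4, the sketch below
classifies one reserved region against a guest layout. It is not libxl code:
the guest_layout struct, the enum and classify_rdm() are hypothetical, and
treating a region as "below the boundary" when its start is below the boundary
is my assumption. The summary only says that conflicts in areas a) and b) are
not re-arranged, so the remaining lowmem conflicts are labelled as candidates
for re-arrangement here.

```c
#include <assert.h>
#include <stdint.h>

#define GB(x) ((uint64_t)(x) << 30)

enum rdm_conflict {
    RDM_NO_CONFLICT,
    RDM_CONFLICT_REARRANGE,    /* lowmem conflict at or above the boundary */
    RDM_CONFLICT_KEEP_LAYOUT   /* area a) or b): layout is left untouched */
};

/* Hypothetical description of where guest RAM lives. */
struct guest_layout {
    uint64_t lowmem_end;   /* RAM occupies [0, lowmem_end) below 4G */
    uint64_t highmem_end;  /* RAM occupies [4G, highmem_end); 4G if none */
    uint64_t boundary;     /* predefined boundary, default 2G */
};

static enum rdm_conflict classify_rdm(const struct guest_layout *g,
                                      uint64_t start, uint64_t size)
{
    uint64_t end = start + size;

    assert(size > 0 && end > start);

    /* a) overlap with highmem (>4G) */
    if (g->highmem_end > GB(4) && start < g->highmem_end && end > GB(4))
        return RDM_CONFLICT_KEEP_LAYOUT;

    /* overlap with lowmem, split by the predefined boundary */
    if (start < g->lowmem_end)
        return start < g->boundary ? RDM_CONFLICT_KEEP_LAYOUT
                                   : RDM_CONFLICT_REARRANGE;

    return RDM_NO_CONFLICT;
}
```

What happens for each class (fail, warn, or shrink lowmem) is left to the
policy and is deliberately not modelled here.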

Current patch set contains core enhancements calling for comments.
There are still several tasks not implemented now. We'll include them
in the final version after the RFC is agreed:

- remove existing USB hack
- detect and fail assigning device which has a shared RMRR with another device
- add a config parameter to configure that memory boundary flexibly
- In the case of hotplug we also need to figure out a way to fix the policy
  conflict between the per-pci policy and the global policy, but first we think
  we'd better collect some good ideas for the next step in the RFC.

So I'm sending this as an RFC to collect any comments you have.

----------------------------------------------------------------
Jan Beulich (1):
      xen: introduce XENMEM_reserved_device_memory_map

Tiejun Chen (15):
      xen/vtd: create RMRR mapping
      xen/passthrough: extend hypercall to support rdm reservation policy
      xen: enable XENMEM_memory_map in hvm
      hvmloader: get guest memory map into memory_map[]
      hvmloader/pci: skip reserved ranges
      hvmloader/e820: construct guest e820 table
      tools/libxc: Expose new hypercall xc_reserved_device_memory_map
      tools: extend xc_assign_device() to support rdm reservation policy
      tools: introduce some new parameters to set rdm policy
      tools/libxl: detect and avoid conflicts with RDM
      tools: introduce a new parameter to set a predefined rdm boundary
      libxl: construct e820 map with RDM information for HVM guest
      xen/vtd: enable USB device assignment
      xen/vtd: prevent from assign the device with shared rmrr
      tools: parse to enable new rdm policy parameters

 docs/man/xl.cfg.pod.5                       | 103 ++++++++
 docs/misc/vtd.txt                           |  24 ++
 tools/firmware/hvmloader/e820.c             | 115 +++++++--
 tools/firmware/hvmloader/e820.h             |   7 +
 tools/firmware/hvmloader/hvmloader.c        |   2 +
 tools/firmware/hvmloader/pci.c              | 194 +++++++++++---
 tools/firmware/hvmloader/util.c             |  26 ++
 tools/firmware/hvmloader/util.h             |  12 +
 tools/libxc/include/xenctrl.h               |  11 +-
 tools/libxc/xc_domain.c                     |  45 +++-
 tools/libxl/libxl.h                         |   6 +
 tools/libxl/libxl_create.c                  |  13 +-
 tools/libxl/libxl_dm.c                      | 264 ++++++++++++++++++++
 tools/libxl/libxl_dom.c                     |  16 +-
 tools/libxl/libxl_internal.h                |  37 ++-
 tools/libxl/libxl_pci.c                     |  12 +-
 tools/libxl/libxl_types.idl                 |  26 ++
 tools/libxl/libxl_x86.c                     |  83 ++++++
 tools/libxl/libxlu_pci.c                    |  90 +++++++
 tools/libxl/libxlutil.h                     |   4 +
 tools/libxl/xl_cmdimpl.c                    |  16 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c         |  16 +-
 tools/python/xen/lowlevel/xc/xc.c           |  30 ++-
 xen/arch/x86/hvm/hvm.c                      |   2 -
 xen/arch/x86/mm.c                           |   6 -
 xen/arch/x86/mm/p2m.c                       |  43 +++-
 xen/common/compat/memory.c                  |  66 +++++
 xen/common/memory.c                         |  64 +++++
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   3 +-
 xen/drivers/passthrough/arm/smmu.c          |   2 +-
 xen/drivers/passthrough/device_tree.c       |   3 +-
 xen/drivers/passthrough/iommu.c             |  10 +
 xen/drivers/passthrough/pci.c               |  15 +-
 xen/drivers/passthrough/vtd/dmar.c          |  32 +++
 xen/drivers/passthrough/vtd/dmar.h          |   1 -
 xen/drivers/passthrough/vtd/extern.h        |   1 +
 xen/drivers/passthrough/vtd/iommu.c         |  87 +++++--
 xen/drivers/passthrough/vtd/utils.c         |   7 -
 xen/include/asm-x86/p2m.h                   |  13 +-
 xen/include/public/domctl.h                 |   3 +
 xen/include/public/memory.h                 |  37 ++-
 xen/include/xen/iommu.h                     |  12 +-
 xen/include/xen/pci.h                       |   2 +
 xen/include/xlat.lst                        |   3 +-
 44 files changed, 1447 insertions(+), 117 deletions(-)

Thanks
Tiejun


* [v7][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
@ 2015-07-09  5:33 ` Tiejun Chen
  2015-07-09  5:33 ` [v7][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Jan Beulich

From: Jan Beulich <jbeulich@suse.com>

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich <jbeulich@suse.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v7:

* Nothing is changed.

v6:

* Add a comment to the nr_entries field inside xen_reserved_device_memory_map

v5 ~ v4:

* Nothing is changed.
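
For reviewers, a small user-space model of the nr_entries handshake may help.
It is only a sketch under assumptions: model_rdm_map() imitates what the
handlers in this patch do (count every matching entry, copy only what fits,
return -ENOBUFS when the buffer is too small, and report the required count
back), and fetch_all() shows a hypothetical caller probing with zero entries
first. Neither function exists in Xen or libxc.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct rdm_entry {
    uint64_t start_pfn;
    uint64_t nr_pages;
};

/* Model of the handler: host[] stands in for the RMRRs the IOMMU reports. */
static int model_rdm_map(const struct rdm_entry *host, unsigned int host_nr,
                         struct rdm_entry *buf, unsigned int *nr_entries)
{
    unsigned int used = 0, i;
    int rc = 0;

    assert(nr_entries != NULL);

    for (i = 0; i < host_nr; i++)
    {
        if (used < *nr_entries)
            buf[used] = host[i];   /* copy only while the buffer has room */
        ++used;                    /* but always count the entry */
    }

    if (*nr_entries < used)
        rc = -ENOBUFS;
    *nr_entries = used;            /* tell the caller how many are needed */

    return rc;
}

/* Caller side: probe with zero entries, then retry with the right size. */
static struct rdm_entry *fetch_all(const struct rdm_entry *host,
                                   unsigned int host_nr, unsigned int *nr_out)
{
    unsigned int nr = 0;
    struct rdm_entry *buf;
    int rc = model_rdm_map(host, host_nr, NULL, &nr);

    *nr_out = 0;
    if (rc != -ENOBUFS)            /* 0 means there was nothing to report */
        return NULL;

    buf = calloc(nr, sizeof(*buf));
    if (buf == NULL)
        return NULL;

    rc = model_rdm_map(host, host_nr, buf, &nr);
    if (rc)
    {
        free(buf);
        return NULL;
    }

    *nr_out = nr;
    return buf;
}
```

The probe costs one extra call but avoids guessing a buffer size up front.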

 xen/common/compat/memory.c           | 66 ++++++++++++++++++++++++++++++++++++
 xen/common/memory.c                  | 64 ++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/iommu.c      | 10 ++++++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +++++++++++++++++
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h          | 37 +++++++++++++++++++-
 xen/include/xen/iommu.h              | 10 ++++++
 xen/include/xen/pci.h                |  2 ++
 xen/include/xlat.lst                 |  3 +-
 10 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct compat_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+    struct compat_reserved_device_memory rdm = {
+        .start_pfn = start, .nr_pages = nr
+    };
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+                return -ERANGE;
+
+            if ( __copy_to_compat_offset(grdm->map.buffer,
+                                         grdm->used_entries,
+                                         &rdm,
+                                         1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
     int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
             break;
         }
 
+#ifdef HAS_PASSTHROUGH
+        case XENMEM_reserved_device_memory_map:
+        {
+            struct get_reserved_device_memory grdm;
+
+            if ( copy_from_guest(&grdm.map, compat, 1) ||
+                 !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+                return -EFAULT;
+
+            grdm.used_entries = 0;
+            rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                                  &grdm);
+
+            if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+                rc = -ENOBUFS;
+
+            grdm.map.nr_entries = grdm.used_entries;
+            if ( grdm.map.nr_entries )
+            {
+                if ( __copy_to_guest(compat, &grdm.map, 1) )
+                    rc = -EFAULT;
+            }
+
+            return rc;
+        }
+#endif
+
         default:
             return compat_arch_memory_op(cmd, compat);
         }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
     return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct xen_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            struct xen_reserved_device_memory rdm = {
+                .start_pfn = start, .nr_pages = nr
+            };
+
+            if ( __copy_to_guest_offset(grdm->map.buffer,
+                                        grdm->used_entries,
+                                        &rdm,
+                                        1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -1162,6 +1199,33 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+#ifdef HAS_PASSTHROUGH
+    case XENMEM_reserved_device_memory_map:
+    {
+        struct get_reserved_device_memory grdm;
+
+        if ( copy_from_guest(&grdm.map, arg, 1) ||
+             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+            return -EFAULT;
+
+        grdm.used_entries = 0;
+        rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                              &grdm);
+
+        if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+            rc = -ENOBUFS;
+
+        grdm.map.nr_entries = grdm.used_entries;
+        if ( grdm.map.nr_entries )
+        {
+            if ( __copy_to_guest(arg, &grdm.map, 1) )
+                rc = -EFAULT;
+        }
+
+        break;
+    }
+#endif
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 06cb38f..0b2ef52 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -375,6 +375,16 @@ void iommu_crash_shutdown(void)
     iommu_enabled = iommu_intremap = 0;
 }
 
+int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+
+    if ( !iommu_enabled || !ops->get_reserved_device_memory )
+        return 0;
+
+    return ops->get_reserved_device_memory(func, ctxt);
+}
+
 bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
 {
     const struct hvm_iommu *hd = domain_hvm_iommu(d);
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..a730de5 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -893,3 +893,35 @@ int platform_supports_x2apic(void)
     unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
     return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
 }
+
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
+    int rc = 0;
+    unsigned int i;
+    u16 bdf;
+
+    for_each_rmrr_device ( rmrr, bdf, i )
+    {
+        if ( rmrr != rmrr_cur )
+        {
+            rc = func(PFN_DOWN(rmrr->base_address),
+                      PFN_UP(rmrr->end_address) -
+                        PFN_DOWN(rmrr->base_address),
+                      PCI_SBDF(rmrr->segment, bdf),
+                      ctxt);
+
+            if ( unlikely(rc < 0) )
+                return rc;
+
+            if ( !rc )
+                continue;
+
+            /* Just go next. */
+            if ( rc == 1 )
+                rmrr_cur = rmrr;
+        }
+    }
+
+    return 0;
+}
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index 5524dba..f9ee9b0 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -75,6 +75,7 @@ int domain_context_mapping_one(struct domain *domain, struct iommu *iommu,
                                u8 bus, u8 devfn, const struct pci_dev *);
 int domain_context_unmap_one(struct domain *domain, struct iommu *iommu,
                              u8 bus, u8 devfn);
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
 
 unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg);
 void io_apic_write_remap_rte(unsigned int apic,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 48820ea..44ed23d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2491,6 +2491,7 @@ const struct iommu_ops intel_iommu_ops = {
     .crash_shutdown = vtd_crash_shutdown,
     .iotlb_flush = intel_iommu_iotlb_flush,
     .iotlb_flush_all = intel_iommu_iotlb_flush_all,
+    .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
     .dump_p2m_table = vtd_dump_p2m_table,
 };
 
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 832559a..ac7d3da 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -573,7 +573,42 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 27 */
+/*
+ * With some legacy devices, certain guest-physical addresses cannot safely
+ * be used for other purposes, e.g. to map guest RAM.  This hypercall
+ * enumerates those regions so the toolstack can avoid using them.
+ */
+#define XENMEM_reserved_device_memory_map   27
+struct xen_reserved_device_memory {
+    xen_pfn_t start_pfn;
+    xen_ulong_t nr_pages;
+};
+typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
+
+struct xen_reserved_device_memory_map {
+    /* IN */
+    /* Currently just one bit to indicate checking all Reserved Device Memory. */
+#define PCI_DEV_RDM_ALL   0x1
+    uint32_t        flag;
+    /* IN */
+    uint16_t        seg;
+    uint8_t         bus;
+    uint8_t         devfn;
+    /*
+     * IN/OUT
+     *
+     * Gets set to the required number of entries when too low,
+     * signaled by error code -ENOBUFS.
+     */
+    unsigned int    nr_entries;
+    /* OUT */
+    XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
+};
+typedef struct xen_reserved_device_memory_map xen_reserved_device_memory_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_map_t);
+
+/* Next available subop number is 28 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b30bf41..e2f584d 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -126,6 +126,14 @@ int iommu_do_dt_domctl(struct xen_domctl *, struct domain *,
 
 struct page_info;
 
+/*
+ * Any non-zero value returned from callbacks of this type will cause the
+ * function the callback was handed to terminate its iteration. Assigning
+ * meaning of these non-zero values is left to the top level caller /
+ * callback pair.
+ */
+typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
+
 struct iommu_ops {
     int (*init)(struct domain *d);
     void (*hwdom_init)(struct domain *d);
@@ -157,12 +165,14 @@ struct iommu_ops {
     void (*crash_shutdown)(void);
     void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
     void (*iotlb_flush_all)(struct domain *d);
+    int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
     void (*dump_p2m_table)(struct domain *d);
 };
 
 void iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
+int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
 void iommu_share_p2m_table(struct domain *d);
 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..d176e8b 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
+#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))
 
 struct pci_dev_info {
     bool_t is_extfn;
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9c9fd9a..dd23559 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -61,9 +61,10 @@
 !	memory_exchange			memory.h
 !	memory_map			memory.h
 !	memory_reservation		memory.h
-?	mem_access_op		memory.h
+?	mem_access_op			memory.h
 !	pod_target			memory.h
 !	remove_from_physmap		memory.h
+!	reserved_device_memory_map	memory.h
 ?	vmemrange			memory.h
 !	vnuma_topology_info		memory.h
 ?	physdev_eoi			physdev.h
-- 
1.9.1


* [v7][PATCH 02/16] xen/vtd: create RMRR mapping
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
  2015-07-09  5:33 ` [v7][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
@ 2015-07-09  5:33 ` Tiejun Chen
  2015-07-09  5:33 ` [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Andrew Cooper, Tim Deegan, Jan Beulich,
	Yang Zhang

RMRR reserved regions must be set up in the pfn space with an identity
mapping to the reported mfn. However, existing code has problems setting up
the correct mapping when VT-d shares the EPT page table, which leads to
problems when assigning devices (e.g. GPU) with RMRR reported. So instead, this
patch aims to set up identity mapping in the p2m layer, regardless of
whether EPT is shared or not. And we still keep creating the VT-d table.

We also need to introduce a pair of helpers to create/clear this
sort of identity mapping as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If the space
is already occupied by the desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define a macro that wraps guest_physmap_remove_page(), which
now returns a value as necessary.

CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v6 ~ v7:

* Nothing is changed.

v5:

* Fold our original patches #2 and #3 into this new one

* Introduce a new helper, clear_identity_p2m_entry, which wraps
  guest_physmap_remove_page(). And we use this to clean our
  identity mapping.

v4:

* Change that original condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mapping as we expected.

* To have

  if ( !paging_mode_translate(p2m->domain) )
    return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c               | 40 +++++++++++++++++++++++++++++++++++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h           | 13 +++++++++---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
                          p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                           unsigned long mfn, unsigned int page_order)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
     gfn_lock(p2m, gfn, page_order);
-    p2m_remove_page(p2m, gfn, mfn, page_order);
+    rc = p2m_remove_page(p2m, gfn, mfn, page_order);
     gfn_unlock(p2m, gfn, page_order);
+    return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
     return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma)
+{
+    p2m_type_t p2mt;
+    p2m_access_t a;
+    mfn_t mfn;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int ret;
+
+    if ( !paging_mode_translate(p2m->domain) )
+        return 0;
+
+    gfn_lock(p2m, gfn, 0);
+
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+    if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+                            p2m_mmio_direct, p2ma);
+    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+        ret = 0;
+    else
+    {
+        ret = -EBUSY;
+        printk(XENLOG_G_WARNING
+               "Cannot setup identity map d%d:%lx,"
+               " gfn already mapped to %lx.\n",
+               d->domain_id, gfn, mfn_x(mfn));
+    }
+
+    gfn_unlock(p2m, gfn, 0);
+    return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
             while ( base_pfn < end_pfn )
             {
-                if ( intel_iommu_unmap_page(d, base_pfn) )
+                if ( clear_identity_p2m_entry(d, base_pfn, 0) )
                     ret = -ENXIO;
                 base_pfn++;
             }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-                                       IOMMUF_readable|IOMMUF_writable);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
 
         if ( err )
             return err;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index b49c09b..190a286 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -503,9 +503,9 @@ static inline int guest_physmap_add_page(struct domain *d,
 }
 
 /* Remove a page from a domain's p2m table */
-void guest_physmap_remove_page(struct domain *d,
-                               unsigned long gfn,
-                               unsigned long mfn, unsigned int page_order);
+int guest_physmap_remove_page(struct domain *d,
+                              unsigned long gfn,
+                              unsigned long mfn, unsigned int page_order);
 
 /* Set a p2m range as populate-on-demand */
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
@@ -543,6 +543,13 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
                        p2m_access_t access);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
+/* Set identity addresses in the p2m table (for pass-through) */
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma);
+
+#define clear_identity_p2m_entry(d, gfn, page_order) \
+                        guest_physmap_remove_page(d, gfn, gfn, page_order)
+
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
                     unsigned long gpfn, domid_t foreign_domid);
-- 
1.9.1


* [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
  2015-07-09  5:33 ` [v7][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
  2015-07-09  5:33 ` [v7][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
@ 2015-07-09  5:33 ` Tiejun Chen
  2015-07-10 13:26   ` George Dunlap
  2015-07-09  5:33 ` [v7][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Andrew Cooper, Tim Deegan,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Yang Zhang,
	Stefano Stabellini, Ian Campbell

This patch extends the existing hypercall to support an rdm reservation
policy. When reserving RDM regions in the pfn space, we return an error if
the policy is "strict", or just emit a warning message if it is "relaxed".
Note that in some special cases, e.g. adding a device to hwdomain or
removing a device from a user domain, "relaxed" is good enough since this
is always safe for hwdomain.
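
As a rough illustration of how the single policy bit is interpreted, the
two checks can be written as standalone helpers. This is a sketch only:
check_rdm_flag() and rdm_conflict_result() are hypothetical names and do
not exist in the patch, which open-codes both checks.

```c
#include <errno.h>

/* Mirrors the single policy bit this patch adds to domctl.h. */
#define XEN_DOMCTL_DEV_RDM_RELAXED 1

/* Hypothetical helper: reject flag values outside the defined range. */
static int check_rdm_flag(unsigned int flag)
{
    return flag > XEN_DOMCTL_DEV_RDM_RELAXED ? -EINVAL : 0;
}

/* Hypothetical helper: outcome when the gfn is already occupied. */
static int rdm_conflict_result(unsigned int flag)
{
    /* Relaxed: warn only and carry on.  Strict: fail the assignment. */
    return (flag & XEN_DOMCTL_DEV_RDM_RELAXED) ? 0 : -EBUSY;
}
```

With this model, flag 0 ("strict") turns a conflict into -EBUSY, flag 1
("relaxed") lets the assignment continue, and anything larger is rejected
up front with -EINVAL.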

CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v6 ~ v7:

* Nothing is changed.

v5:

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
  "0" means "strict" and "1" means "relaxed".

* So make DT device ignore the flag field

* Improve the code comments

v4:

* Add code comments to describe why we fix the policy flag in some cases,
  like adding a device to hwdomain and removing a device from a user domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one judging condition
  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add a range check on the flag passed, to make future extensions possible
  (and to avoid ambiguity on what out of range values would mean).

 xen/arch/x86/mm/p2m.c                       |  7 +++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c          |  2 +-
 xen/drivers/passthrough/device_tree.c       |  3 ++-
 xen/drivers/passthrough/pci.c               | 15 ++++++++---
 xen/drivers/passthrough/vtd/iommu.c         | 40 +++++++++++++++++++++++------
 xen/include/asm-x86/p2m.h                   |  2 +-
 xen/include/public/domctl.h                 |  3 +++
 xen/include/xen/iommu.h                     |  2 +-
 9 files changed, 58 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 99a26ca..47785dc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma)
+                           p2m_access_t p2ma, unsigned int flag)
 {
     p2m_type_t p2mt;
     p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
         ret = 0;
     else
     {
-        ret = -EBUSY;
+        if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
+            ret = 0;
+        else
+            ret = -EBUSY;
         printk(XENLOG_G_WARNING
                "Cannot setup identity map d%d:%lx,"
                " gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-                                   struct pci_dev *pdev)
+                                   struct pci_dev *pdev,
+                                   u32 flag)
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
     int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-			       struct device *dev)
+			       struct device *dev, u32 flag)
 {
 	struct iommu_domain *domain;
 	struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 5d3842a..7ff79f8 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
             goto fail;
     }
 
-    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+    /* The flag field doesn't matter to DT device. */
+    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0);
 
     if ( rc )
         goto fail;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e30be43..6e23fc6 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1335,7 +1335,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
     return pdev ? 0 : -EBUSY;
 }
 
-static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
+static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
     struct pci_dev *pdev;
@@ -1371,7 +1371,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 
     pdev->fault.count = 0;
 
-    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev))) )
+    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag)) )
         goto done;
 
     for ( ; pdev->phantom_stride; rc = 0 )
@@ -1379,7 +1379,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
         devfn += pdev->phantom_stride;
         if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
             break;
-        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev));
+        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
         if ( rc )
             printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed (%d)\n",
                    d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
@@ -1496,6 +1496,7 @@ int iommu_do_pci_domctl(
 {
     u16 seg;
     u8 bus, devfn;
+    u32 flag;
     int ret = 0;
     uint32_t machine_sbdf;
 
@@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
         seg = machine_sbdf >> 16;
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
+        flag = domctl->u.assign_device.flag;
+        if ( flag > XEN_DOMCTL_DEV_RDM_RELAXED )
+        {
+            ret = -EINVAL;
+            break;
+        }
 
         ret = device_assigned(seg, bus, devfn) ?:
-              assign_device(d, seg, bus, devfn);
+              assign_device(d, seg, bus, devfn, flag);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(__HYPERVISOR_domctl,
                                                 "h", u_domctl);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 8415958..56f5911 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1807,7 +1807,8 @@ static void iommu_set_pgd(struct domain *d)
 }
 
 static int rmrr_identity_mapping(struct domain *d, bool_t map,
-                                 const struct acpi_rmrr_unit *rmrr)
+                                 const struct acpi_rmrr_unit *rmrr,
+                                 u32 flag)
 {
     unsigned long base_pfn = rmrr->base_address >> PAGE_SHIFT_4K;
     unsigned long end_pfn = PAGE_ALIGN_4K(rmrr->end_address) >> PAGE_SHIFT_4K;
@@ -1855,7 +1856,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
 
         if ( err )
             return err;
@@ -1898,7 +1899,14 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
              PCI_BUS(bdf) == pdev->bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
+            /*
+             * Here means we're add a device to the hardware domain
+             * so actually RMRR is always reserved on e820 so either
+             * of flag is fine for hardware domain and here we'd like
+             * to pass XEN_DOMCTL_DEV_RDM_RELAXED.
+             */
+            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
+                                        XEN_DOMCTL_DEV_RDM_RELAXED);
             if ( ret )
                 dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
                         pdev->domain->domain_id);
@@ -1939,7 +1947,12 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
              PCI_DEVFN2(bdf) != devfn )
             continue;
 
-        rmrr_identity_mapping(pdev->domain, 0, rmrr);
+        /*
+         * Any flag is nothing to clear these mappings so here
+         * its always safe to set XEN_DOMCTL_DEV_RDM_RELAXED.
+         */
+        rmrr_identity_mapping(pdev->domain, 0, rmrr,
+                              XEN_DOMCTL_DEV_RDM_RELAXED);
     }
 
     return domain_context_unmap(pdev->domain, devfn, pdev);
@@ -2098,7 +2111,13 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
     spin_lock(&pcidevs_lock);
     for_each_rmrr_device ( rmrr, bdf, i )
     {
-        ret = rmrr_identity_mapping(d, 1, rmrr);
+        /*
+         * Here means we're add a device to the hardware domain
+         * so actually RMRR is always reserved on e820 so either
+         * of flag is fine for hardware domain and here we'd like
+         * to pass XEN_DOMCTL_DEV_RDM_RELAXED.
+         */
+        ret = rmrr_identity_mapping(d, 1, rmrr, XEN_DOMCTL_DEV_RDM_RELAXED);
         if ( ret )
             dprintk(XENLOG_ERR VTDPREFIX,
                      "IOMMU: mapping reserved region failed\n");
@@ -2241,7 +2260,12 @@ static int reassign_device_ownership(
                  PCI_BUS(bdf) == pdev->bus &&
                  PCI_DEVFN2(bdf) == devfn )
             {
-                ret = rmrr_identity_mapping(source, 0, rmrr);
+                /*
+                 * Any RMRR flag is always ignored when remove a device,
+                 * so just pass XEN_DOMCTL_DEV_RDM_RELAXED.
+                 */
+                ret = rmrr_identity_mapping(source, 0, rmrr,
+                                            XEN_DOMCTL_DEV_RDM_RELAXED);
                 if ( ret != -ENOENT )
                     return ret;
             }
@@ -2265,7 +2289,7 @@ static int reassign_device_ownership(
 }
 
 static int intel_iommu_assign_device(
-    struct domain *d, u8 devfn, struct pci_dev *pdev)
+    struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag)
 {
     struct acpi_rmrr_unit *rmrr;
     int ret = 0, i;
@@ -2294,7 +2318,7 @@ static int intel_iommu_assign_device(
              PCI_BUS(bdf) == bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(d, 1, rmrr);
+            ret = rmrr_identity_mapping(d, 1, rmrr, flag);
             if ( ret )
             {
                 reassign_device_ownership(d, hardware_domain, devfn, pdev);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 190a286..68da0a9 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -545,7 +545,7 @@ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
 /* Set identity addresses in the p2m table (for pass-through) */
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma);
+                           p2m_access_t p2ma, unsigned int flag);
 
 #define clear_identity_p2m_entry(d, gfn, page_order) \
                         guest_physmap_remove_page(d, gfn, gfn, page_order)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index bc45ea5..bca25c9 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -478,6 +478,9 @@ struct xen_domctl_assign_device {
             XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
         } dt;
     } u;
+    /* IN */
+#define XEN_DOMCTL_DEV_RDM_RELAXED      1
+    uint32_t  flag;   /* flag of assigned device */
 };
 typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index e2f584d..02b2b02 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -140,7 +140,7 @@ struct iommu_ops {
     int (*add_device)(u8 devfn, device_t *dev);
     int (*enable_device)(device_t *dev);
     int (*remove_device)(u8 devfn, device_t *dev);
-    int (*assign_device)(struct domain *, u8 devfn, device_t *dev);
+    int (*assign_device)(struct domain *, u8 devfn, device_t *dev, u32 flag);
     int (*reassign_device)(struct domain *s, struct domain *t,
                            u8 devfn, device_t *dev);
 #ifdef HAS_PCI
-- 
1.9.1


* [v7][PATCH 04/16] xen: enable XENMEM_memory_map in hvm
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (2 preceding siblings ...)
  2015-07-09  5:33 ` [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-07-09  5:33 ` Tiejun Chen
  2015-07-09  5:33 ` [v7][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

This patch enables XENMEM_memory_map for HVM guests, so hvmloader can
use it to set up the e820 mappings.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
---
v5 ~ v7:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c      | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }
 
-        if ( is_hvm_domain(d) )
-        {
-            rcu_unlock_domain(d);
-            return -EPERM;
-        }
-
         e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
         if ( e820 == NULL )
         {
-- 
1.9.1


* [v7][PATCH 05/16] hvmloader: get guest memory map into memory_map[]
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (3 preceding siblings ...)
  2015-07-09  5:33 ` [v7][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
@ 2015-07-09  5:33 ` Tiejun Chen
  2015-07-10 13:49   ` George Dunlap
  2015-07-09  5:33 ` [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges Tiejun Chen
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

Now we get this map layout by calling XENMEM_memory_map and then
save it into one global variable, memory_map[]. It should
include the lowmem range, the rdm range and the highmem range. Note
the rdm range and the highmem range may not exist in some cases.

We also need to check whether any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory at the hvmloader level, and
we make hvmloader fail in case of a conflict, since such a conflict
is rare in the real world.
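
The conflict check can be tried out in isolation with the sketch below.
The two range constants are illustrative values assumed here (the real
ones come from hvmloader's config.h), and conflicts_with_dynamic_range()
is a hypothetical wrapper, not a function added by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real ones live in hvmloader's config.h. */
#define DYN_START 0xFC001000ULL
#define DYN_END   0xFE000000ULL

/*
 * Two half-open ranges [start, start + size) overlap if each one begins
 * before the other one ends.
 */
static bool ranges_overlap(uint64_t start, uint64_t size,
                           uint64_t r_start, uint64_t r_size)
{
    return (start + size > r_start) && (start < r_start + r_size);
}

/* Hypothetical wrapper: does a reserved entry hit the dynamic range? */
static bool conflicts_with_dynamic_range(uint64_t addr, uint64_t size)
{
    uint64_t alloc_addr = DYN_START - 1;
    uint64_t alloc_size = DYN_END - alloc_addr;

    return ranges_overlap(alloc_addr, alloc_size, addr, size);
}
```

Because the checked range starts at DYN_START - 1, a reserved page ending
exactly at DYN_START is still reported as a conflict, while one starting
exactly at DYN_END is not.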

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
v5 ~ v7:

* Nothing is changed.

v4:

* Move some code related to e820 to that specific file, e820.c.

* Consolidate "printf()+BUG()" and "BUG_ON()"

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c      | 35 +++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/e820.h      |  7 +++++++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c      | 26 ++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h      | 12 ++++++++++++
 5 files changed, 82 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..3e53c47 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,41 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+    unsigned int nr_entries = E820MAX, i;
+    int rc;
+    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
+    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+    if ( rc || !nr_entries )
+    {
+        printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+        BUG();
+    }
+
+    memory_map.nr_map = nr_entries;
+
+    for ( i = 0; i < nr_entries; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED )
+        {
+            if ( check_overlap(alloc_addr, alloc_size,
+                               memory_map.map[i].addr,
+                               memory_map.map[i].size) )
+            {
+                printf("Fail to setup memory map due to conflict");
+                printf(" on dynamic reserved memory range.\n");
+                BUG();
+            }
+        }
+    }
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
     uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
     uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX	128
+
+struct e820map {
+    unsigned int nr_map;
+    struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
     init_hypercalls();
 
+    memory_map_setup();
+
     xenbus_setup();
 
     bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size)
+{
+    return (start + size > reserved_start) &&
+            (start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
     asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
     *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+    int rc;
+    struct xen_memory_map memmap = {
+        .nr_entries = *max_entries
+    };
+
+    set_xen_guest_handle(memmap.buffer, entries);
+
+    rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+    *max_entries = memmap.nr_entries;
+
+    return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
     static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index f99c0f19..1100a3b 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10 @@
 #include <stdarg.h>
 #include <stdint.h>
 #include <stddef.h>
+#include <stdbool.h>
 #include <xen/xen.h>
 #include <xen/hvm/hvm_info_table.h>
+#include "e820.h"
 
 #define __STR(...) #__VA_ARGS__
 #define STR(...) __STR(__VA_ARGS__)
@@ -222,6 +224,9 @@ int hvm_param_set(uint32_t index, uint64_t value);
 /* Setup PCI bus */
 void pci_setup(void);
 
+/* Setup memory map  */
+void memory_map_setup(void);
+
 /* Prepare the 32bit BIOS */
 uint32_t rombios_highbios_setup(void);
 
@@ -249,6 +254,13 @@ void perform_tests(void);
 
 extern char _start[], _end[];
 
+int get_mem_mapping_layout(struct e820entry entries[],
+                           unsigned int *max_entries);
+
+extern struct e820map memory_map;
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size);
+
 #endif /* __HVMLOADER_UTIL_H__ */
 
 /*
-- 
1.9.1


* [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (4 preceding siblings ...)
  2015-07-09  5:33 ` [v7][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
@ 2015-07-09  5:33 ` Tiejun Chen
  2015-07-13 13:12   ` Jan Beulich
  2015-07-09  5:33 ` [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

When allocating mmio address for PCI bars, we need to make
sure they don't overlap with reserved regions.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v6 ~ v7:

* Nothing is changed.

v5:

* Rename the is_64bar field inside struct bars to flag, and
  extend it to also indicate whether this bar is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure the MMIO region fits all PCI devices as before

  #2.2 Accommodate unaligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either populate more RAM or expand highmem further. But
  only a 64bit-bar can work with highmem, and as you mentioned we should
  also avoid expanding highmem where possible. So my implementation
  allocates 32bit-bars first and 64bit-bars afterwards.

  1>. The first allocation round covers only 32bit-bars

  If we can finish allocating all 32bit-bars, we just go on to allocate
  64bit-bars with all remaining resources, including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate
  the remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource
  and go to the second allocation round 2>.

  2>. The second allocation round covers the remaining 32bit-bars

  In theory we should be able to finish allocating all 32bit-bars here, and
  then go to the third allocation round 3>.

  3>. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate reserved memory regions that are not aligned:

  We should skip all reserved device memory, but we also need to check whether
  other, smaller BARs can be allocated if an MMIO hole exists between
  resource->base and the reserved device memory. If such a hole exists, we
  simply move on and try to allocate the next BAR, since all BARs are in
  descending order of size. If not, we move resource->base to reserved_end
  and reallocate this BAR.
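
As a rough illustration of #3.2, here is a minimal, self-contained sketch of
the skip-and-retry placement. It is not the hvmloader code: struct region,
overlaps() and place_bar() are hypothetical names used only here, and unlike
the patch this simplified variant always moves past a conflicting region
instead of leaving the hole below it for smaller BARs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one reserved device memory range. */
struct region {
    uint64_t start, size;
};

/* Non-zero if [s1, s1 + sz1) and [s2, s2 + sz2) intersect. */
static int overlaps(uint64_t s1, uint64_t sz1, uint64_t s2, uint64_t sz2)
{
    return s1 < s2 + sz2 && s2 < s1 + sz1;
}

/*
 * Place a BAR of power-of-two size bar_sz at or above *base and below max,
 * avoiding every reserved region. On success, returns the aligned address
 * and advances *base past the BAR. Returns 0 if the BAR does not fit
 * (address 0 is assumed never to be a valid MMIO placement).
 */
static uint64_t place_bar(uint64_t *base, uint64_t max, uint64_t bar_sz,
                          const struct region *rsv, size_t nr_rsv)
{
    uint64_t addr;
    size_t i;

 retry:
    /* Round the candidate address up to the BAR's natural alignment. */
    addr = (*base + bar_sz - 1) & ~(bar_sz - 1);
    if ( addr < *base || addr + bar_sz > max )
        return 0;

    for ( i = 0; i < nr_rsv; i++ )
    {
        if ( overlaps(addr, bar_sz, rsv[i].start, rsv[i].size) )
        {
            /* Conflict: restart the search just after this region. */
            *base = rsv[i].start + rsv[i].size;
            goto retry;
        }
    }

    *base = addr + bar_sz;
    return addr;
}
```

Each retry strictly increases *base, so the search either finds a free slot
or runs past max and fails.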

 tools/firmware/hvmloader/pci.c | 194 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 164 insertions(+), 30 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..397f3b7 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,31 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+static void relocate_ram_for_pci_memory(unsigned long cur_pci_mem_start)
+{
+    struct xen_add_to_physmap xatp;
+    unsigned int nr_pages = min_t(
+        unsigned int,
+        hvm_info->low_mem_pgend - (cur_pci_mem_start >> PAGE_SHIFT),
+        (1u << 16) - 1);
+    if ( hvm_info->high_mem_pgend == 0 )
+        hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
+    hvm_info->low_mem_pgend -= nr_pages;
+    printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
+           " for lowmem MMIO hole\n",
+           nr_pages,
+           PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
+           PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
+    xatp.domid = DOMID_SELF;
+    xatp.space = XENMAPSPACE_gmfn_range;
+    xatp.idx   = hvm_info->low_mem_pgend;
+    xatp.gpfn  = hvm_info->high_mem_pgend;
+    xatp.size  = nr_pages;
+    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+        BUG();
+    hvm_info->high_mem_pgend += nr_pages;
+}
+
 void pci_setup(void)
 {
     uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -50,17 +75,22 @@ void pci_setup(void)
     /* Resources assignable to PCI devices via BARs. */
     struct resource {
         uint64_t base, max;
-    } *resource, mem_resource, high_mem_resource, io_resource;
+    } *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource;
 
     /* Create a list of device BARs in descending order of size. */
     struct bars {
-        uint32_t is_64bar;
+#define PCI_BAR_IS_64BIT        0x1
+#define PCI_BAR_IS_ALLOCATED    0x2
+        uint32_t flag;
         uint32_t devfn;
         uint32_t bar_reg;
         uint64_t bar_sz;
     } *bars = (struct bars *)scratch_start;
-    unsigned int i, nr_bars = 0;
-    uint64_t mmio_hole_size = 0;
+    unsigned int i, j, n, nr_bars = 0;
+    uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size;
+    bool bar32_allocating = 0;
+    uint64_t mmio32_unallocated_total = 0;
+    unsigned long cur_pci_mem_start = 0;
 
     const char *s;
     /*
@@ -222,7 +252,7 @@ void pci_setup(void)
             if ( i != nr_bars )
                 memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
 
-            bars[i].is_64bar = is_64bar;
+            bars[i].flag = is_64bar ? PCI_BAR_IS_64BIT : 0;
             bars[i].devfn   = devfn;
             bars[i].bar_reg = bar_reg;
             bars[i].bar_sz  = bar_sz;
@@ -309,29 +339,31 @@ void pci_setup(void)
     }
 
     /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
+    cur_pci_mem_start = pci_mem_start;
     while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
+        relocate_ram_for_pci_memory(cur_pci_mem_start);
+
+    /*
+     * Check if reserved device memory conflicts with current PCI memory.
+     * If so, we need to allocate 32-bit BARs first since reserved devices
+     * always occupy low memory, and also enable relocating some BARs
+     * to 64-bit where possible.
+     */
+    for ( i = 0; i < memory_map.nr_map ; i++ )
     {
-        struct xen_add_to_physmap xatp;
-        unsigned int nr_pages = min_t(
-            unsigned int,
-            hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
-            (1u << 16) - 1);
-        if ( hvm_info->high_mem_pgend == 0 )
-            hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
-        hvm_info->low_mem_pgend -= nr_pages;
-        printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
-               " for lowmem MMIO hole\n",
-               nr_pages,
-               PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
-               PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
-        xatp.domid = DOMID_SELF;
-        xatp.space = XENMAPSPACE_gmfn_range;
-        xatp.idx   = hvm_info->low_mem_pgend;
-        xatp.gpfn  = hvm_info->high_mem_pgend;
-        xatp.size  = nr_pages;
-        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
-            BUG();
-        hvm_info->high_mem_pgend += nr_pages;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        reserved_end = reserved_start + reserved_size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts with current PCI memory,"
+                   " so allocate 32-bit BARs first and try to"
+                   " relocate some BARs to 64-bit\n");
+            bar32_allocating = 1;
+            if ( !bar64_relocate )
+                bar64_relocate = 1;
+        }
     }
 
     high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
@@ -352,6 +384,7 @@ void pci_setup(void)
     io_resource.base = 0xc000;
     io_resource.max = 0x10000;
 
+ further_allocate:
     /* Assign iomem and ioport resources in descending order of size. */
     for ( i = 0; i < nr_bars; i++ )
     {
@@ -359,6 +392,17 @@ void pci_setup(void)
         bar_reg = bars[i].bar_reg;
         bar_sz  = bars[i].bar_sz;
 
+        /* Check if this bar is allocated. */
+        if ( bars[i].flag & PCI_BAR_IS_ALLOCATED )
+            continue;
+
+        /*
+         * This means we'd like to allocate 32-bit BARs first to make sure
+         * as many 32-bit BARs as possible can be allocated.
+         */
+        if ( (bars[i].flag & PCI_BAR_IS_64BIT) && bar32_allocating )
+            continue;
+
         /*
          * Relocate to high memory if the total amount of MMIO needed
          * is more than the low MMIO available.  Because devices are
@@ -377,7 +421,7 @@ void pci_setup(void)
          *   the code here assumes it to be.)
          * Should either of those two conditions change, this code will break.
          */
-        using_64bar = bars[i].is_64bar && bar64_relocate
+        using_64bar = (bars[i].flag & PCI_BAR_IS_64BIT) && bar64_relocate
             && (mmio_total > (mem_resource.max - mem_resource.base));
         bar_data = pci_readl(devfn, bar_reg);
 
@@ -395,7 +439,14 @@ void pci_setup(void)
                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
             } 
             else {
-                resource = &mem_resource;
+                /*
+                 * This means we're trying to use the expanded
+                 * memory to reallocate 32-bit BARs.
+                 */
+                if ( mmio32_unallocated_total )
+                    resource = &exp_mem_resource;
+                else
+                    resource = &mem_resource;
                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
             }
             mmio_total -= bar_sz;
@@ -406,9 +457,44 @@ void pci_setup(void)
             bar_data &= ~PCI_BASE_ADDRESS_IO_MASK;
         }
 
-        base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+ reallocate_bar:
+        base = (resource->base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
         bar_data |= (uint32_t)base;
         bar_data_upper = (uint32_t)(base >> 32);
+        /*
+         * We should skip all reserved device memory, but we also need
+         * to check if other smaller bars can be allocated if a mmio hole
+         * exists between resource->base and reserved device memory.
+         */
+        for ( j = 0; j < memory_map.nr_map ; j++ )
+        {
+            if ( memory_map.map[j].type != E820_RAM )
+            {
+                reserved_start = memory_map.map[j].addr;
+                reserved_size = memory_map.map[j].size;
+                reserved_end = reserved_start + reserved_size;
+                if ( check_overlap(base, bar_sz,
+                                   reserved_start, reserved_size) )
+                {
+                    /*
+                     * If a hole exists between base and reserved device
+                     * memory, simply move on and try to allocate the next
+                     * BAR, since all BARs are in descending order of size.
+                     */
+                    if ( resource->base < reserved_start )
+                        continue;
+                    /*
+                     * If not, we need to move resource->base to
+                     * reserved_end just to reallocate this bar.
+                     */
+                    else
+                    {
+                        resource->base = reserved_end;
+                        goto reallocate_bar;
+                    }
+                }
+            }
+        }
         base += bar_sz;
 
         if ( (base < resource->base) || (base > resource->max) )
@@ -428,7 +514,7 @@ void pci_setup(void)
                devfn>>3, devfn&7, bar_reg,
                PRIllx_arg(bar_sz),
                bar_data_upper, bar_data);
-			
+        bars[i].flag |= PCI_BAR_IS_ALLOCATED;
 
         /* Now enable the memory or I/O mapping. */
         cmd = pci_readw(devfn, PCI_COMMAND);
@@ -439,6 +525,54 @@ void pci_setup(void)
         else
             cmd |= PCI_COMMAND_IO;
         pci_writew(devfn, PCI_COMMAND, cmd);
+
+        /* If we finish allocating bar32 at the first time. */
+        if ( i == nr_bars && bar32_allocating )
+        {
+            /*
+             * We won't populate more RAM again to finish allocating
+             * all 32-bit BARs, so just go on to allocate 64-bit BARs.
+             */
+            if ( mmio32_unallocated_total )
+            {
+                bar32_allocating = 0;
+                mmio32_unallocated_total = 0;
+                high_mem_resource.base =
+                        ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
+                goto further_allocate;
+            }
+
+            /* Calculate the remaining 32bars. */
+            for ( n = 0; n < nr_bars ; n++ )
+            {
+                if ( !(bars[n].flag & PCI_BAR_IS_64BIT) )
+                {
+                    uint32_t devfn32, bar_reg32, bar_data32;
+                    uint64_t bar_sz32;
+                    devfn32   = bars[n].devfn;
+                    bar_reg32 = bars[n].bar_reg;
+                    bar_sz32  = bars[n].bar_sz;
+                    bar_data32 = pci_readl(devfn32, bar_reg32);
+                    if ( !bar_data32 )
+                        mmio32_unallocated_total += bar_sz32;
+                }
+            }
+
+            /*
+             * We have to populate more RAM to further allocate
+             * the remaining 32bars.
+             */
+            if ( mmio32_unallocated_total )
+            {
+                cur_pci_mem_start = pci_mem_start - mmio32_unallocated_total;
+                relocate_ram_for_pci_memory(cur_pci_mem_start);
+                exp_mem_resource.base = cur_pci_mem_start;
+                exp_mem_resource.max = pci_mem_start;
+            }
+            else
+                bar32_allocating = 0;
+            goto further_allocate;
+        }
     }
 
     if ( pci_hi_mem_start )
-- 
1.9.1

* [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (5 preceding siblings ...)
  2015-07-09  5:33 ` [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges Tiejun Chen
@ 2015-07-09  5:33 ` Tiejun Chen
  2015-07-13 13:35   ` Jan Beulich
  2015-07-15 16:00   ` George Dunlap
  2015-07-09  5:33 ` [v7][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
                   ` (9 subsequent siblings)
  16 siblings, 2 replies; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

Now we can use that memory map to build our final
e820 table, but we may need to reorder all e820
entries.
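
The reordering step can be sketched as below. This is only an illustration
under stated assumptions: struct e820_sketch is a hypothetical stand-in for
the real e820 entry layout, and the sketch relies on qsort() from a hosted C
library, which hvmloader's own environment may not provide (the patch itself
uses a simple exchange sort).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an e820 entry: base address, size, type. */
struct e820_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Order two entries by ascending base address. */
static int cmp_addr(const void *a, const void *b)
{
    const struct e820_sketch *x = a, *y = b;

    /* Avoid subtracting 64-bit values into an int; compare explicitly. */
    return (x->addr > y->addr) - (x->addr < y->addr);
}

/* Sort the first nr entries of the table in place by base address. */
static void sort_e820(struct e820_sketch *e820, unsigned int nr)
{
    qsort(e820, nr, sizeof(*e820), cmp_addr);
}
```

Sorting is needed because entries copied from the memory map are appended
after the fixed low entries, so the table is not in address order until this
step runs.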

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v5 ~ v7:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.
 
 tools/firmware/hvmloader/e820.c | 80 +++++++++++++++++++++++++++++++++--------
 1 file changed, 66 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 3e53c47..aa2569f 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
                      unsigned int lowmem_reserved_base,
                      unsigned int bios_image_base)
 {
-    unsigned int nr = 0;
+    unsigned int nr = 0, i, j;
+    uint64_t add_high_mem = 0;
+    uint64_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
@@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820,
     e820[nr].type = E820_RESERVED;
     nr++;
 
-    /* Low RAM goes here. Reserve space for special pages. */
-    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-    e820[nr].addr = 0x100000;
-    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-    e820[nr].type = E820_RAM;
-    nr++;
-
     /*
      * Explicitly reserve space for special pages.
      * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -194,16 +189,73 @@ int build_e820_table(struct e820entry *e820,
         nr++;
     }
 
-
-    if ( hvm_info->high_mem_pgend )
+    /*
+     * Construct E820 table according to recorded memory map.
+     *
+     * The memory map created by toolstack may include,
+     *
+     * #1. Low memory region
+     *
+     * Low RAM starts at 1M or above to make sure all standard regions
+     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+     * have enough space.
+     *
+     * #2. Reserved regions if they exist
+     *
+     * #3. High memory region if it exists
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
     {
-        e820[nr].addr = ((uint64_t)1 << 32);
-        e820[nr].size =
-            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-        e820[nr].type = E820_RAM;
+        e820[nr] = memory_map.map[i];
         nr++;
     }
 
+    /* Low RAM goes here. Reserve space for special pages. */
+    BUG_ON(low_mem_end < (2u << 20));
+
+    /*
+     * We may need to adjust the real lowmem end since we may have
+     * relocated RAM earlier to get enough MMIO space.
+     */
+    for ( i = 0; i < nr; i++ )
+    {
+        uint64_t end = e820[i].addr + e820[i].size;
+        if ( e820[i].type == E820_RAM &&
+             low_mem_end > e820[i].addr && low_mem_end < end )
+        {
+            add_high_mem = end - low_mem_end;
+            e820[i].size = low_mem_end - e820[i].addr;
+        }
+    }
+
+    /*
+     * And then we also need to adjust highmem.
+     */
+    if ( add_high_mem )
+    {
+        for ( i = 0; i < nr; i++ )
+        {
+            if ( e820[i].type == E820_RAM &&
+                 e820[i].addr >= (1ull << 32))
+                e820[i].size += add_high_mem;
+        }
+    }
+
+    /* Finally we need to reorder all e820 entries. */
+    for ( j = 0; j < nr-1; j++ )
+    {
+        for ( i = j+1; i < nr; i++ )
+        {
+            if ( e820[j].addr > e820[i].addr )
+            {
+                struct e820entry tmp;
+                tmp = e820[j];
+                e820[j] = e820[i];
+                e820[i] = tmp;
+            }
+        }
+    }
+
     return nr;
 }
 
-- 
1.9.1

* [v7][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (6 preceding siblings ...)
  2015-07-09  5:33 ` [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-07-09  5:33 ` Tiejun Chen
  2015-07-09  5:34 ` [v7][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Introduce xc_reserved_device_memory_map() to libxc as a wrapper around the
new hypercall. It lets us get RDM entry info according to different
parameters. If flag == PCI_DEV_RDM_ALL, all entries are exposed. Otherwise
we expose only the RDM entries specific to a given SBDF.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4 ~ v7:

* Nothing is changed.

 tools/libxc/include/xenctrl.h |  8 ++++++++
 tools/libxc/xc_domain.c       | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..9160623 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..0951291 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
     return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries)
+{
+    int rc;
+    struct xen_reserved_device_memory_map xrdmmap = {
+        .flag = flag,
+        .seg = seg,
+        .bus = bus,
+        .devfn = devfn,
+        .nr_entries = *max_entries
+    };
+    DECLARE_HYPERCALL_BOUNCE(entries,
+                             sizeof(struct xen_reserved_device_memory) *
+                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, entries) )
+        return -1;
+
+    set_xen_guest_handle(xrdmmap.buffer, entries);
+
+    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+                      &xrdmmap, sizeof(xrdmmap));
+
+    xc_hypercall_bounce_post(xch, entries);
+
+    *max_entries = xrdmmap.nr_entries;
+
+    return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries)
-- 
1.9.1

* [v7][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (7 preceding siblings ...)
  2015-07-09  5:33 ` [v7][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
@ 2015-07-09  5:34 ` Tiejun Chen
  2015-07-09  5:34 ` [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, David Scott, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also brings some fallout to the python usage of xc_assign_device().

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: David Scott <dave.scott@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v6 ~ v7:

* Nothing is changed.

v5:

* Fix the flag field to "0" for DT devices

v4:

* In the patch head description, explain why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h       |  3 ++-
 tools/libxc/xc_domain.c             |  9 ++++++++-
 tools/libxl/libxl_pci.c             |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 ++++++++++++----
 tools/python/xen/lowlevel/xc/xc.c   | 30 ++++++++++++++++++++----------
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
-                     uint32_t machine_sbdf);
+                     uint32_t machine_sbdf,
+                     uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
     xc_interface *xch,
     uint32_t domid,
-    uint32_t machine_sbdf)
+    uint32_t machine_sbdf,
+    uint32_t flag)
 {
     DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
     domctl.domain = domid;
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
     domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+    domctl.u.assign_device.flag = flag;
 
     return do_domctl(xch, &domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
     domctl.u.assign_device.u.dt.size = size;
+    /*
+     * DT devices don't own any RDM, so the flag is meaningless
+     * for them; just fix it as 0 here.
+     */
+    domctl.u.assign_device.flag = 0;
     set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
     rc = do_domctl(xch, &domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
     FILE *f;
     unsigned long long start, end, flags, size;
     int irq, i, rc, hvm = 0;
+    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
     if (type == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
-        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
             return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
 	CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+    XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+                                            value rflag)
 {
-	CAMLparam3(xch, domid, desc);
+	CAMLparam4(xch, domid, desc, rflag);
 	int ret;
 	int domain, bus, dev, func;
-	uint32_t sbdf;
+	uint32_t sbdf, flag;
 
 	domain = Int_val(Field(desc, 0));
 	bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
 	func = Int_val(Field(desc, 3));
 	sbdf = encode_sbdf(domain, bus, dev, func);
 
-	ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+	ret = Int_val(Field(rflag, 0));
+	flag = domain_assign_device_rdm_flag_table[ret];
+
+	ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
 	if (ret < 0)
 		failwith_xc(_H(xch));
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index c77e15b..a4928c6 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -592,7 +592,8 @@ static int token_value(char *token)
     return strtol(token, NULL, 16);
 }
 
-static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
+static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
+                    int *flag)
 {
     char *token;
 
@@ -607,8 +608,17 @@ static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
     *dev  = token_value(token);
     token = strchr(token, ',') + 1;
     *func  = token_value(token);
-    token = strchr(token, ',');
-    *str = token ? token + 1 : NULL;
+    token = strchr(token, ',') + 1;
+    if ( token ) {
+        *flag = token_value(token);
+        *str = token + 1;
+    }
+    else
+    {
+        /* 0 means we take "strict" as our default policy. */
+        *flag = 0;
+        *str = NULL;
+    }
 
     return 1;
 }
@@ -620,14 +630,14 @@ static PyObject *pyxc_test_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
@@ -653,21 +663,21 @@ static PyObject *pyxc_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
         sbdf |= (dev & 0x1f) << 3;
         sbdf |= (func & 0x7);
 
-        if ( xc_assign_device(self->xc_handle, dom, sbdf) != 0 )
+        if ( xc_assign_device(self->xc_handle, dom, sbdf, flag) != 0 )
         {
             if (errno == ENOSYS)
                 sbdf = -1;
@@ -686,14 +696,14 @@ static PyObject *pyxc_deassign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
-- 
1.9.1

* [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (8 preceding siblings ...)
  2015-07-09  5:34 ` [v7][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
@ 2015-07-09  5:34 ` Tiejun Chen
  2015-07-09  9:20   ` Wei Liu
  2015-07-09 18:02   ` Ian Jackson
  2015-07-09  5:34 ` [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
                   ` (6 subsequent siblings)
  16 siblings, 2 replies; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch introduces user-configurable parameters to specify RDM
resources and the corresponding policies:

Global RDM parameter:
    rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The global RDM parameter "strategy" allows the user to specify reserved
regions explicitly. Currently, 'host' includes all reserved regions reported
on this platform, which is useful for handling hotplug scenarios. In the
future this parameter may be extended to allow specifying arbitrary regions,
e.g. even those belonging to another platform, as preparation for live
migration with passthrough devices. By default this isn't set, so we don't
check all RDMs. Instead, we just check the RDM specific to a given device if
you're assigning this kind of device. Note this option is not recommended
unless you can make sure that no conflict exists.

The 'strict/relaxed' policy decides how to handle conflicts when reserving
RDM regions in pfn space. If a conflict exists, 'strict' means an immediate
error so the VM can't keep running, while 'relaxed' allows moving forward
with a warning message.

The default per-device RDM policy is the same as the default global RDM
policy, 'relaxed'. As with other options, the per-device policy overrides
the global policy.
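
The override and conflict rules above can be sketched roughly as follows.
This is a hedged illustration only, not the libxl implementation: the enum,
effective_policy() and on_conflict() are hypothetical names, and POLICY_UNSET
is an assumed marker for "not given in the config".

```c
#include <stdio.h>

/* Hypothetical policy values; POLICY_UNSET means "not configured". */
enum policy_sketch {
    POLICY_UNSET,
    POLICY_STRICT,
    POLICY_RELAXED,
};

/*
 * Resolve the policy for one device: a per-device setting wins over the
 * global one, and 'relaxed' is the default when neither is set.
 */
static enum policy_sketch effective_policy(enum policy_sketch global,
                                           enum policy_sketch per_device)
{
    if ( per_device != POLICY_UNSET )
        return per_device;
    if ( global != POLICY_UNSET )
        return global;
    return POLICY_RELAXED;
}

/*
 * React to an RDM conflict: 'strict' fails with -1, anything else only
 * warns and returns 0 so that domain creation can continue.
 */
static int on_conflict(enum policy_sketch policy)
{
    if ( policy == POLICY_STRICT )
    {
        fprintf(stderr, "RDM conflict: strict policy, giving up\n");
        return -1;
    }

    fprintf(stderr, "RDM conflict: relaxed policy, continuing\n");
    return 0;
}
```

With these rules, a device configured with rdm_policy=relaxed keeps running
through a conflict even if the global rdm policy is strict.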

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

v6:

* Some renames to make our policy reasonable
  "type" -> "strategy"
  "none" -> "ignore"
* Don't expose "ignore" in xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device policy always overrides the global policy,
  and so clean up some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since it's just zero
* Move those changes to xl/libxlu into a final patch

 docs/man/xl.cfg.pod.5        | 81 ++++++++++++++++++++++++++++++++++++++++++++
 docs/misc/vtd.txt            | 24 +++++++++++++
 tools/libxl/libxl_create.c   |  7 ++++
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c      |  9 +++++
 tools/libxl/libxl_types.idl  | 18 ++++++++++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..6c55a8b 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item B<rdm="RDM_RESERVATION_STRING">
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B<RDM_RESERVATION_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...]> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<strategy="STRING">
+
+Currently there is only one valid type:
+
+"host" means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using "host" includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure no conflict exists.
+
+For example, you're trying to set "memory = 2800" to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad800000 end_address afffffff
+
+In this conflict case,
+
+#1. If B<strategy> is set to "host", for example,
+
+rdm = "strategy=host,policy=strict" or rdm = "strategy=host,policy=relaxed"
+
+It means all conflicts will be handled according to the policy
+introduced by B<policy> as described below.
+
+#2. If B<strategy> is not set at all, but
+
+pci = [ 'sbdf_A, rdm_policy=xxxxx' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by B<rdm_policy="STRING"> as described inside pci options.
+
+=item B<policy="STRING">
+
+Specifies how to deal with conflicts when reserving reserved device
+memory in guest address space.
+
+When that conflict is unsolved,
+
+"strict" means VM can't be created, or the associated device can't be
+attached in the case of hotplug.
+
+"relaxed" allows VM to be created but may cause VM to crash if
+pass-through device accesses RDM. For example, Windows IGD GFX driver
+always accesses RDM regions so it leads to VM crash.
+
+Note this may be overridden by rdm_policy option in PCI device configuration.
+
+=back
+
+=back
+
 =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
 
 Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
@@ -717,6 +790,14 @@ dom0 without confirmation.  Please use with care.
 D0-D3hot power management states for the PCI device. False (0) by
 default.
 
+=item B<rdm_policy="STRING">
+
+(HVM/x86 only) This is the same as the policy option inside the rdm option,
+but specific to a given device. Therefore the default is "relaxed", the
+same as for the policy option.
+
+Note this would override global B<rdm> option.
+
 =back
 
 =back
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..88b2102 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,30 @@ in the config file:
 To override for a specific device:
 	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
 
+RDM, 'reserved device memory', for PCI Device Passthrough
+---------------------------------------------------------
+
+There are some devices the BIOS controls, e.g. USB devices to perform
+PS2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence BIOS uses RMRR to specify these regions along with
+devices that need to access these regions. OS is expected to set up
+identity mappings for these regions for these devices to access these regions.
+
+While creating a VM we should reserve them in advance, and avoid any conflicts.
+So we introduce user configurable parameters to specify RDM resources and
+the corresponding policies.
+
+To enable this globally, add "rdm" in the config file:
+
+    rdm = "strategy=host, policy=relaxed"   (default policy is "relaxed")
+
+Or just for a specific device:
+
+    pci = [ '01:00.0,rdm_policy=relaxed', '03:00.0,rdm_policy=strict' ]
+
+For all the options available to RDM, see xl.cfg(5).
+
 
 Caveat on Conventional PCI Device Passthrough
 ---------------------------------------------
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f366a09..f75d4f1 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -105,6 +105,12 @@ static int sched_params_valid(libxl__gc *gc,
     return 1;
 }
 
+void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
+{
+    if (b_info->u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+        b_info->u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+}
+
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info)
 {
@@ -384,6 +390,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 
         libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
 
+        libxl__rdm_setdefault(gc, b_info);
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d52589e..d397143 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1154,6 +1154,8 @@ _hidden int libxl__device_vtpm_setdefault(libxl__gc *gc, libxl_device_vtpm *vtpm
 _hidden int libxl__device_vfb_setdefault(libxl__gc *gc, libxl_device_vfb *vfb);
 _hidden int libxl__device_vkb_setdefault(libxl__gc *gc, libxl_device_vkb *vkb);
 _hidden int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci);
+_hidden void libxl__rdm_setdefault(libxl__gc *gc,
+                                   libxl_domain_build_info *b_info);
 
 _hidden const char *libxl__device_nic_devname(libxl__gc *gc,
                                               uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 632c15e..b7ab59d 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -988,6 +988,12 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
+        if (pcidev->rdm_policy == LIBXL_RDM_RESERVE_POLICY_STRICT) {
+            flag &= ~XEN_DOMCTL_DEV_RDM_RELAXED;
+        } else if (pcidev->rdm_policy != LIBXL_RDM_RESERVE_POLICY_RELAXED) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm check flag.");
+            return ERROR_FAIL;
+        }
         rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
@@ -1040,6 +1046,9 @@ static int libxl__device_pci_reset(libxl__gc *gc, unsigned int domain, unsigned
 
 int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
 {
+    /* We'd like to force reserve rdm specific to a device by default.*/
+    if ( pci->rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+        pci->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
     return 0;
 }
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e1632fa..47dd83a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -76,6 +76,17 @@ libxl_domain_type = Enumeration("domain_type", [
     (2, "PV"),
     ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
 
+libxl_rdm_reserve_strategy = Enumeration("rdm_reserve_strategy", [
+    (0, "ignore"),
+    (1, "host"),
+    ])
+
+libxl_rdm_reserve_policy = Enumeration("rdm_reserve_policy", [
+    (-1, "invalid"),
+    (0, "strict"),
+    (1, "relaxed"),
+    ], init_val = "LIBXL_RDM_RESERVE_POLICY_INVALID")
+
 libxl_channel_connection = Enumeration("channel_connection", [
     (0, "UNKNOWN"),
     (1, "PTY"),
@@ -369,6 +380,11 @@ libxl_vnode_info = Struct("vnode_info", [
     ("vcpus", libxl_bitmap), # vcpus in this node
     ])
 
+libxl_rdm_reserve = Struct("rdm_reserve", [
+    ("strategy",    libxl_rdm_reserve_strategy),
+    ("policy",      libxl_rdm_reserve_policy),
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -467,6 +483,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        # See libxl_ms_vm_genid_generate()
                                        ("ms_vm_genid",      libxl_ms_vm_genid),
                                        ("serial_list",      libxl_string_list),
+                                       ("rdm", libxl_rdm_reserve),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
@@ -542,6 +559,7 @@ libxl_device_pci = Struct("device_pci", [
     ("power_mgmt", bool),
     ("permissive", bool),
     ("seize", bool),
+    ("rdm_policy",      libxl_rdm_reserve_policy),
     ])
 
 libxl_device_dtdev = Struct("device_dtdev", [
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 119+ messages in thread

* [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (9 preceding siblings ...)
  2015-07-09  5:34 ` [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
@ 2015-07-09  5:34 ` Tiejun Chen
  2015-07-09  9:11   ` Wei Liu
  2015-07-09 18:14   ` Ian Jackson
  2015-07-09  5:34 ` [v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
                   ` (5 subsequent siblings)
  16 siblings, 2 replies; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

While building a VM, the HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields that hvmloader uses to
construct the guest e820 table, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RDM.

RMRR can theoretically reside in address space beyond 4G, but we never
see this in the real world. So in order to avoid breaking the highmem layout
we don't solve highmem conflicts. Note this means highmem RMRR can still
be supported if there is no conflict.

But in the case of lowmem, RMRR may be scattered across the whole RAM space.
Multiple RMRR entries in particular would worsen this and lead to a
complicated memory layout, and then it's hard to extend hvm_info_table{} for
hvmloader to work it out. So here we use a simple solution that avoids
breaking the existing layout. When a conflict occurs,

    #1. Above a predefined boundary (2G)
        - move lowmem_end below the reserved region to solve the conflict;

    #2. Below a predefined boundary (2G)
        - Check strict/relaxed policy.
        "strict" policy makes libxl fail. Note when both policies
        are specified on a given region, the per-device policy overrides
        the global policy.
        "relaxed" policy issues a warning message and also marks this entry
        INVALID to indicate we shouldn't expose this entry to hvmloader.

Note later we need to provide a parameter to set that predefined boundary
dynamically.
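
As a worked illustration of case #1, here is a minimal standalone sketch of
the lowmem adjustment. It reuses the overlap test and the adjustment
arithmetic from this patch, but struct rdm_region, struct mem_layout and
shrink_lowmem() are hypothetical names for the example, and it assumes that
highmem_end starts at 4G when the guest has no highmem yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified types; not the libxl/libxc structures. */
struct rdm_region { uint64_t start, size; };
struct mem_layout { uint64_t lowmem_end, highmem_end; };

/* True if [start, start+memsize) intersects [rdm_start, rdm_start+rdm_size). */
static bool overlaps(uint64_t start, uint64_t memsize,
                     uint64_t rdm_start, uint64_t rdm_size)
{
    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
}

/*
 * For every region that conflicts with lowmem and starts above the
 * boundary, pull lowmem_end down to the start of the region and grow
 * highmem_end by the same amount, so the guest keeps its total RAM.
 * Conflicts below the boundary are left for the strict/relaxed check.
 */
static void shrink_lowmem(struct mem_layout *m,
                          const struct rdm_region *r, size_t n,
                          uint64_t boundary)
{
    for (size_t i = 0; i < n; i++) {
        if (!overlaps(0, m->lowmem_end, r[i].start, r[i].size))
            continue;
        if (r[i].start > boundary) {
            m->highmem_end += m->lowmem_end - r[i].start;
            m->lowmem_end = r[i].start;
        }
    }
}
```

Taking the example from the xl.cfg text of patch #10 (memory = 2800, regions
ac6d3000-ac6e6fff and ad800000-afffffff, boundary 2G): lowmem initially ends
at 0xaf000000, the first region conflicts, so lowmem_end moves down to
0xac6d3000 and highmem_end grows from 4G to 0x10292d000. After that the
second region no longer overlaps lowmem.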

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevint.tian@intel.com>
---
v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Fix some code style issues
* Refine libxl__xc_device_get_rdm()

v5:

* A little change to make sure the per-device policy always override the global
  policy and correct its associated code comments.
* Fix one typo in the patch head description
* Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace
  malloc() with libxl__malloc(), and finally cleanup this fallout.
* libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out parameter.

v4:

* Consistent to use term "RDM".
* Unconditionally set *nr_entries to 0
* Move all the stuff that provides a parameter to set our predefined boundary
  dynamically into a separate patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c       | 264 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom.c      |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 298 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f75d4f1..c8a32d5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -459,7 +459,7 @@ int libxl__domain_build(libxl__gc *gc,
 
     switch (info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        ret = libxl__build_hvm(gc, domid, info, state);
+        ret = libxl__build_hvm(gc, domid, d_config, state);
         if (ret)
             goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 317a8eb..54b67ee 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,270 @@ const char *libxl__domain_device_model(libxl__gc *gc,
     return dm;
 }
 
+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+                         uint32_t flag,
+                         uint16_t seg,
+                         uint8_t bus,
+                         uint8_t devfn,
+                         unsigned int *nr_entries,
+                         struct xen_reserved_device_memory **xrdm)
+{
+    int rc = 0, r;
+
+    /*
+     * We really can't presume how many entries we can get in advance.
+     */
+    *nr_entries = 0;
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      NULL, nr_entries);
+    assert(r <= 0);
+    /* "0" means there are no rdm entries. */
+    if (!r) goto out;
+
+    if (errno != ENOBUFS) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    *xrdm = libxl__malloc(gc,
+                          *nr_entries * sizeof(xen_reserved_device_memory_t));
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      *xrdm, nr_entries);
+    if (r)
+        rc = ERROR_FAIL;
+
+ out:
+    if (rc) {
+        *nr_entries = 0;
+        *xrdm = NULL;
+        LOG(ERROR, "Could not get reserved device memory maps.\n");
+    }
+    return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+                         uint64_t rdm_start, uint64_t rdm_size)
+{
+    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to user preferred policy.
+ *
+ * RDM can reside in address space beyond 4G theoretically, but we never
+ * see this in real world. So in order to avoid breaking highmem layout
+ * we don't solve highmem conflict. Note this means highmem rmrr could still
+ * be supported if no conflict.
+ *
+ * But in the case of lowmem, RDM may be scattered across the whole RAM space.
+ * Multiple RDM entries in particular would worsen this and lead to a
+ * complicated memory layout, and then it's hard to extend hvm_info_table{}
+ * for hvmloader to work it out. So here we use a simple solution that
+ * avoids breaking the existing layout. When a conflict occurs,
+ *
+ * #1. Above a predefined boundary (default 2G)
+ * - Move lowmem_end below the reserved region to solve the conflict;
+ *
+ * #2. Below a predefined boundary (default 2G)
+ * - Check strict/relaxed policy.
+ * "strict" policy makes libxl fail.
+ * "relaxed" policy issues a warning message and also marks this entry
+ * INVALID to indicate we shouldn't expose this entry to hvmloader.
+ * Note when both policies are specified on a given region, the per-device
+ * policy should override the global policy.
+ */
+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                       libxl_domain_config *d_config,
+                                       uint64_t rdm_mem_boundary,
+                                       struct xc_hvm_build_args *args)
+{
+    int i, j, conflict, rc;
+    struct xen_reserved_device_memory *xrdm = NULL;
+    uint32_t strategy = d_config->b_info.u.hvm.rdm.strategy;
+    uint16_t seg;
+    uint8_t bus, devfn;
+    uint64_t rdm_start, rdm_size;
+    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);
+
+    /* Might not expose rdm. */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE && !d_config->num_pcidevs)
+        return 0;
+
+    /* Query all RDM entries in this platform */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
+        unsigned int nr_entries;
+
+        /* Collect all rdm info if exist. */
+        rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+                                      0, 0, 0, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        if (!nr_entries)
+            return 0;
+
+        assert(xrdm);
+
+        d_config->num_rdms = nr_entries;
+        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                                d_config->num_rdms * sizeof(libxl_device_rdm));
+
+        for (i = 0; i < d_config->num_rdms; i++) {
+            d_config->rdms[i].start =
+                                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
+            d_config->rdms[i].size =
+                                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
+            d_config->rdms[i].policy = d_config->b_info.u.hvm.rdm.policy;
+        }
+    } else
+        d_config->num_rdms = 0;
+
+    /* Query RDM entries per-device */
+    for (i = 0; i < d_config->num_pcidevs; i++) {
+        unsigned int nr_entries;
+        bool new = true;
+
+        seg = d_config->pcidevs[i].domain;
+        bus = d_config->pcidevs[i].bus;
+        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
+        nr_entries = 0;
+        rc = libxl__xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
+                                      seg, bus, devfn, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        /* No RDM is associated with this device. */
+        if (!nr_entries)
+            continue;
+
+        assert(xrdm);
+
+        /*
+         * Need to check whether this entry is already saved in the array.
+         * This could come from two cases:
+         *
+         *   - user may configure to get all RDMs in this platform, which
+         *   is already queried before this point
+         *   - or two assigned devices may share one RDM entry
+         *
+         * Different policies may be configured on the same RDM due to above
+         * two cases. But we don't allow assigning such a group of devices
+         * right now so it doesn't come true in our case.
+         */
+        for (j = 0; j < d_config->num_rdms; j++) {
+            if (d_config->rdms[j].start ==
+                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)
+            {
+                /*
+                 * So the per-device policy always override the global policy
+                 * in this case.
+                 */
+                d_config->rdms[j].policy = d_config->pcidevs[i].rdm_policy;
+                new = false;
+                break;
+            }
+        }
+
+        if (new) {
+            d_config->num_rdms++;
+            d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                                d_config->num_rdms * sizeof(libxl_device_rdm));
+
+            d_config->rdms[d_config->num_rdms - 1].start =
+                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT;
+            d_config->rdms[d_config->num_rdms - 1].size =
+                                (uint64_t)xrdm[0].nr_pages << XC_PAGE_SHIFT;
+            d_config->rdms[d_config->num_rdms - 1].policy =
+                                d_config->pcidevs[i].rdm_policy;
+        }
+    }
+
+    /*
+     * Next step is to check and avoid potential conflict between RDM entries
+     * and guest RAM. To avoid intrusive impact to existing memory layout
+     * {lowmem, mmio, highmem} which is passed around various function blocks,
+     * below conflicts are not handled which are rare and handling them would
+     * lead to a more scattered layout:
+     *  - RDM  in highmem area (>4G)
+     *  - RDM lower than a defined memory boundary (e.g. 2G)
+     * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem
+     * end below reserved region to solve conflict.
+     *
+     * If a conflict is detected on a given RDM entry, an error will be
+     * returned if 'strict' policy is specified. Instead, if 'relaxed' policy
+     * specified, this conflict is treated just as a warning, but we mark this
+     * RDM entry as INVALID to indicate that this entry shouldn't be exposed
+     * to hvmloader.
+     *
+     * Firstly we should check the case of rdm < 4G because we may need to
+     * expand highmem_end.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        conflict = overlaps_rdm(0, args->lowmem_end, rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        /* Just check if RDM > our memory boundary. */
+        if (rdm_start > rdm_mem_boundary) {
+            /*
+             * We will move downwards lowmem_end so we have to expand
+             * highmem_end.
+             */
+            highmem_end += (args->lowmem_end - rdm_start);
+            /* Now move downwards lowmem_end. */
+            args->lowmem_end = rdm_start;
+        }
+    }
+
+    /* Sync highmem_end. */
+    args->highmem_end = highmem_end;
+
+    /*
+     * Finally we can take same policy to check lowmem(< 2G) and
+     * highmem adjusted above.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        /* Does this entry conflict with lowmem? */
+        conflict = overlaps_rdm(0, args->lowmem_end,
+                                rdm_start, rdm_size);
+        /* Does this entry conflict with highmem? */
+        conflict |= overlaps_rdm((1ULL<<32),
+                                 args->highmem_end - (1ULL<<32),
+                                 rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        if (d_config->rdms[i].policy == LIBXL_RDM_RESERVE_POLICY_STRICT) {
+            LOG(ERROR, "RDM conflict at 0x%"PRIx64".\n", d_config->rdms[i].start);
+            goto out;
+        } else {
+            LOG(WARN, "Ignoring RDM conflict at 0x%"PRIx64".\n",
+                      d_config->rdms[i].start);
+
+            /*
+             * Then mask this INVALID to indicate we shouldn't expose this
+             * to hvmloader.
+             */
+            d_config->rdms[i].policy = LIBXL_RDM_RESERVE_POLICY_INVALID;
+        }
+    }
+
+    return 0;
+
+ out:
+    return ERROR_FAIL;
+}
+
 const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
 {
     const libxl_vnc_info *vnc = NULL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index bdc0465..f3c39a0 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -914,13 +914,20 @@ out:
 }
 
 int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     struct xc_hvm_build_args args = {};
     int ret, rc = ERROR_FAIL;
     uint64_t mmio_start, lowmem_end, highmem_end;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    /*
+     * Currently we fix this as 2G to guarantte how to handle
+     * our rdm policy. But we'll provide a parameter to set
+     * this dynamically.
+     */
+    uint64_t rdm_mem_boundary = 0x80000000;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -958,6 +965,14 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.highmem_end = highmem_end;
     args.mmio_start = mmio_start;
 
+    ret = libxl__domain_device_construct_rdm(gc, d_config,
+                                             rdm_mem_boundary,
+                                             &args);
+    if (ret) {
+        LOG(ERROR, "checking reserved device memory failed");
+        goto out;
+    }
+
     if (info->num_vnuma_nodes != 0) {
         int i;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d397143..b4d8419 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1057,7 +1057,7 @@ _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid,
 _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info, libxl__domain_build_state *state);
 _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state);
 
 _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
@@ -1565,6 +1565,15 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
         int nr_channels, libxl_device_channel *channels);
 
 /*
+ * This function will fix reserved device memory conflict
+ * according to user's configuration.
+ */
+_hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                   libxl_domain_config *d_config,
+                                   uint64_t rdm_mem_guard,
+                                   struct xc_hvm_build_args *args);
+
+/*
  * This function will cause the whole libxl process to hang
  * if the device model does not respond.  It is deprecated.
  *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 47dd83a..a3ad8d1 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -562,6 +562,12 @@ libxl_device_pci = Struct("device_pci", [
     ("rdm_policy",      libxl_rdm_reserve_policy),
     ])
 
+libxl_device_rdm = Struct("device_rdm", [
+    ("start", uint64),
+    ("size", uint64),
+    ("policy", libxl_rdm_reserve_policy),
+    ])
+
 libxl_device_dtdev = Struct("device_dtdev", [
     ("path", string),
     ])
@@ -592,6 +598,7 @@ libxl_domain_config = Struct("domain_config", [
     ("disks", Array(libxl_device_disk, "num_disks")),
     ("nics", Array(libxl_device_nic, "num_nics")),
     ("pcidevs", Array(libxl_device_pci, "num_pcidevs")),
+    ("rdms", Array(libxl_device_rdm, "num_rdms")),
     ("dtdevs", Array(libxl_device_dtdev, "num_dtdevs")),
     ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
     ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 119+ messages in thread

* [v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (10 preceding siblings ...)
  2015-07-09  5:34 ` [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-07-09  5:34 ` Tiejun Chen
  2015-07-09 18:14   ` Ian Jackson
  2015-07-09  5:34 ` [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Previously we always fixed that predefined boundary at 2G to handle
conflicts between memory and rdm, but now this predefined boundary
can be changed with the parameter "rdm_mem_boundary" in the .cfg file.
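
For orientation, the unit handling can be sketched as follows. This is only
an illustration under assumptions: the two macros copy the values of
LIBXL_MEMKB_DEFAULT and LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT from this patch,
while mbytes_to_memkb() and boundary_bytes() are hypothetical helpers, and
the final conversion from KB to bytes is assumed from the fact that the
previously hard-coded boundary (0x80000000) was a byte address.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as LIBXL_MEMKB_DEFAULT: marks "not set by the user". */
#define MEMKB_UNSET (~0ULL)
/* Same value as LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT: 2G expressed in KB. */
#define RDM_BOUNDARY_MEMKB_DEFAULT (2048ULL * 1024)

/* The .cfg value is given in megabytes; libxl stores it in KB. */
static uint64_t mbytes_to_memkb(uint64_t mbytes)
{
    return mbytes * 1024;
}

/*
 * Hypothetical helper: apply the default when the value is unset, then
 * convert KB to the byte address used when comparing against RDM starts.
 */
static uint64_t boundary_bytes(uint64_t memkb)
{
    if (memkb == MEMKB_UNSET)
        memkb = RDM_BOUNDARY_MEMKB_DEFAULT;
    return memkb * 1024;
}
```

With no setting, the boundary works out to 0x80000000 (2G), matching the old
hard-coded value; "rdm_mem_boundary=3072" would give 0xc0000000.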

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable "rdm_mem_boundary_memkb" specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5       | 22 ++++++++++++++++++++++
 tools/libxl/libxl.h         |  6 ++++++
 tools/libxl/libxl_create.c  |  4 ++++
 tools/libxl/libxl_dom.c     |  8 +-------
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  3 +++
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 6c55a8b..23068ec 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -867,6 +867,28 @@ More information about Xen gfx_passthru feature is available
 on the XenVGAPassthrough L<http://wiki.xen.org/wiki/XenVGAPassthrough>
 wiki page.
 
+=item B<rdm_mem_boundary=MBYTES>
+
+Number of megabytes to set a boundary for checking rdm conflict.
+
+When RDM conflicts with RAM, RDM may be scattered across the whole RAM space.
+Multiple RDM entries in particular would worsen this and lead to a complicated
+memory layout. So here we use a simple solution that avoids breaking the
+existing layout. When a conflict occurs,
+
+    #1. Above a predefined boundary
+        - move lowmem_end below reserved region to solve conflict;
+
+    #2. Below a predefined boundary
+        - Check strict/relaxed policy.
+        "strict" policy makes libxl fail. Note when both policies are
+        specified on a given region, the per-device policy takes precedence.
+        "relaxed" policy issues a warning message and also marks this
+        entry INVALID to indicate we shouldn't expose this entry to
+        hvmloader.
+
+Here the default is 2G.
+
 =item B<dtdev=[ "DTDEV_PATH", "DTDEV_PATH", ... ]>
 
 Specifies the host device tree nodes to passthrough to this guest. Each
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..6f157c9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
     uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c8a32d5..3de86a6 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
 {
     if (b_info->u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
         b_info->u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+
+    if (b_info->u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+        b_info->u.hvm.rdm_mem_boundary_memkb =
+                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index f3c39a0..62ef120 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     int ret, rc = ERROR_FAIL;
     uint64_t mmio_start, lowmem_end, highmem_end;
     libxl_domain_build_info *const info = &d_config->b_info;
-    /*
-     * Currently we fix this as 2G to guarantte how to handle
-     * our rdm policy. But we'll provide a parameter to set
-     * this dynamically.
-     */
-    uint64_t rdm_mem_boundary = 0x80000000;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.mmio_start = mmio_start;
 
     ret = libxl__domain_device_construct_rdm(gc, d_config,
-                                             rdm_mem_boundary,
+                                             info->u.hvm.rdm_mem_boundary_memkb*1024,
                                              &args);
     if (ret) {
         LOG(ERROR, "checking reserved device memory failed");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a3ad8d1..4eb4f8a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -484,6 +484,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("ms_vm_genid",      libxl_ms_vm_genid),
                                        ("serial_list",      libxl_string_list),
                                        ("rdm", libxl_rdm_reserve),
+                                       ("rdm_mem_boundary_memkb", MemKB),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c858068..dfb50d6 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1519,6 +1519,9 @@ static void parse_config_data(const char *config_source,
                     exit(1);
             }
         }
+
+        if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
+            b_info->u.hvm.rdm_mem_boundary_memkb = l * 1024;
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
-- 
1.9.1


* [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (11 preceding siblings ...)
  2015-07-09  5:34 ` [v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
@ 2015-07-09  5:34 ` Tiejun Chen
  2015-07-09 18:17   ` Ian Jackson
  2015-07-09  5:34 ` [v7][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Construct a basic guest e820 table via XENMEM_set_memory_map.
This table includes lowmem, highmem and RDMs if they exist;
hvmloader will need this information later.

Note that this guest e820 table is the same as before if the
platform has no RDM or RDM is disabled (the default).
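
As a rough illustration of the layout described above, the following
standalone sketch builds the same three kinds of entries (lowmem from 1M,
each valid RDM as reserved, then highmem above 4G). It is not the libxl
implementation: the sketch_* types and names are hypothetical, the caller
supplies the output array, and highmem is only emitted when highmem_end is
actually above 4G.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_E820_RAM       1u
#define SKETCH_E820_RESERVED  2u
#define SKETCH_LOW_MEM_START  0x100000ull   /* 1M, as in the patch */
#define SKETCH_4G             (1ull << 32)

/* Hypothetical stand-ins for struct e820entry and the rdms[] entries. */
struct sketch_e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

struct sketch_rdm {
    uint64_t start;
    uint64_t size;
    int valid;      /* 0 plays the role of the INVALID policy marker */
};

/*
 * Fill 'map' (capacity 'max') with lowmem, every valid RDM, then highmem.
 * Returns the number of entries written, or -1 if they do not fit or
 * lowmem_end does not leave any RAM above 1M.
 */
int sketch_build_e820(struct sketch_e820entry *map, size_t max,
                      uint64_t lowmem_end, uint64_t highmem_end,
                      const struct sketch_rdm *rdms, size_t num_rdms)
{
    size_t nr = 0, i;
    uint64_t highmem_size =
        highmem_end > SKETCH_4G ? highmem_end - SKETCH_4G : 0;

    if (max == 0 || lowmem_end <= SKETCH_LOW_MEM_START)
        return -1;

    /* Low memory */
    map[nr].addr = SKETCH_LOW_MEM_START;
    map[nr].size = lowmem_end - SKETCH_LOW_MEM_START;
    map[nr].type = SKETCH_E820_RAM;
    nr++;

    /* RDM mapping, skipping entries marked invalid */
    for (i = 0; i < num_rdms; i++) {
        if (!rdms[i].valid)
            continue;
        if (nr == max)
            return -1;
        map[nr].addr = rdms[i].start;
        map[nr].size = rdms[i].size;
        map[nr].type = SKETCH_E820_RESERVED;
        nr++;
    }

    /* High memory */
    if (highmem_size) {
        if (nr == max)
            return -1;
        map[nr].addr = SKETCH_4G;
        map[nr].size = highmem_size;
        map[nr].type = SKETCH_E820_RAM;
        nr++;
    }

    return (int)nr;
}
```

With no valid RDM entries the result is just lowmem plus optional highmem,
which is the "same as before" case mentioned above.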

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Rephrase patch's short log
* Make libxl__domain_construct_e820() hidden

v4:

* Use goto style error handling.
* Instead of NOGC, we should use libxl__malloc(gc,XXX) to allocate the local e820.

 tools/libxl/libxl_dom.c      |  5 +++
 tools/libxl/libxl_internal.h | 24 +++++++++++++
 tools/libxl/libxl_x86.c      | 83 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 112 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 62ef120..41da479 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }
 
+    if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
+        LOG(ERROR, "setting domain memory map failed");
+        goto out;
+    }
+
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b4d8419..a50449a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3794,6 +3794,30 @@ static inline void libxl__update_config_vtpm(libxl__gc *gc,
  */
 void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
                                     const libxl_bitmap *sptr);
+
+/*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at 1M or above to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: The regions below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions do not overlap since we have already checked
+ * and adjusted them. Please refer to libxl__domain_device_construct_rdm().
+ */
+_hidden int libxl__domain_construct_e820(libxl__gc *gc,
+                                 libxl_domain_config *d_config,
+                                 uint32_t domid,
+                                 struct xc_hvm_build_args *args);
+
 #endif
 
 /*
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..68bd1d2 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at 1M or above to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: The regions below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions do not overlap since we have already checked
+ * and adjusted them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__domain_construct_e820(libxl__gc *gc,
+                                 libxl_domain_config *d_config,
+                                 uint32_t domid,
+                                 struct xc_hvm_build_args *args)
+{
+    int rc = 0;
+    unsigned int nr = 0, i;
+    /* We always own at least one lowmem entry. */
+    unsigned int e820_entries = 1;
+    struct e820entry *e820 = NULL;
+    uint64_t highmem_size =
+                    args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+    /* Add all rdm entries. */
+    for (i = 0; i < d_config->num_rdms; i++)
+        if (d_config->rdms[i].policy != LIBXL_RDM_RESERVE_POLICY_INVALID)
+            e820_entries++;
+
+
+    /* If we should have a highmem range. */
+    if (highmem_size)
+        e820_entries++;
+
+    if (e820_entries >= E820MAX) {
+        LOG(ERROR, "Too many entries in the memory map");
+        rc = ERROR_INVAL;
+        goto out;
+    }
+
+    e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+    /* Low memory */
+    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].type = E820_RAM;
+    nr++;
+
+    /* RDM mapping */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        if (d_config->rdms[i].policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+            continue;
+
+        e820[nr].addr = d_config->rdms[i].start;
+        e820[nr].size = d_config->rdms[i].size;
+        e820[nr].type = E820_RESERVED;
+        nr++;
+    }
+
+    /* High memory */
+    if (highmem_size) {
+        e820[nr].addr = ((uint64_t)1 << 32);
+        e820[nr].size = highmem_size;
+        e820[nr].type = E820_RAM;
+    }
+
+    if (xc_domain_set_memory_map(CTX->xch, domid, e820, e820_entries) != 0) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+out:
+    return rc;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-basic-offset: 4
-- 
1.9.1


* [v7][PATCH 14/16] xen/vtd: enable USB device assignment
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (12 preceding siblings ...)
  2015-07-09  5:34 ` [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
@ 2015-07-09  5:34 ` Tiejun Chen
  2015-07-09  5:34 ` [v7][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian

A USB RMRR may conflict with the guest BIOS region. In that case, the
previous implementation simply skipped the identity mapping setup. Now
we can handle this scenario cleanly with the new policy mechanism, so the
previous hack can be removed.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v5 ~ v7:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
 xen/drivers/passthrough/vtd/utils.c |  7 -------
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {                                                \
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 56f5911..c833290 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2245,11 +2245,9 @@ static int reassign_device_ownership(
     /*
      * If the device belongs to the hardware domain, and it has RMRR, don't
      * remove it from the hardware domain, because BIOS may use RMRR at
-     * booting time. Also account for the special casing of USB below (in
-     * intel_iommu_assign_device()).
+     * booting time.
      */
-    if ( !is_hardware_domain(source) &&
-         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+    if ( !is_hardware_domain(source) )
     {
         const struct acpi_rmrr_unit *rmrr;
         u16 bdf;
@@ -2303,13 +2301,8 @@ static int intel_iommu_assign_device(
     if ( ret )
         return ret;
 
-    /* FIXME: Because USB RMRR conflicts with guest bios region,
-     * ignore USB RMRR temporarily.
-     */
     seg = pdev->seg;
     bus = pdev->bus;
-    if ( is_usb_device(seg, bus, pdev->devfn) )
-        return 0;
 
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include "extern.h"
 #include <asm/io_apic.h>
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-    u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                                PCI_CLASS_DEVICE);
-    return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1


* [v7][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (13 preceding siblings ...)
  2015-07-09  5:34 ` [v7][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
@ 2015-07-09  5:34 ` Tiejun Chen
  2015-07-13 13:41   ` Jan Beulich
  2015-07-09  5:34 ` [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
  2015-07-10 14:50 ` [v7][PATCH 00/16] Fix RMRR George Dunlap
  16 siblings, 1 reply; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian

For now we simply refuse to assign devices that share an
RMRR, since shared RMRRs are rare in our experience. Later
we can group the devices which share an RMRR, and then
allow all devices within a group to be assigned to the
same domain.
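
The check can be pictured with the standalone sketch below. It is only an
approximation of the idea, not the VT-d code: struct sketch_rmrr and
sketch_check_shared_rmrr() are hypothetical, and a flat array of BDFs
stands in for the real RMRR device scope.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an RMRR and its device scope. */
struct sketch_rmrr {
    uint16_t segment;
    const uint16_t *bdfs;   /* bus/devfn values covered by this RMRR */
    size_t devices_cnt;
};

/*
 * Returns 0 if the device may be assigned, or -EPERM if any RMRR that
 * covers it is shared with at least one other device.
 */
int sketch_check_shared_rmrr(const struct sketch_rmrr *rmrrs, size_t nr,
                             uint16_t seg, uint16_t bdf)
{
    size_t i, j;

    for (i = 0; i < nr; i++) {
        if (rmrrs[i].segment != seg)
            continue;

        for (j = 0; j < rmrrs[i].devices_cnt; j++) {
            /* The device is in this RMRR's scope and is not alone there. */
            if (rmrrs[i].bdfs[j] == bdf && rmrrs[i].devices_cnt > 1)
                return -EPERM;
        }
    }

    return 0;
}
```

A device with no RMRR at all, or with an RMRR it does not share, passes the
check; only the shared case is rejected.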

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v5 ~ v7:

* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c833290..095fb1d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2297,13 +2297,39 @@ static int intel_iommu_assign_device(
     if ( list_empty(&acpi_drhd_units) )
         return -ENODEV;
 
+    seg = pdev->seg;
+    bus = pdev->bus;
+    /*
+     * In rare cases one given rmrr is shared by multiple devices but
+     * obviously this would put the security of a system at risk. So
+     * we should prevent this sort of device assignment.
+     *
+     * TODO: in the future we can introduce group device assignment
+     * interface to make sure devices sharing RMRR are assigned to the
+     * same domain together.
+     */
+    for_each_rmrr_device( rmrr, bdf, i )
+    {
+        if ( rmrr->segment == seg &&
+             PCI_BUS(bdf) == bus &&
+             PCI_DEVFN2(bdf) == devfn )
+        {
+            if ( rmrr->scope.devices_cnt > 1 )
+            {
+                printk(XENLOG_G_ERR VTDPREFIX
+                       " cannot assign %04x:%02x:%02x.%u"
+                       " with shared RMRR for Dom%d.\n",
+                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                       d->domain_id);
+                return -EPERM;
+            }
+        }
+    }
+
     ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
     if ( ret )
         return ret;
 
-    seg = pdev->seg;
-    bus = pdev->bus;
-
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
     {
-- 
1.9.1


* [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (14 preceding siblings ...)
  2015-07-09  5:34 ` [v7][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
@ 2015-07-09  5:34 ` Tiejun Chen
  2015-07-09 18:23   ` Ian Jackson
  2015-07-10 14:50 ` [v7][PATCH 00/16] Fix RMRR George Dunlap
  16 siblings, 1 reply; 119+ messages in thread
From: Tiejun Chen @ 2015-07-09  5:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch parses the user-configurable parameters that specify
RDM resources and the corresponding policies:

Global RDM parameter:
    rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The default per-device RDM policy is the same as the default global RDM
policy, 'relaxed'. As with other pci options, the per-device policy
overrides the global policy.
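
To make the accepted syntax of the global parameter concrete, here is a
much simplified standalone sketch. It is not xlu_rdm_parse(): the sketch_*
names are hypothetical, and a fixed-prefix comparison replaces the state
machine. It assumes, as the state machine in this patch appears to require,
that both keys are present, in this order, with nothing after the policy
value.

```c
#include <string.h>

/* Hypothetical stand-ins for the libxl strategy/policy enums. */
enum sketch_strategy { SKETCH_STRATEGY_IGNORE = 0, SKETCH_STRATEGY_HOST };
enum sketch_policy {
    SKETCH_POLICY_INVALID = 0,
    SKETCH_POLICY_STRICT,
    SKETCH_POLICY_RELAXED
};

struct sketch_rdm_cfg {
    enum sketch_strategy strategy;
    enum sketch_policy policy;
};

/*
 * Parse "strategy=host,policy=strict" or "strategy=host,policy=relaxed".
 * Returns 0 on success and -1 on any other input; 'out' is only written
 * on success.
 */
int sketch_rdm_parse(const char *str, struct sketch_rdm_cfg *out)
{
    static const char prefix[] = "strategy=host,policy=";
    const size_t plen = sizeof(prefix) - 1;
    enum sketch_policy policy;
    const char *val;

    if (strncmp(str, prefix, plen) != 0)
        return -1;

    val = str + plen;
    if (!strcmp(val, "strict"))
        policy = SKETCH_POLICY_STRICT;
    else if (!strcmp(val, "relaxed"))
        policy = SKETCH_POLICY_RELAXED;
    else
        return -1;

    out->strategy = SKETCH_STRATEGY_HOST;
    out->policy = policy;
    return 0;
}
```

Leaving 'out' untouched on failure means a caller can pre-load the defaults
and keep them when the string does not parse.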

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Just sync those renames introduced by patch #10.

v5:

* Need a rebase after we make all rdm variables specific to .hvm.
* Like other pci options, the per-device policy always follows
  the global policy by default.

v4:

* Separated from current patch #11 to parse/enable our rdm policy parameters,
  since that makes a lot of sense and this code is specific to xl/libxlu.

 tools/libxl/libxlu_pci.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 13 +++++++
 3 files changed, 107 insertions(+)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..b8933d2 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE      9
+#define STATE_RDM_STRATEGY      10
+#define STATE_RESERVE_POLICY    11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str)
 {
     unsigned state = STATE_DOMAIN;
@@ -143,6 +146,17 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str
                     pcidev->permissive = atoi(tok);
                 }else if ( !strcmp(optkey, "seize") ) {
                     pcidev->seize = atoi(tok);
+                }else if ( !strcmp(optkey, "rdm_policy") ) {
+                    if ( !strcmp(tok, "strict") ) {
+                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+                    } else if ( !strcmp(tok, "relaxed") ) {
+                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+                    } else {
+                        XLU__PCI_ERR(cfg, "%s is not a valid PCI RDM property"
+                                          " policy: 'strict' or 'relaxed'.",
+                                     tok);
+                        goto parse_error;
+                    }
                 }else{
                     XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
                 }
@@ -167,6 +181,82 @@ parse_error:
     return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+    unsigned state = STATE_TYPE;
+    char *buf2, *tok, *ptr, *end;
+
+    if (NULL == (buf2 = ptr = strdup(str)))
+        return ERROR_NOMEM;
+
+    for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+        switch(state) {
+        case STATE_TYPE:
+            if (*ptr == '=') {
+                state = STATE_RDM_STRATEGY;
+                *ptr = '\0';
+                if (strcmp(tok, "strategy")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RDM_STRATEGY:
+            if (*ptr == '\0' || *ptr == ',') {
+                state = STATE_RESERVE_POLICY;
+                *ptr = '\0';
+                if (!strcmp(tok, "host")) {
+                    rdm->strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM strategy option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RESERVE_POLICY:
+            if (*ptr == '=') {
+                state = STATE_OPTIONS_V;
+                *ptr = '\0';
+                if (strcmp(tok, "policy")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_OPTIONS_V:
+            if (*ptr == ',' || *ptr == '\0') {
+                state = STATE_TERMINAL;
+                *ptr = '\0';
+                if (!strcmp(tok, "strict")) {
+                    rdm->policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+                } else if (!strcmp(tok, "relaxed")) {
+                    rdm->policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property policy value: %s",
+                                 tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+        default:
+            break;
+        }
+    }
+
+    if (tok != ptr || state != STATE_TERMINAL)
+        goto parse_error;
+
+    free(buf2);
+    return 0;
+
+parse_error:
+    free(buf2);
+    return ERROR_INVAL;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 989605a..e81b644 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -106,6 +106,10 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs, const char *const *specs,
  */
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str);
 
+/*
+ * RDM parsing
+ */
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str);
 
 /*
  * Vif rate parsing.
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index dfb50d6..38d6c53 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1923,6 +1923,14 @@ skip_vfb:
         xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
     }
 
+    if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
+        libxl_rdm_reserve rdm;
+        if (!xlu_rdm_parse(config, &rdm, buf)) {
+            b_info->u.hvm.rdm.strategy = rdm.strategy;
+            b_info->u.hvm.rdm.policy = rdm.policy;
+        }
+    }
+
     if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
         d_config->num_pcidevs = 0;
         d_config->pcidevs = NULL;
@@ -1937,6 +1945,11 @@ skip_vfb:
             pcidev->power_mgmt = pci_power_mgmt;
             pcidev->permissive = pci_permissive;
             pcidev->seize = pci_seize;
+            /*
+             * Like other pci options, the per-device policy always follows
+             * the global policy by default.
+             */
+            pcidev->rdm_policy = b_info->u.hvm.rdm.policy;
             if (!xlu_pci_parse_bdf(config, pcidev, buf))
                 d_config->num_pcidevs++;
         }
-- 
1.9.1


* Re: [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-09  5:34 ` [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-07-09  9:11   ` Wei Liu
  2015-07-09  9:41     ` Chen, Tiejun
  2015-07-09 18:14   ` Ian Jackson
  1 sibling, 1 reply; 119+ messages in thread
From: Wei Liu @ 2015-07-09  9:11 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Thu, Jul 09, 2015 at 01:34:02PM +0800, Tiejun Chen wrote:
> While building a VM, HVM domain builder provides struct hvm_info_table{}
> to help hvmloader. Currently it includes two fields to construct guest
> e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
> check them to fix any conflict with RDM.
> 
> RMRR can reside in address space beyond 4G theoretically, but we never
> see this in real world. So in order to avoid breaking highmem layout
> we don't solve highmem conflict. Note this means highmem rmrr could still
> be supported if no conflict.
> 
> But in the case of lowmem, RMRR probably scatter the whole RAM space.
> Especially multiple RMRR entries would worsen this to lead a complicated
> memory layout. And then its hard to extend hvm_info_table{} to work
> hvmloader out. So here we're trying to figure out a simple solution to
> avoid breaking existing layout. So when a conflict occurs,
> 
>     #1. Above a predefined boundary (2G)
>         - move lowmem_end below reserved region to solve conflict;
> 
>     #2. Below a predefined boundary (2G)
>         - Check strict/relaxed policy.
>         "strict" policy leads to fail libxl. Note when both policies
>         are specified on a given region, 'strict' is always preferred.
>         "relaxed" policy issue a warning message and also mask this entry INVALID
>         to indicate we shouldn't expose this entry to hvmloader.
> 
> Note later we need to provide a parameter to set that predefined boundary
> dynamically.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: Kevin Tian <kevint.tian@intel.com>

Typo here "kevint"

No need to resend just for this though. I think committer can handle
this for you.

If you happen to resend because of changes in other patches, please
correct this.

Wei.


* Re: [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy
  2015-07-09  5:34 ` [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
@ 2015-07-09  9:20   ` Wei Liu
  2015-07-09  9:44     ` Chen, Tiejun
  2015-07-09 18:02   ` Ian Jackson
  1 sibling, 1 reply; 119+ messages in thread
From: Wei Liu @ 2015-07-09  9:20 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Thu, Jul 09, 2015 at 01:34:01PM +0800, Tiejun Chen wrote:
> This patch introduces user configurable parameters to specify RDM
> resource and according policies,
> 
> Global RDM parameter:
>     rdm = "strategy=host,policy=strict/relaxed"
> Per-device RDM parameter:
>     pci = [ 'sbdf, rdm_policy=strict/relaxed' ]
> 
> Global RDM parameter, "strategy", allows user to specify reserved regions
> explicitly, Currently, using 'host' to include all reserved regions reported
> on this platform which is good to handle hotplug scenario. In the future
> this parameter may be further extended to allow specifying random regions,
> e.g. even those belonging to another platform as a preparation for live
> migration with passthrough devices. By default this isn't set so we don't
> check all rdms. Instead, we just check rdm specific to a given device if
> you're assigning this kind of device. Note this option is not recommended
> unless you can make sure any conflict does exist.
> 
> 'strict/relaxed' policy decides how to handle conflict when reserving RDM
> regions in pfn space. If conflict exists, 'strict' means an immediate error
> so VM can't keep running, while 'relaxed' allows moving forward with a
> warning message thrown out.
> 
> Default per-device RDM policy is same as default global RDM policy as being
> 'relaxed'. And the per-device policy would override the global policy like
> others.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

FWIW you shouldn't put Ian and Ian's acks here because they have not
explicitly done so.

No need to resend just for this (Ian and Ian, speak up if you disagree).
They may delete these tags when they commit this patch. But if you do
resend, please delete those two acks.

Wei.


* Re: [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-09  9:11   ` Wei Liu
@ 2015-07-09  9:41     ` Chen, Tiejun
  0 siblings, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-09  9:41 UTC (permalink / raw)
  To: Wei Liu; +Cc: Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> CC: Ian Campbell <ian.campbell@citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> Acked-by: Wei Liu <wei.liu2@citrix.com>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> Reviewed-by: Kevin Tian <kevint.tian@intel.com>
>
> Typo here "kevint"

Fixed.

>
> No need to resend just for this though. I think committer can handle
> this for you.
>
> If you happen to resend because of changes in other patches, please
> correct this.
>

Sure.

Thanks
Tiejun


* Re: [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy
  2015-07-09  9:20   ` Wei Liu
@ 2015-07-09  9:44     ` Chen, Tiejun
  2015-07-09 10:37       ` Ian Jackson
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-09  9:44 UTC (permalink / raw)
  To: Wei Liu; +Cc: Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

>> Acked-by: Wei Liu <wei.liu2@citrix.com>
>> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
>> Acked-by: Ian Campbell <ian.campbell@citrix.com>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>
> FWIW you shouldn't put Ian and Ian's acks here because they have not
> explicitly done so.

Yes, I wasn't very sure about this point, so I mentioned it in patch #00:

"Note I also mask patch #10 Acked by Wei Liu, Ian Jackson and Ian
Campbell. ( If I'm wrong just let me know at this point. )..."

>
> No need to resend just for this (Ian and Ian, speak up if you disagree).
> They may delete these tags when they commit this patch. But if you do
> resend, please delete those two acks.
>

Just let me delete them.

Thanks
Tiejun


* Re: [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy
  2015-07-09  9:44     ` Chen, Tiejun
@ 2015-07-09 10:37       ` Ian Jackson
  2015-07-09 10:53         ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Ian Jackson @ 2015-07-09 10:37 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy"):
> "Note I also mask patch #10 Acked by Wei Liu, Ian Jackson and Ian
> Campbell. ( If I'm wrong just let me know at this point. )..."

For future reference, if I am acking something I am always completely
clear about that.  Ambiguous statements, or statements of intent, are
not in themselves acks.

I have not yet acked these patches in this case.  I will review them
later today.

Ian.

* Re: [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy
  2015-07-09 10:37       ` Ian Jackson
@ 2015-07-09 10:53         ` Chen, Tiejun
  0 siblings, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-09 10:53 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 2015/7/9 18:37, Ian Jackson wrote:
> Chen, Tiejun writes ("Re: [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy"):
>> "Note I also mask patch #10 Acked by Wei Liu, Ian Jackson and Ian
>> Campbell. ( If I'm wrong just let me know at this point. )..."
>
> For future reference, if I am acking something I am always completely
> clear about that.  Ambiguous statements, or statements of intent, are
> not in themselves acks.

Okay, I'll keep this in mind. And as I replied to Wei, I have already
removed those acks in my local tree.

>
> I have not yet acked these patches in this case.  I will review them
> later today.
>

Thanks
Tiejun

* Re: [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy
  2015-07-09  5:34 ` [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
  2015-07-09  9:20   ` Wei Liu
@ 2015-07-09 18:02   ` Ian Jackson
  2015-07-10  0:46     ` Chen, Tiejun
  1 sibling, 1 reply; 119+ messages in thread
From: Ian Jackson @ 2015-07-09 18:02 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Tiejun Chen writes ("[v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy"):
> This patch introduces user configurable parameters to specify RDM
> resource and according policies,
...

>  int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
>  {
> +    /* We'd like to force reserve rdm specific to a device by default.*/
> +    if ( pci->rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
           ^

I have just spotted that spurious whitespace.  However I won't block
this for that.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

(actually).

I would appreciate it if you could ensure that this is fixed in any
repost.  You may retain my ack if you do that.  Committers should feel
free to fix it on commit.

Thanks,
Ian.

* Re: [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-09  5:34 ` [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
  2015-07-09  9:11   ` Wei Liu
@ 2015-07-09 18:14   ` Ian Jackson
  2015-07-10  3:19     ` Chen, Tiejun
  1 sibling, 1 reply; 119+ messages in thread
From: Ian Jackson @ 2015-07-09 18:14 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Tiejun Chen writes ("[v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> While building a VM, HVM domain builder provides struct hvm_info_table{}
> to help hvmloader. Currently it includes two fields to construct guest
> e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
> check them to fix any conflict with RDM.
...
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: Kevin Tian <kevint.tian@intel.com>

I have found a few things in this patch which I would like to see
improved.  See below.

Given how late I am with this review, I do not feel that I should be
nacking it at this time.  You have a tools ack from Wei, so my
comments are not a blocker for this series.

But if you need to respin, please take these comments into account,
and consider which are feasible to fix in the time available.  If you
are respinning this series targeting Xen 4.7 or later, please address
all of the points I make below.

Thanks.


> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
> +                                       libxl_domain_config *d_config,
> +                                       uint64_t rdm_mem_boundary,
> +                                       struct xc_hvm_build_args *args)
...
> +    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull\
<<32);
...
> +    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE && !d_config->num_\
pcidevs)

There are quite a few of these long lines, which should be wrapped.
See tools/libxl/CODING_STYLE.

> +        d_config->num_rdms = nr_entries;
> +        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
> +                        d_config->num_rdms * sizeof(libxl_device_rdm));

This code is remarkably similar to a function later on which adds an
rdm.  Please can you factor it out.

> +    } else
> +        d_config->num_rdms = 0;

Please can you put { } around the else block too.  I don't think this
mixed style is good.

> +        for (j = 0; j < d_config->num_rdms; j++) {
> +            if (d_config->rdms[j].start ==
> +                         (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)

This construct
   (uint64_t)some_pfn << XC_PAGE_SHIFT
appears an awful lot.

I would prefer it if it were done in an inline function (or maybe a
macro).


> +    libxl_domain_build_info *const info = &d_config->b_info;
> +    /*
> +     * Currently we fix this as 2G to guarantte how to handle
                                         ^^^^^^^^^

Should read "guarantee".

> +    ret = libxl__domain_device_construct_rdm(gc, d_config,
> +                                             rdm_mem_boundary,
> +                                             &args);
> +    if (ret) {
> +        LOG(ERROR, "checking reserved device memory failed");
> +        goto out;
> +    }

`rc' should be used here rather than `ret'.  (It is unfortunate that
this function has poor style already, but it would be best not to make
it worse.)


Thanks,
Ian.

* Re: [v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary
  2015-07-09  5:34 ` [v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
@ 2015-07-09 18:14   ` Ian Jackson
  0 siblings, 0 replies; 119+ messages in thread
From: Ian Jackson @ 2015-07-09 18:14 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Tiejun Chen writes ("[v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary"):
> Previously we always fix that predefined boundary as 2G to handle
> conflict between memory and rdm, but now this predefined boundar
> can be changes with the parameter "rdm_mem_boundary" in .cfg file.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

* Re: [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-09  5:34 ` [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
@ 2015-07-09 18:17   ` Ian Jackson
  2015-07-10  5:40     ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Ian Jackson @ 2015-07-09 18:17 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Tiejun Chen writes ("[v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest"):
> Here we'll construct a basic guest e820 table via
> XENMEM_set_memory_map. This table includes lowmem, highmem
> and RDMs if they exist, and hvmloader would need this info
> later.
> 
> Note this guest e820 table would be same as before if the
> platform has no any RDM or we disable RDM (by default).
...
>  tools/libxl/libxl_dom.c      |  5 +++
>  tools/libxl/libxl_internal.h | 24 +++++++++++++
>  tools/libxl/libxl_x86.c      | 83 ++++++++++++++++++++++++++++++++++++++++++++
...
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 62ef120..41da479 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>          goto out;
>      }
>  
> +    if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
> +        LOG(ERROR, "setting domain memory map failed");
> +        goto out;
> +    }

This is platform-independent code, isn't it ?  In which case this will
break the build on ARM, I think.

Would an ARM maintainer please confirm.

Aside from that I have no issues with this patch.

Ian.

* Re: [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-09  5:34 ` [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
@ 2015-07-09 18:23   ` Ian Jackson
  2015-07-10  6:05     ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Ian Jackson @ 2015-07-09 18:23 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Tiejun Chen writes ("[v7][PATCH 16/16] tools: parse to enable new rdm policy parameters"):
> This patch parses to enable user configurable parameters to specify
> RDM resource and according policies,
> 
> Global RDM parameter:
>     rdm = "strategy=host,policy=strict/relaxed"
> Per-device RDM parameter:
>     pci = [ 'sbdf, rdm_policy=strict/relaxed' ]
> 
> Default per-device RDM policy is same as default global RDM policy as being
> 'relaxed'. And the per-device policy would override the global policy like
> others.

Thanks for this.  I have found a couple of things in this patch which
I would like to see improved.  See below.

Again, given how late I am, I do not feel that I should be nacking it
at this time.  You have a tools ack from Wei, so my comments are not a
blocker for this series.

But if you need to respin, please take these comments into account,
and consider which are feasible to fix in the time available.  If you
are respinning this series targeting Xen 4.7 or later, please address
all of the points I make below.

Thanks.


The first issue (which would really be relevant to the documentation
patch) is that the documentation is in a separate commit.  There are
sometimes valid reasons for doing this.  I'm not sure if they apply,
but if they do this should be explained in one of the commit
messages.  If this was done I'm afraid I have missed it.

> +                }else if ( !strcmp(optkey, "rdm_policy") ) {
> +                    if ( !strcmp(tok, "strict") ) {
> +                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
> +                    } else if ( !strcmp(tok, "relaxed") ) {
> +                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
> +                    } else {
> +                        XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
> +                                          " policy: 'strict' or 'relaxed'.",
> +                                     tok);
> +                        goto parse_error;
> +                    }

This section has coding style (whitespace) problems and long lines.
If you need to respin, please fix them.

> +    for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
> +        switch(state) {
> +        case STATE_TYPE:
> +            if (*ptr == '=') {
> +                state = STATE_RDM_STRATEGY;
> +                *ptr = '\0';
> +                if (strcmp(tok, "strategy")) {
> +                    XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
> +                    goto parse_error;
> +                }
> +                tok = ptr + 1;
> +            }

This code is extremely repetitive.

Really I would prefer that this parsing was done with a miniature flex
parser, rather than ad-hoc pointer arithmetic and use of strtok.
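As one illustration of a less repetitive shape (this is not the flex
parser suggested above, and it is not code from the patch), a
table-driven keyword lookup can replace the per-state comparisons.
Every name below (rdm_opts, rdm_keywords, rdm_parse and the SKETCH_*
values) is hypothetical and only stands in for the real libxl types;
the table accepts just the keys and values listed in the commit
message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the libxl strategy/policy enums. */
enum { SKETCH_STRATEGY_IGNORE, SKETCH_STRATEGY_HOST };
enum { SKETCH_POLICY_STRICT, SKETCH_POLICY_RELAXED };
enum { FIELD_STRATEGY, FIELD_POLICY };

struct rdm_opts {
    int strategy;
    int policy;
};

/* One row per accepted "key=value" pair. */
struct rdm_keyword {
    const char *key;
    const char *value;
    int field;
    int setting;
};

static const struct rdm_keyword rdm_keywords[] = {
    { "strategy", "host",    FIELD_STRATEGY, SKETCH_STRATEGY_HOST },
    { "policy",   "strict",  FIELD_POLICY,   SKETCH_POLICY_STRICT },
    { "policy",   "relaxed", FIELD_POLICY,   SKETCH_POLICY_RELAXED },
};

/* Apply one "key=value" token of length len; 0 on success, -1 if unknown. */
static int rdm_apply_token(struct rdm_opts *opts, const char *tok, size_t len)
{
    const char *eq = memchr(tok, '=', len);
    size_t klen, vlen, i;

    if (!eq)
        return -1;
    klen = (size_t)(eq - tok);
    vlen = len - klen - 1;

    for (i = 0; i < sizeof(rdm_keywords) / sizeof(rdm_keywords[0]); i++) {
        const struct rdm_keyword *kw = &rdm_keywords[i];

        if (strlen(kw->key) == klen && !memcmp(kw->key, tok, klen) &&
            strlen(kw->value) == vlen && !memcmp(kw->value, eq + 1, vlen)) {
            if (kw->field == FIELD_STRATEGY)
                opts->strategy = kw->setting;
            else
                opts->policy = kw->setting;
            return 0;
        }
    }
    return -1;
}

/* Parse "key=value,key=value"; 0 on success, -1 on the first bad token. */
static int rdm_parse(struct rdm_opts *opts, const char *str)
{
    while (*str) {
        size_t len = strcspn(str, ",");

        if (rdm_apply_token(opts, str, len))
            return -1;
        str += len;
        if (*str == ',')
            str++;
    }
    return 0;
}
```

Adding a new keyword then means adding a row to the table rather than
a new state and another copy of the comparison code. The input string
is not modified, so no '\0' patching of the caller's buffer is needed.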

Thanks,
Ian.

* Re: [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy
  2015-07-09 18:02   ` Ian Jackson
@ 2015-07-10  0:46     ` Chen, Tiejun
  0 siblings, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-10  0:46 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>>   int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
>>   {
>> +    /* We'd like to force reserve rdm specific to a device by default.*/
>> +    if ( pci->rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
>             ^
>
> I have just spotted that spurious whitespace.  However I won't block
> this for that.

Sorry, this is my typo.

>
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
>
> (actually).
>
> I would appreciate it if you could ensure that this is fixed in any
> repost.  You may retain my ack if you do that.  Committers should feel
> free to fix it on commit.

I fixed this in my tree.

Thanks
Tiejun

>
> Thanks,
> Ian.
>

* Re: [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-09 18:14   ` Ian Jackson
@ 2015-07-10  3:19     ` Chen, Tiejun
  2015-07-10 10:14       ` Ian Jackson
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-10  3:19 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

> I have found a few things in this patch which I would like to see
> improved.  See below.
>
> Given how late I am with this review, I do not feel that I should be
> nacking it at this time.  You have a tools ack from Wei, so my
> comments are not a blocker for this series.
>
> But if you need to respin, please take these comments into account,
> and consider which are feasible to fix in the time available.  If you
> are respinning this series targeting Xen 4.7 or later, please address
> all of the points I make below.

Thanks for your comments; it looks like I should address them now.

>
> Thanks.
>
>
>> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
>> +                                       libxl_domain_config *d_config,
>> +                                       uint64_t rdm_mem_boundary,
>> +                                       struct xc_hvm_build_args *args)
> ...
>> +    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull\
> <<32);
> ...
>> +    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE && !d_config->num_\
> pcidevs)
>
> There are quite a few of these long lines, which should be wrapped.
> See tools/libxl/CODING_STYLE.

Sorry, I can't find any case of what you're describing.

So are you saying this?

if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE &&
     !d_config->num_pcidevs)
Or

@@ -143,6 +143,15 @@ static bool overlaps_rdm(uint64_t start, uint64_t memsize,
  }

  /*
+ * Check whether any rdm should be exposed..
+ * Returns true if needs, else returns false.
+ */
+static bool exposes_rdm(uint32_t strategy, int num_pcidevs)
+{
+    return strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE && !num_pcidevs;
+}
+
+/*
   * Check reported RDM regions and handle potential gfn conflicts according
   * to user preferred policy.
   *
@@ -182,7 +191,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc,
      uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);

      /* Might not expose rdm. */
-    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE && !d_config->num_pcidevs)
+    if (exposes_rdm(strategy, d_config->num_pcidevs))
          return 0;

      /* Query all RDM entries in this platform */


>
>> +        d_config->num_rdms = nr_entries;
>> +        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
>> +                        d_config->num_rdms * sizeof(libxl_device_rdm));
>
> This code is remarkably similar to a function later on which adds an
> rdm.  Please can you factor it out.

Do you mean I should merge them into one where possible?

But that does not seem possible because we have several combinations of
these two conditions: strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST, and one
or more PCI devices being passed through.

#1. strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST but without any devices

So it needs this libxl__realloc() here. The second libxl__realloc()
doesn't take any effect.

#2. strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST with one or more devices

Actually we don't need the second libxl__realloc(). This is the same as
#1.

#3. strategy != LIBXL_RDM_RESERVE_STRATEGY_HOST also with one or more
devices

So we just need the second libxl__realloc() later. The first
libxl__realloc() isn't called at all. In particular, it's very possible
that we're going to handle fewer RDMs compared to
LIBXL_RDM_RESERVE_STRATEGY_HOST.

>
>> +    } else
>> +        d_config->num_rdms = 0;
>
> Please can you put { } around the else block too.  I don't think this
> mixed style is good.

Fixed.

>
>> +        for (j = 0; j < d_config->num_rdms; j++) {
>> +            if (d_config->rdms[j].start ==
>> +                         (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)
>
> This construct
>     (uint64_t)some_pfn << XC_PAGE_SHIFT
> appears an awful lot.
>
> I would prefer it if it were done in an inline function (or maybe a
> macro).

Like this?

#define AAAA(x) ((uint64_t)x << XC_PAGE_SHIFT)

Sorry I can't figure out a good name here :) Any suggestions?

>
>
>> +    libxl_domain_build_info *const info = &d_config->b_info;
>> +    /*
>> +     * Currently we fix this as 2G to guarantte how to handle
>                                           ^^^^^^^^^
>
> Should read "guarantee".

Fixed.

>
>> +    ret = libxl__domain_device_construct_rdm(gc, d_config,
>> +                                             rdm_mem_boundary,
>> +                                             &args);
>> +    if (ret) {
>> +        LOG(ERROR, "checking reserved device memory failed");
>> +        goto out;
>> +    }
>
> `rc' should be used here rather than `ret'.  (It is unfortunate that
> this function has poor style already, but it would be best not to make
> it worse.)
>

I can do this, but according to tools/libxl/CODING_STYLE it looks like I
should first post a separate patch to fix this code style issue, and then
rebase my patch, right?

Thanks
Tiejun

* Re: [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-09 18:17   ` Ian Jackson
@ 2015-07-10  5:40     ` Chen, Tiejun
  2015-07-10  9:18       ` Ian Campbell
  2015-07-10 10:15       ` Ian Jackson
  0 siblings, 2 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-10  5:40 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>>   tools/libxl/libxl_dom.c      |  5 +++
>>   tools/libxl/libxl_internal.h | 24 +++++++++++++
>>   tools/libxl/libxl_x86.c      | 83 ++++++++++++++++++++++++++++++++++++++++++++
> ...
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 62ef120..41da479 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>>           goto out;
>>       }
>>
>> +    if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
>> +        LOG(ERROR, "setting domain memory map failed");
>> +        goto out;
>> +    }
>
> This is platform-independent code, isn't it ?  In which case this will
> break the build on ARM, I think.
>
> Would an ARM maintainer please confirm.
>

I think you're right. I should make this arch-specific, since we're
talking about e820 here.

So I tried to refactor this patch:

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d04871c..939178a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -49,4 +49,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
  _hidden
  int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);

+/* arch specific to construct memory mapping function */
+_hidden
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args);
+
  #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index f09c860..1526467 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -926,6 +926,14 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
      return xc_domain_bind_pt_spi_irq(CTX->xch, domid, irq, irq);
  }

+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args)
+{
+    return 0;
+}
+
  /*
   * Local variables:
   * mode: C
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 62ef120..691c1f6 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
          goto out;
      }

+    if (libxl__arch_domain_construct_memmap(gc, d_config, domid, &args)) {
+        LOG(ERROR, "setting domain memory map failed");
+        goto out;
+    }
+
      ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                 &state->store_mfn, state->console_port,
                                 &state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..66b3d7f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
  }

  /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args)
+{
...


> Aside from that I have no issues with this patch.
>

But if you think I should resend this let me know.

Thanks
Tiejun

* Re: [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-09 18:23   ` Ian Jackson
@ 2015-07-10  6:05     ` Chen, Tiejun
  2015-07-10 10:23       ` Ian Jackson
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-10  6:05 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

> The first issue (which would really be relevant to the documentation
> patch) is that the documentation is in a separate commit.  There are
> sometimes valid reasons for doing this.  I'm not sure if they apply,

Wei suggested we should organize/split all patches according to libxl,
libxc, xc and xl.

> but if they do this should be explained in one of the commit
> messages.  If this was done I'm afraid I have missed it.

In this patch's description, maybe I can change it to something like this:

This patch parses to enable user configurable parameters to specify
RDM resource and according policies which are defined previously,

>
>> +                }else if ( !strcmp(optkey, "rdm_policy") ) {
>> +                    if ( !strcmp(tok, "strict") ) {
>> +                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
>> +                    } else if ( !strcmp(tok, "relaxed") ) {
>> +                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
>> +                    } else {
>> +                        XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
>> +                                          " policy: 'strict' or 'relaxed'.",
>> +                                     tok);
>> +                        goto parse_error;
>> +                    }
>
> This section has coding style (whitespace) problems and long lines.
> If you need to respin, please fix them.

Are you saying this?

} else if (  -> }else if (
} else { -> }else {

Additionally, I can't find any line that is over 80 characters.

>
>> +    for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
>> +        switch(state) {
>> +        case STATE_TYPE:
>> +            if (*ptr == '=') {
>> +                state = STATE_RDM_STRATEGY;
>> +                *ptr = '\0';
>> +                if (strcmp(tok, "strategy")) {
>> +                    XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
>> +                    goto parse_error;
>> +                }
>> +                tok = ptr + 1;
>> +            }
>
> This code is extremely repetitive.
>

I just followed the pattern in xlu_pci_parse_bdf():

         switch(state) {
         case STATE_DOMAIN:
             if ( *ptr == ':' ) {
                 state = STATE_BUS;
                 *ptr = '\0';
                 if ( hex_convert(tok, &dom, 0xffff) )
                     goto parse_error;
                 tok = ptr + 1;
             }
             break;

> Really I would prefer that this parsing was done with a miniature flex
> parser, rather than ad-hoc pointer arithmetic and use of strtok.

Sorry, could you show this explicitly?

Thanks
Tiejun

* Re: [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-10  5:40     ` Chen, Tiejun
@ 2015-07-10  9:18       ` Ian Campbell
  2015-07-13  9:47         ` Chen, Tiejun
  2015-07-10 10:15       ` Ian Jackson
  1 sibling, 1 reply; 119+ messages in thread
From: Ian Campbell @ 2015-07-10  9:18 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Wei Liu, Stefano Stabellini, Ian Jackson, xen-devel

On Fri, 2015-07-10 at 13:40 +0800, Chen, Tiejun wrote:
> >>   tools/libxl/libxl_dom.c      |  5 +++
> >>   tools/libxl/libxl_internal.h | 24 +++++++++++++
> >>   tools/libxl/libxl_x86.c      | 83 ++++++++++++++++++++++++++++++++++++++++++++
> > ...
> >> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> >> index 62ef120..41da479 100644
> >> --- a/tools/libxl/libxl_dom.c
> >> +++ b/tools/libxl/libxl_dom.c
> >> @@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
> >>           goto out;
> >>       }
> >>
> >> +    if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
> >> +        LOG(ERROR, "setting domain memory map failed");
> >> +        goto out;
> >> +    }
> >
> > This is platform-independent code, isn't it ?  In which case this will
> > break the build on ARM, I think.
> >
> > Would an ARM maintainer please confirm.
> >
> 
> I think you're right. I should make this arch-specific, since we're
> talking about e820 here.
> 
> So I tried to refactor this patch,

This approach looks like it should work, and I think given the point in
the release it would be acceptable for 4.6.

However long term I think it might make sense to try and reuse one of
the existing libxl__arch hooks, i.e.
libxl__arch_domain_init_hw_description or
libxl__arch_domain_finalise_hw_description. On ARM these are to do with
setting the Device Tree Blob, which includes the memory map, so it is
somewhat morally equivalent to configuring the e820 on x86, I think.

Those hooks are only called from libxl__build_pv today, but calling them
from libxl__build_hvm seems like it would be good too.

In particular I think a call to
libxl__arch_domain_finalise_hw_description could be inserted just before
xc_hvm_build, which is similar to PV where it precedes
xc_dom_build_image, and is where you would want to setup the e820.

libxl__arch_domain_init_hw_description I think would still be a NOP on
x86, but it should probably go just after the call to
libxl__domain_firmware.

Tiejun, would you be willing to commit to refactoring this and the
issues which Ian raised in response to #11 and #16 in a subsequent
clean-up series? I don't think it would even need to wait for the freeze
to be over to be posted (although it may need to wait to be applied).

Ian.

* Re: [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-10  3:19     ` Chen, Tiejun
@ 2015-07-10 10:14       ` Ian Jackson
  2015-07-13  9:19         ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Ian Jackson @ 2015-07-10 10:14 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> > There are quite a few of these long lines, which should be wrapped.
> > See tools/libxl/CODING_STYLE.
> 
> Sorry, I can't find any case of what you're describing.
> 
> So are you saying this?
> 
> if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE &&
>      !d_config->num_pcidevs)
> Or

Yes, I meant `linewrapped', not to make a wrapper function.  Sorry for
not being clear.

> >> +        d_config->num_rdms = nr_entries;
> >> +        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
> >> +                        d_config->num_rdms * sizeof(libxl_device_rdm));
> >
> > This code is remarkably similar to a function later on which adds an
> > rdm.  Please can you factor it out.
> 
> Do you mean I should merge them into one where possible?

"Factor it out" means to break out into a separate function (or maybe
a macro or something, but in this case a function is appropriate).  So
in this case take the two sets of similar code, combine them into a
function with appropriate arguments, and then call that function in
both places.

Finding multiple occurrences of very similar code is usually a sign
that refactoring is needed.

> But that does not seem possible because we have several combinations of
> these two conditions: strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST, and one
> or more PCI devices being passed through.

I'm not saying you need to merge the two conditions, which are indeed
different, but: the work of reallocing the array and filling in the
new entry could be lifted into a function which would be called in
both places.
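
A minimal sketch of what such a helper might look like.  The type and
function names here (rdm_entry, rdm_config, add_rdm_entry) are
hypothetical stand-ins, not the real libxl definitions, and plain
realloc() is used in place of libxl__realloc() only so that the
example is self-contained:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the libxl types; not the real definitions. */
typedef struct {
    uint64_t start;
    uint64_t size;
    int policy;
} rdm_entry;

typedef struct {
    rdm_entry *rdms;
    int num_rdms;
} rdm_config;

/*
 * Grow the array by one element and fill in the new entry.
 * Returns 0 on success, -1 if the allocation fails (the array is
 * left untouched in that case).
 */
static int add_rdm_entry(rdm_config *cfg, uint64_t start, uint64_t size,
                         int policy)
{
    rdm_entry *grown;

    assert(cfg != NULL && cfg->num_rdms >= 0);

    grown = realloc(cfg->rdms, ((size_t)cfg->num_rdms + 1) * sizeof(*grown));
    if (!grown)
        return -1;

    grown[cfg->num_rdms].start = start;
    grown[cfg->num_rdms].size = size;
    grown[cfg->num_rdms].policy = policy;

    cfg->rdms = grown;
    cfg->num_rdms++;
    return 0;
}
```

Each of the two call sites would keep its own condition and simply
call the helper once per entry it wants to record.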

> >> +        for (j = 0; j < d_config->num_rdms; j++) {
> >> +            if (d_config->rdms[j].start ==
> >> +                         (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)
> >
> > This construct
> >     (uint64_t)some_pfn << XC_PAGE_SHIFT
> > appears an awful lot.
> >
> > I would prefer it if it were done in an inline function (or maybe a
> > macro).
> 
> Like this?
> 
> #define AAAA(x) ((uint64_t)x << XC_PAGE_SHIFT)

Something like that, although inline functions are normally better if
a macro is not required.  And in this case it isn't, so it should be a
function I think.
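
To illustrate the difference, here is a small sketch.  It assumes
4 KiB pages (a shift of 12); PAGE_SHIFT_SKETCH and PFN_TO_PADDR_MACRO
are names made up for this example.  The macro, written as proposed
without parentheses around its argument, goes wrong for an argument
such as `a | b', because `<<' binds tighter than `|'.  The inline
function has no such problem, and its uint64_t parameter widens a
32-bit pfn before the shift:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define PAGE_SHIFT_SKETCH 12

/* The macro as proposed: the argument is not parenthesised. */
#define PFN_TO_PADDR_MACRO(x) ((uint64_t)x << PAGE_SHIFT_SKETCH)

/* The argument is evaluated once and widened before shifting. */
static inline uint64_t pfn_to_paddr(uint64_t pfn)
{
    return pfn << PAGE_SHIFT_SKETCH;
}
```

With these definitions, pfn_to_paddr(0x10 | 0x1) gives 0x11000,
whereas PFN_TO_PADDR_MACRO(0x10 | 0x1) expands to
((uint64_t)0x10 | 0x1 << 12) and gives 0x1010.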

> Sorry I can't figure out a good name here :) Any suggestions?

The hypervisor seems to call this `pfn_to_paddr'.

> >> +    ret = libxl__domain_device_construct_rdm(gc, d_config,
> >> +                                             rdm_mem_boundary,
> >> +                                             &args);
> >> +    if (ret) {
> >> +        LOG(ERROR, "checking reserved device memory failed");
> >> +        goto out;
> >> +    }
> >
> > `rc' should be used here rather than `ret'.  (It is unfortunate that
> > this function has poor style already, but it would be best not to make
> > it worse.)
> 
> I can do this but according to tools/libxl/CODING_STYLE, looks I should 
> first post a separate patch to fix this code style issue, and then 
> rebase my patch, right?

You are introducing a new use of `ret' rather than `rc'.  AFAICT the
function already has a mixture, and there is no problem with just
using `rc' here.  So I think you do not need to fix the rest of the
function: simply using `rc' rather than `ret' in your added lines
would result in an arrangement which would be correct, and at least as
conformant to the style guide as before.

(Of course if you want to fix the rest of the function then that would
be very welcome, and then you should do it as a separate patch.
However, at this stage before the codefreeze you probably prefer to
avoid taking on anything which is not completely critical.)

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-10  5:40     ` Chen, Tiejun
  2015-07-10  9:18       ` Ian Campbell
@ 2015-07-10 10:15       ` Ian Jackson
  1 sibling, 0 replies; 119+ messages in thread
From: Ian Jackson @ 2015-07-10 10:15 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest"):
> I think you're right. I should make this specific to arch since here 
> we're talking e820.
> 
> So I tried to refactor this patch,

That looks good to me.

Ian.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-10  6:05     ` Chen, Tiejun
@ 2015-07-10 10:23       ` Ian Jackson
  2015-07-13  9:31         ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Ian Jackson @ 2015-07-10 10:23 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters"):
> > The first issue (which would really be relevant to the documentation
> > patch) is that the documentation is in a separate commit.  There are
> > sometimes valid reasons for doing this.  I'm not sure if they apply,
> 
> Wei suggested we should organize/split all patches according to libxl, 
> libxc, xc and xl.

Yes.  I don't want to put you in the middle of a
disagreement/misunderstanding between Wei and me.  This is in any case
a fairly minor issue.

> In this patch head description, maybe I can change something like this
> 
> This patch parses to enable user configurable parameters to specify
> RDM resource and according policies which are defined previously,

That would be good, yes, thanks.

> >> +                }else if ( !strcmp(optkey, "rdm_policy") ) {
> >> +                    if ( !strcmp(tok, "strict") ) {
> >> +                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
> >> +                    } else if ( !strcmp(tok, "relaxed") ) {
> >> +                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
> >> +                    } else {
> >> +                        XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
> >> +                                          " policy: 'strict' or 'relaxed'.",
> >> +                                     tok);
> >> +                        goto parse_error;
> >> +                    }
> >
> > This section has coding style (whitespace) problems and long lines.
> > If you need to respin, please fix them.
> 
> Are you saying this?
> 
> } else if (  -> }else if (
> } else { -> }else {

Also spurious spaces inside brackets.  Please see CODING_STYLE.
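
As a rough illustration of the whitespace being asked for (a space
after the keyword, none just inside the parentheses, and `} else if ('
with spaces on both sides), here is a sketch of the same branch
structure.  The enum and the function parse_rdm_policy() are
hypothetical stand-ins for LIBXL_RDM_RESERVE_POLICY_* and the real
parsing code, and the authoritative rules are those in CODING_STYLE:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for LIBXL_RDM_RESERVE_POLICY_{STRICT,RELAXED}. */
enum rdm_policy {
    RDM_POLICY_STRICT,
    RDM_POLICY_RELAXED,
};

/* Returns 0 and stores the policy on success, -1 for an unknown token. */
static int parse_rdm_policy(const char *tok, enum rdm_policy *policy)
{
    assert(tok != NULL && policy != NULL);

    if (!strcmp(tok, "strict")) {
        *policy = RDM_POLICY_STRICT;
    } else if (!strcmp(tok, "relaxed")) {
        *policy = RDM_POLICY_RELAXED;
    } else {
        return -1;
    }
    return 0;
}
```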

> Additionally, I can't find which line is over 80 characters.

Hmm, I see that the longest line is exactly 80.

Of course 80 characters, plus a `+' from a patch, and `>' for email
quoting, makes more than 80: so it has wrap damage on my screen.  You
are right that this conforms to the wording in the style, which says
`75-80'.  So I won't insist on this change (unless I manage to
persuade my comaintainers to strengthen this restriction in the style
guide).

> >> +    for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
> >> +        switch(state) {
> >> +        case STATE_TYPE:
> >
> > This code is extremely repetitive.
> 
> I just refer to xlu_pci_parse_bdf()

Yes.  I'm afraid that xlu_pci_parse_bdf is a poor example.

> > Really I would prefer that this parsing was done with a miniature flex
> > parser, rather than ad-hoc pointer arithmetic and use of strtok.
> 
> Sorry, could you show this explicitly?

Something like what was done for disk devices.  See libxlu_disk_l.l
for an example.  In this case your code would be a lot less
complicated than what you see there.

After the codefreeze I would probably have some time to write it for
you.  (I think that would be valuable because libxlu_disk_l.l is a
very complicated example, and I want to be able to point future
submitters at something simpler.)

Ian.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-09  5:33 ` [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-07-10 13:26   ` George Dunlap
  2015-07-10 15:01     ` Jan Beulich
  2015-07-13  6:47     ` Chen, Tiejun
  0 siblings, 2 replies; 119+ messages in thread
From: George Dunlap @ 2015-07-10 13:26 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> This patch extends the existing hypercall to support rdm reservation policy.
> We return error or just throw out a warning message depending on whether
> the policy is "strict" or "relaxed" when reserving RDM regions in pfn space.
> Note in some special cases, e.g. add a device to hwdomain, and remove a
> device from user domain, 'relaxed' is fine enough since this is always safe
> to hwdomain.
>
> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@citrix.com>
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v6 ~ v7:
>
> * Nothing is changed.
>
> v5:
>
> * Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
>   "0" means "strict" and "1" means "relaxed".

Thanks for this; a few more comments...

> @@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
>          seg = machine_sbdf >> 16;
>          bus = PCI_BUS(machine_sbdf);
>          devfn = PCI_DEVFN2(machine_sbdf);
> +        flag = domctl->u.assign_device.flag;
> +        if ( flag > XEN_DOMCTL_DEV_RDM_RELAXED )

This is not a blocker, but a stylistic comment: I would have inverted
the bitmask here, as that's conceptually what you're checking.  I
won't make this a blocker for going in.
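
A sketch of the inverted-mask form, for illustration only.
DEV_RDM_RELAXED mirrors the value of XEN_DOMCTL_DEV_RDM_RELAXED from
the patch; DEV_RDM_VALID_MASK and check_assign_flag() are names made
up for this example:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEV_RDM_RELAXED    1u               /* mirrors the patch's flag bit */
#define DEV_RDM_VALID_MASK DEV_RDM_RELAXED  /* hypothetical: all known bits */

/* Reject any flag value with a bit set outside the known mask. */
static int check_assign_flag(uint32_t flag)
{
    return (flag & ~DEV_RDM_VALID_MASK) ? -EINVAL : 0;
}
```

While bit 0 is the only defined flag the two forms accept the same
values.  They would start to differ if a later flag were defined at a
non-adjacent bit, at which point a `>' comparison no longer expresses
"no unknown bits set".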

> @@ -1898,7 +1899,14 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
>               PCI_BUS(bdf) == pdev->bus &&
>               PCI_DEVFN2(bdf) == devfn )
>          {
> -            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
> +            /*
> +             * Here means we're add a device to the hardware domain
> +             * so actually RMRR is always reserved on e820 so either
> +             * of flag is fine for hardware domain and here we'd like
> +             * to pass XEN_DOMCTL_DEV_RDM_RELAXED.
> +             */

Sorry I didn't give feedback on your comment before.  I don't find it
clear enough; I'd suggest something like this:

"iommu_add_device() is only called for the hardware domain (see
xen/drivers/passthrough/pci.c:pci_add_device()).  Since RMRRs are
always reserved in the e820 map for the hardware domain, there
shouldn't be a conflict."

I also said that if we went with anything other than STRICT that we'd
need to check to make sure that the domain really was the hardware
domain before proceeding, in case the assumption that pdev->domain ==
hardware_domain ever changed.  (Perhaps with an ASSERT -- Jan, what do
you think?)

Also, passing in RELAXED in locations where the flag is completely
ignored (such as when removing mappings) doesn't really make any
sense.

On the whole I think it would be better if you removed the RELAXED
flag for both removals and for hardware domains.

 -George

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 05/16] hvmloader: get guest memory map into memory_map[]
  2015-07-09  5:33 ` [v7][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
@ 2015-07-10 13:49   ` George Dunlap
  2015-07-13  7:03     ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: George Dunlap @ 2015-07-10 13:49 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> Now we get this map layout by call XENMEM_memory_map then
> save them into one global variable memory_map[]. It should
> include lowmem range, rdm range and highmem range. Note
> rdm range and highmem range may not exist in some cases.
>
> And here we need to check if any reserved memory conflicts with
> [RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
> This range is used to allocate memory at the hvmloader level, and
> we make hvmloader fail in case of a conflict, since it's
> another rare possibility in the real world.
>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
> v5 ~ v7:
>
> * Nothing is changed.
>
> v4:
>
> * Move some codes related to e820 to that specific file, e820.c.
>
> * Consolidate "printf()+BUG()" and "BUG_ON()"
>
> * Avoid another fixed width type for the parameter of get_mem_mapping_layout()
>
>  tools/firmware/hvmloader/e820.c      | 35 +++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/e820.h      |  7 +++++++
>  tools/firmware/hvmloader/hvmloader.c |  2 ++
>  tools/firmware/hvmloader/util.c      | 26 ++++++++++++++++++++++++++
>  tools/firmware/hvmloader/util.h      | 12 ++++++++++++
>  5 files changed, 82 insertions(+)
>
> diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
> index 2e05e93..3e53c47 100644
> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -23,6 +23,41 @@
>  #include "config.h"
>  #include "util.h"
>
> +struct e820map memory_map;
> +
> +void memory_map_setup(void)
> +{
> +    unsigned int nr_entries = E820MAX, i;
> +    int rc;
> +    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
> +    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;

Why START-1 rather than just START?

It looks like RESERVED_MEMORY_DYNAMIC_START is set to 0xFC001000.  In
the code the way it is, if there is an RMRR from 0xFC000000 of size
0x1000, it looks like check_overlap() below will fail and hvmloader
will BUG().

Is that really what we want?  Why can we not have an RMRR range that
goes right up to the edge of the reserved range?
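
To make the boundary case concrete, here is a small sketch that uses
the same interval test as check_overlap() in the patch.  DYN_START is
the value mentioned above; DYN_END is an assumed value chosen only for
illustration, and conflicts_with_dynamic_range() is a made-up helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DYN_START 0xFC001000ULL  /* value quoted above */
#define DYN_END   0xFE000000ULL  /* assumed, for illustration only */

/* Same half-open interval test as check_overlap() in the patch. */
static bool check_overlap(uint64_t start, uint64_t size,
                          uint64_t reserved_start, uint64_t reserved_size)
{
    return (start + size > reserved_start) &&
           (start < reserved_start + reserved_size);
}

/* Does [rmrr_start, rmrr_start + rmrr_size) hit [alloc_addr, DYN_END)? */
static bool conflicts_with_dynamic_range(uint64_t alloc_addr,
                                         uint64_t rmrr_start,
                                         uint64_t rmrr_size)
{
    return check_overlap(alloc_addr, DYN_END - alloc_addr,
                         rmrr_start, rmrr_size);
}
```

With alloc_addr = DYN_START - 1, an RMRR at 0xFC000000 of size 0x1000
is reported as a conflict even though it ends exactly where the
dynamic range begins; with alloc_addr = DYN_START it is not.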

Other than that this patch looks good.

 -George

> +
> +    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
> +
> +    if ( rc || !nr_entries )
> +    {
> +        printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
> +        BUG();
> +    }
> +
> +    memory_map.nr_map = nr_entries;
> +
> +    for ( i = 0; i < nr_entries; i++ )
> +    {
> +        if ( memory_map.map[i].type == E820_RESERVED )
> +        {
> +            if ( check_overlap(alloc_addr, alloc_size,
> +                               memory_map.map[i].addr,
> +                               memory_map.map[i].size) )
> +            {
> +                printf("Fail to setup memory map due to conflict");
> +                printf(" on dynamic reserved memory range.\n");
> +                BUG();
> +            }
> +        }
> +    }
> +}
> +
>  void dump_e820_table(struct e820entry *e820, unsigned int nr)
>  {
>      uint64_t last_end = 0, start, end;
> diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
> index b2ead7f..8b5a9e0 100644
> --- a/tools/firmware/hvmloader/e820.h
> +++ b/tools/firmware/hvmloader/e820.h
> @@ -15,6 +15,13 @@ struct e820entry {
>      uint32_t type;
>  } __attribute__((packed));
>
> +#define E820MAX        128
> +
> +struct e820map {
> +    unsigned int nr_map;
> +    struct e820entry map[E820MAX];
> +};
> +
>  #endif /* __HVMLOADER_E820_H__ */
>
>  /*
> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
> index 25b7f08..84c588c 100644
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -262,6 +262,8 @@ int main(void)
>
>      init_hypercalls();
>
> +    memory_map_setup();
> +
>      xenbus_setup();
>
>      bios = detect_bios();
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index 80d822f..122e3fa 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -27,6 +27,17 @@
>  #include <xen/memory.h>
>  #include <xen/sched.h>
>
> +/*
> + * Check whether there exists overlap in the specified memory range.
> + * Returns true if exists, else returns false.
> + */
> +bool check_overlap(uint64_t start, uint64_t size,
> +                   uint64_t reserved_start, uint64_t reserved_size)
> +{
> +    return (start + size > reserved_start) &&
> +            (start < reserved_start + reserved_size);
> +}
> +
>  void wrmsr(uint32_t idx, uint64_t v)
>  {
>      asm volatile (
> @@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
>      *p = '\0';
>  }
>
> +int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
> +{
> +    int rc;
> +    struct xen_memory_map memmap = {
> +        .nr_entries = *max_entries
> +    };
> +
> +    set_xen_guest_handle(memmap.buffer, entries);
> +
> +    rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
> +    *max_entries = memmap.nr_entries;
> +
> +    return rc;
> +}
> +
>  void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
>  {
>      static int over_allocated;
> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
> index f99c0f19..1100a3b 100644
> --- a/tools/firmware/hvmloader/util.h
> +++ b/tools/firmware/hvmloader/util.h
> @@ -4,8 +4,10 @@
>  #include <stdarg.h>
>  #include <stdint.h>
>  #include <stddef.h>
> +#include <stdbool.h>
>  #include <xen/xen.h>
>  #include <xen/hvm/hvm_info_table.h>
> +#include "e820.h"
>
>  #define __STR(...) #__VA_ARGS__
>  #define STR(...) __STR(__VA_ARGS__)
> @@ -222,6 +224,9 @@ int hvm_param_set(uint32_t index, uint64_t value);
>  /* Setup PCI bus */
>  void pci_setup(void);
>
> +/* Setup memory map  */
> +void memory_map_setup(void);
> +
>  /* Prepare the 32bit BIOS */
>  uint32_t rombios_highbios_setup(void);
>
> @@ -249,6 +254,13 @@ void perform_tests(void);
>
>  extern char _start[], _end[];
>
> +int get_mem_mapping_layout(struct e820entry entries[],
> +                           unsigned int *max_entries);
> +
> +extern struct e820map memory_map;
> +bool check_overlap(uint64_t start, uint64_t size,
> +                   uint64_t reserved_start, uint64_t reserved_size);
> +
>  #endif /* __HVMLOADER_UTIL_H__ */
>
>  /*
> --
> 1.9.1
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (15 preceding siblings ...)
  2015-07-09  5:34 ` [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
@ 2015-07-10 14:50 ` George Dunlap
  2015-07-10 14:56   ` Jan Beulich
  2015-07-16  7:55   ` Jan Beulich
  16 siblings, 2 replies; 119+ messages in thread
From: George Dunlap @ 2015-07-10 14:50 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: xen-devel

On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> v7:

It looks like most of the libxl/libxc patches have been acked.  It
seems to me that most of the hypervisor patches (1-3, 14-15) are
either ready to go in or pretty close.

The main thing I think we're missing is the hvmloader stuff (5-7).  Is
that right?

I looked through it for subsets of patches we could usefully check in,
but it looks like the device MMIO range placement in the hvmloader
patches are pretty crucial for proper functioning, even if we're not
doing anything in particular with the memory layout (a la
strategy=host).

 -George

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-10 14:50 ` [v7][PATCH 00/16] Fix RMRR George Dunlap
@ 2015-07-10 14:56   ` Jan Beulich
  2015-07-16  7:55   ` Jan Beulich
  1 sibling, 0 replies; 119+ messages in thread
From: Jan Beulich @ 2015-07-10 14:56 UTC (permalink / raw)
  To: George Dunlap; +Cc: Tiejun Chen, xen-devel

>>> On 10.07.15 at 16:50, <George.Dunlap@eu.citrix.com> wrote:
> On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
>> v7:
> 
> It looks like most of the libxl/libxc patches have been acked.  It
> seems to me that most of the hypervisor patches (1-3, 14-15) are
> either ready to go in or pretty close.
> 
> The main thing I think we're missing is the hvmloader stuff (5-7).  Is
> that right?
> 
> I looked through it for subsets of patches we could usefully check in,
> but it looks like the device MMIO range placement in the hvmloader
> patches are pretty crucial for proper functioning, even if we're not
> doing anything in particular with the memory layout (a la
> strategy=host).

Yeah, putting in the hypervisor bits would make little sense without
the hvmloader stuff going in at (about) the same time.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-10 13:26   ` George Dunlap
@ 2015-07-10 15:01     ` Jan Beulich
  2015-07-10 15:07       ` George Dunlap
  2015-07-13  5:57       ` Chen, Tiejun
  2015-07-13  6:47     ` Chen, Tiejun
  1 sibling, 2 replies; 119+ messages in thread
From: Jan Beulich @ 2015-07-10 15:01 UTC (permalink / raw)
  To: George Dunlap, Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Stefano Stabellini, Suravee Suthikulpanit, Yang Zhang,
	Aravind Gopalakrishnan

>>> On 10.07.15 at 15:26, <George.Dunlap@eu.citrix.com> wrote:
> I also said that if we went with anything other than STRICT that we'd
> need to check to make sure that the domain really was the hardware
> domain before proceeding, in case the assumption that pdev->domain ==
> hardware_domain ever changed.  (Perhaps with an ASSERT -- Jan, what do
> you think?)

Yes, such an ASSERT() seems okay/desirable.

> Also, passing in RELAXED in locations where the flag is completely
> ignored (such as when removing mappings) doesn't really make any
> sense.
> 
> On the whole I think it would be better if you removed the RELAXED
> flag for both removals and for hardware domains.

But what would he pass instead? Or wait - iirc I had even suggested
a way to do so by combining two arguments. Would need to go dig
that out, because I think the idea got dropped without good reason.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-10 15:01     ` Jan Beulich
@ 2015-07-10 15:07       ` George Dunlap
  2015-07-13  6:37         ` Chen, Tiejun
  2015-07-13  5:57       ` Chen, Tiejun
  1 sibling, 1 reply; 119+ messages in thread
From: George Dunlap @ 2015-07-10 15:07 UTC (permalink / raw)
  To: Jan Beulich, Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Stefano Stabellini, Suravee Suthikulpanit, Yang Zhang,
	Aravind Gopalakrishnan

On 07/10/2015 04:01 PM, Jan Beulich wrote:
>>>> On 10.07.15 at 15:26, <George.Dunlap@eu.citrix.com> wrote:
>> I also said that if we went with anything other than STRICT that we'd
>> need to check to make sure that the domain really was the hardware
>> domain before proceeding, in case the assumption that pdev->domain ==
>> hardware_domain ever changed.  (Perhaps with an ASSERT -- Jan, what do
>> you think?)
> 
> Yes, such an ASSERT() seems okay/desirable.
> 
>> Also, passing in RELAXED in locations where the flag is completely
>> ignored (such as when removing mappings) doesn't really make any
>> sense.
>>
>> On the whole I think it would be better if you removed the RELAXED
>> flag for both removals and for hardware domains.
> 
> But what would he pass instead? Or wait - iirc I had even suggested
> a way to do so by combining two arguments. Would need to go dig
> that out, because I think the idea got dropped without good reason.

No, I just meant to pass '0' for the flags (which would imply STRICT).

I was saying two things in the above paragraph:

1. For removal, there's no point in passing in anything other than '0'
for flags, since it's ignored.  Passing a non-0 value implies that the
flags will have some effect, which is misleading.

2. For places we know we're adding to hw domains, I think it makes most
sense also to pass in '0', to imply STRICT.

But if instead they insist on passing RELAXED, then please add an
ASSERT(pdev->domain == hw_domain) or something of the kind to
intel_iommu_add_device().  (If defaulting to STRICT, I don't think the
ASSERT is necessary anymore.)

 -George

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-10 15:01     ` Jan Beulich
  2015-07-10 15:07       ` George Dunlap
@ 2015-07-13  5:57       ` Chen, Tiejun
  1 sibling, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-13  5:57 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Stefano Stabellini, Suravee Suthikulpanit, Yang Zhang,
	Aravind Gopalakrishnan

>> Also, passing in RELAXED in locations where the flag is completely
>> ignored (such as when removing mappings) doesn't really make any
>> sense.
>>
>> On the whole I think it would be better if you removed the RELAXED
>> flag for both removals and for hardware domains.
>
> But what would he pass instead? Or wait - iirc I had even suggested
> a way to do so by combining two arguments. Would need to go dig
> that out, because I think the idea got dropped without good reason.
>

No, I didn't simply drop this.

http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg01877.html

I went with one of the options you provided, where I just needed to add a
brief comment. I also replied at the time.

http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg02101.html

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-10 15:07       ` George Dunlap
@ 2015-07-13  6:37         ` Chen, Tiejun
  0 siblings, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-13  6:37 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Stefano Stabellini, Suravee Suthikulpanit, Yang Zhang,
	Aravind Gopalakrishnan

> I was saying two things in the above paragraph:
>
> 1. For removal, there's no point in passing in anything other than '0'
> for flags, since it's ignored.  Passing a non-0 value implies that the
> flags will have some effect, which is misleading.
>
> 2. For places we know we're adding to hw domains, I think it makes most
> sense also to pass in '0', to imply STRICT.
>
> But if instead they insist on passing RELAXED, then please add an
> ASSERT(pdev->domain == hw_domain) or something of the kind to
> intel_iommu_add_device().  (If defaulting to STRICT, I don't think the
> ASSERT is necessary anymore.)
>

I agree, and it also looks like Jan didn't oppose this STRICT approach of
passing "0" directly, so let's do this.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-10 13:26   ` George Dunlap
  2015-07-10 15:01     ` Jan Beulich
@ 2015-07-13  6:47     ` Chen, Tiejun
  2015-07-13  8:57       ` Jan Beulich
  2015-07-14 10:46       ` George Dunlap
  1 sibling, 2 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-13  6:47 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

> Thanks for this; a few more comments...
>

Thanks for your time.

>> @@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
>>           seg = machine_sbdf >> 16;
>>           bus = PCI_BUS(machine_sbdf);
>>           devfn = PCI_DEVFN2(machine_sbdf);
>> +        flag = domctl->u.assign_device.flag;
>> +        if ( flag > XEN_DOMCTL_DEV_RDM_RELAXED )
>
> This is not a blocker, but a stylistic comment: I would have inverted
> the bitmask here, as that's conceptually what you're checking.  I
> won't make this a blocker for going in.

What about this?

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 6e23fc6..17a4206 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1579,7 +1579,7 @@ int iommu_do_pci_domctl(
          bus = PCI_BUS(machine_sbdf);
          devfn = PCI_DEVFN2(machine_sbdf);
          flag = domctl->u.assign_device.flag;
-        if ( flag > XEN_DOMCTL_DEV_RDM_RELAXED )
+        if ( flag & ~XEN_DOMCTL_DEV_RDM_MASK )
          {
              ret = -EINVAL;
              break;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index bca25c9..07549a4 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -480,6 +480,7 @@ struct xen_domctl_assign_device {
      } u;
      /* IN */
  #define XEN_DOMCTL_DEV_RDM_RELAXED      1
+#define XEN_DOMCTL_DEV_RDM_MASK         0x1
      uint32_t  flag;   /* flag of assigned device */
  };
  typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;

>
>> @@ -1898,7 +1899,14 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
>>                PCI_BUS(bdf) == pdev->bus &&
>>                PCI_DEVFN2(bdf) == devfn )
>>           {
>> -            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
>> +            /*
>> +             * Here means we're add a device to the hardware domain
>> +             * so actually RMRR is always reserved on e820 so either
>> +             * of flag is fine for hardware domain and here we'd like
>> +             * to pass XEN_DOMCTL_DEV_RDM_RELAXED.
>> +             */
>
> Sorry I didn't give feedback on your comment before.  I don't find it
> clear enough; I'd suggest something like this:
>
> "iommu_add_device() is only called for the hardware domain (see
> xen/drivers/passthrough/pci.c:pci_add_device()).  Since RMRRs are
> always reserved in the e820 map for the hardware domain, there
> shouldn't be a conflict."

Looks good and thanks.

>
> I also said that if we went with anything other than STRICT that we'd
> need to check to make sure that the domain really was the hardware
> domain before proceeding, in case the assumption that pdev->domain ==
> hardware_domain ever changed.  (Perhaps with an ASSERT -- Jan, what do
> you think?)

Sounds reasonable.

>
> Also, passing in RELAXED in locations where the flag is completely
> ignored (such as when removing mappings) doesn't really make any
> sense.
>
> On the whole I think it would be better if you removed the RELAXED
> flag for both removals and for hardware domains.
>

As I said in another email, I agree with your STRICT approach.

Thanks
Tiejun

^ permalink raw reply related	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 05/16] hvmloader: get guest memory map into memory_map[]
  2015-07-10 13:49   ` George Dunlap
@ 2015-07-13  7:03     ` Chen, Tiejun
  0 siblings, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-13  7:03 UTC (permalink / raw)
  To: George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On 2015/7/10 21:49, George Dunlap wrote:
> On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
>> Now we get this map layout by call XENMEM_memory_map then
>> save them into one global variable memory_map[]. It should
>> include lowmem range, rdm range and highmem range. Note
>> rdm range and highmem range may not exist in some cases.
>>
>> And here we need to check if any reserved memory conflicts with
>> [RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
>> This range is used to allocate memory at the hvmloader level, and
>> we make hvmloader fail in case of a conflict, since it's
>> another rare possibility in the real world.
>>
>> CC: Keir Fraser <keir@xen.org>
>> CC: Jan Beulich <jbeulich@suse.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> CC: Ian Campbell <ian.campbell@citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> ---
>> v5 ~ v7:
>>
>> * Nothing is changed.
>>
>> v4:
>>
>> * Move some codes related to e820 to that specific file, e820.c.
>>
>> * Consolidate "printf()+BUG()" and "BUG_ON()"
>>
>> * Avoid another fixed width type for the parameter of get_mem_mapping_layout()
>>
>>   tools/firmware/hvmloader/e820.c      | 35 +++++++++++++++++++++++++++++++++++
>>   tools/firmware/hvmloader/e820.h      |  7 +++++++
>>   tools/firmware/hvmloader/hvmloader.c |  2 ++
>>   tools/firmware/hvmloader/util.c      | 26 ++++++++++++++++++++++++++
>>   tools/firmware/hvmloader/util.h      | 12 ++++++++++++
>>   5 files changed, 82 insertions(+)
>>
>> diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
>> index 2e05e93..3e53c47 100644
>> --- a/tools/firmware/hvmloader/e820.c
>> +++ b/tools/firmware/hvmloader/e820.c
>> @@ -23,6 +23,41 @@
>>   #include "config.h"
>>   #include "util.h"
>>
>> +struct e820map memory_map;
>> +
>> +void memory_map_setup(void)
>> +{
>> +    unsigned int nr_entries = E820MAX, i;
>> +    int rc;
>> +    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
>> +    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
>
> Why START-1 rather than just START?

I also think this is wrong after double-checking this point. These two
lines seem to have been simply copied from another place where we're
allocating space based on RESERVED_MEMORY_DYNAMIC_{START, END}. But here
I think you're right.

So let me correct this and update the patch description.

Thanks
Tiejun
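
For reference, the boundary case George raises (quoted just below) can be
reproduced in isolation. This is only a sketch: check_overlap() is copied
from the patch, RESERVED_MEMORY_DYNAMIC_START uses the value George quotes,
and both the END value and the helper rmrr_conflicts() are assumptions made
up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* START is the value quoted in the review; END is assumed for illustration. */
#define RESERVED_MEMORY_DYNAMIC_START 0xFC001000u
#define RESERVED_MEMORY_DYNAMIC_END   0xFE000000u

/* Same half-open interval test as in the patch: [start, start + size). */
bool check_overlap(uint64_t start, uint64_t size,
                   uint64_t reserved_start, uint64_t reserved_size)
{
    return (start + size > reserved_start) &&
           (start < reserved_start + reserved_size);
}

/*
 * Hypothetical helper: does an RMRR conflict with the dynamic range when
 * that range is taken to begin at alloc_addr?
 */
bool rmrr_conflicts(uint64_t alloc_addr,
                    uint64_t rmrr_addr, uint64_t rmrr_size)
{
    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;

    return check_overlap(alloc_addr, alloc_size, rmrr_addr, rmrr_size);
}
```

With alloc_addr = START - 1, an RMRR at 0xFC000000 of size 0x1000 is
reported as a conflict even though it ends exactly where the dynamic range
begins; with alloc_addr = START it is not.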

>
> It looks like RESERVED_MEMORY_DYNAMIC_START is set to 0xFC001000.  In
> the code the way it is, if there is an RMRR from 0xFC000000 of size
> 0x1000, it looks like check_overlap() below will fail and hvmloader
> will BUG().
>
> Is that really what we want?  Why can we not have an RMRR range that
> goes right up to the edge of the reserved range?
>
> Other than that this patch looks good.
>
>   -George
>
>> +
>> +    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
>> +
>> +    if ( rc || !nr_entries )
>> +    {
>> +        printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
>> +        BUG();
>> +    }
>> +
>> +    memory_map.nr_map = nr_entries;
>> +
>> +    for ( i = 0; i < nr_entries; i++ )
>> +    {
>> +        if ( memory_map.map[i].type == E820_RESERVED )
>> +        {
>> +            if ( check_overlap(alloc_addr, alloc_size,
>> +                               memory_map.map[i].addr,
>> +                               memory_map.map[i].size) )
>> +            {
>> +                printf("Fail to setup memory map due to conflict");
>> +                printf(" on dynamic reserved memory range.\n");
>> +                BUG();
>> +            }
>> +        }
>> +    }
>> +}
>> +
>>   void dump_e820_table(struct e820entry *e820, unsigned int nr)
>>   {
>>       uint64_t last_end = 0, start, end;
>> diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
>> index b2ead7f..8b5a9e0 100644
>> --- a/tools/firmware/hvmloader/e820.h
>> +++ b/tools/firmware/hvmloader/e820.h
>> @@ -15,6 +15,13 @@ struct e820entry {
>>       uint32_t type;
>>   } __attribute__((packed));
>>
>> +#define E820MAX        128
>> +
>> +struct e820map {
>> +    unsigned int nr_map;
>> +    struct e820entry map[E820MAX];
>> +};
>> +
>>   #endif /* __HVMLOADER_E820_H__ */
>>
>>   /*
>> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
>> index 25b7f08..84c588c 100644
>> --- a/tools/firmware/hvmloader/hvmloader.c
>> +++ b/tools/firmware/hvmloader/hvmloader.c
>> @@ -262,6 +262,8 @@ int main(void)
>>
>>       init_hypercalls();
>>
>> +    memory_map_setup();
>> +
>>       xenbus_setup();
>>
>>       bios = detect_bios();
>> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
>> index 80d822f..122e3fa 100644
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -27,6 +27,17 @@
>>   #include <xen/memory.h>
>>   #include <xen/sched.h>
>>
>> +/*
>> + * Check whether there exists overlap in the specified memory range.
>> + * Returns true if exists, else returns false.
>> + */
>> +bool check_overlap(uint64_t start, uint64_t size,
>> +                   uint64_t reserved_start, uint64_t reserved_size)
>> +{
>> +    return (start + size > reserved_start) &&
>> +            (start < reserved_start + reserved_size);
>> +}
>> +
>>   void wrmsr(uint32_t idx, uint64_t v)
>>   {
>>       asm volatile (
>> @@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
>>       *p = '\0';
>>   }
>>
>> +int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
>> +{
>> +    int rc;
>> +    struct xen_memory_map memmap = {
>> +        .nr_entries = *max_entries
>> +    };
>> +
>> +    set_xen_guest_handle(memmap.buffer, entries);
>> +
>> +    rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
>> +    *max_entries = memmap.nr_entries;
>> +
>> +    return rc;
>> +}
>> +
>>   void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
>>   {
>>       static int over_allocated;
>> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
>> index f99c0f19..1100a3b 100644
>> --- a/tools/firmware/hvmloader/util.h
>> +++ b/tools/firmware/hvmloader/util.h
>> @@ -4,8 +4,10 @@
>>   #include <stdarg.h>
>>   #include <stdint.h>
>>   #include <stddef.h>
>> +#include <stdbool.h>
>>   #include <xen/xen.h>
>>   #include <xen/hvm/hvm_info_table.h>
>> +#include "e820.h"
>>
>>   #define __STR(...) #__VA_ARGS__
>>   #define STR(...) __STR(__VA_ARGS__)
>> @@ -222,6 +224,9 @@ int hvm_param_set(uint32_t index, uint64_t value);
>>   /* Setup PCI bus */
>>   void pci_setup(void);
>>
>> +/* Setup memory map  */
>> +void memory_map_setup(void);
>> +
>>   /* Prepare the 32bit BIOS */
>>   uint32_t rombios_highbios_setup(void);
>>
>> @@ -249,6 +254,13 @@ void perform_tests(void);
>>
>>   extern char _start[], _end[];
>>
>> +int get_mem_mapping_layout(struct e820entry entries[],
>> +                           unsigned int *max_entries);
>> +
>> +extern struct e820map memory_map;
>> +bool check_overlap(uint64_t start, uint64_t size,
>> +                   uint64_t reserved_start, uint64_t reserved_size);
>> +
>>   #endif /* __HVMLOADER_UTIL_H__ */
>>
>>   /*
>> --
>> 1.9.1
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-13  6:47     ` Chen, Tiejun
@ 2015-07-13  8:57       ` Jan Beulich
  2015-07-14 10:46       ` George Dunlap
  1 sibling, 0 replies; 119+ messages in thread
From: Jan Beulich @ 2015-07-13  8:57 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, George Dunlap,
	Andrew Cooper, Tim Deegan, xen-devel, Stefano Stabellini,
	Suravee Suthikulpanit, Yang Zhang, Aravind Gopalakrishnan

>>> On 13.07.15 at 08:47, <tiejun.chen@intel.com> wrote:
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -480,6 +480,7 @@ struct xen_domctl_assign_device {
>       } u;
>       /* IN */
>   #define XEN_DOMCTL_DEV_RDM_RELAXED      1
> +#define XEN_DOMCTL_DEV_RDM_MASK         0x1

As said before - I dislike this mask being made part of the public
interface, albeit it being a domctl thing makes it a minor issue.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-10 10:14       ` Ian Jackson
@ 2015-07-13  9:19         ` Chen, Tiejun
  0 siblings, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-13  9:19 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>> Do you mean I should merge them as one as possible?
>
> "Factor it out" means to break out into a separate function (or maybe
> a macro or something, but in this case a function is appropriate).  So
> in this case take the two sets of similar code, combine them into a
> function with appropriate arguments, and then call that function in
> both places.
>
> Finding multiple occurrences of very similar code is usually a sign
> that refactoring is needed.
>

Thanks for your explanation.

>> But seems not be possible because we have seveal combinations of these
>> two conditions, strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST and one or
>> pci devices are also passes through.
>

[snip]

>> Sorry I can't figure out a good name here :) Any suggestions?
>
> The hypervisor seems to call this `pfn_to_paddr'.

Okay.


Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-10 10:23       ` Ian Jackson
@ 2015-07-13  9:31         ` Chen, Tiejun
  2015-07-13  9:40           ` Ian Campbell
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-13  9:31 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>>>> +                }else if ( !strcmp(optkey, "rdm_policy") ) {
>>>> +                    if ( !strcmp(tok, "strict") ) {
>>>> +                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
>>>> +                    } else if ( !strcmp(tok, "relaxed") ) {
>>>> +                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
>>>> +                    } else {
>>>> +                        XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
>>>> +                                          " policy: 'strict' or 'relaxed'.",
>>>> +                                     tok);
>>>> +                        goto parse_error;
>>>> +                    }
>>>
>>> This section has coding style (whitespace) problems and long lines.
>>> If you need to respin, please fix them.
>>
>> Are you saying this?
>>
>> } else if (  -> }else if (
>> } else { -> }else {
>
> Also spurious spaces inside brackets.  Please see CODING_STYLE.

I still can't see what I'm missing here after comparing with the other 
code inside xlu_pci_parse_bdf(). So I have to paste this in full:

                 }else if ( !strcmp(optkey, "rdm_policy") ) {
                     if ( !strcmp(tok, "strict") ) {
                         pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
                     }else if ( !strcmp(tok, "relaxed") ) {
                         pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
                     }else{
                         XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
                                           " policy: 'strict' or 'relaxed'.",
                                      tok);
                         goto parse_error;
                     }
                 }else{

This is not a long code segment, so could you point the problems out one by one?

>
>> Additionally I don't found which line is over 80 characters.
>

[snip]

>>> Really I would prefer that this parsing was done with a miniature flex
>>> parser, rather than ad-hoc pointer arithmetic and use of strtok.
>>
>> Sorry, could you show this explicitly?
>
> Something like what was done for disk devices.  See libxlu_disk_l.l
> for an example.  In this case your code would be a lot less
> complicated than what you see there.
>
> After the codefreeze I would probably have some time to write it for

It sounds like you would do this yourself, so for now I just keep the original, right?

Thanks
Tiejun

> you.  (I think that would be valuable because libxlu_disk_l.l is a
> very complicated example, and I want be able to point future
> submitters at something simpler.)
>
> Ian.
>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-13  9:31         ` Chen, Tiejun
@ 2015-07-13  9:40           ` Ian Campbell
  2015-07-13  9:55             ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Ian Campbell @ 2015-07-13  9:40 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Wei Liu, xen-devel, Ian Jackson, Stefano Stabellini

On Mon, 2015-07-13 at 17:31 +0800, Chen, Tiejun wrote:
> I still can't understand what I'm missing here after compared to other 
> contexts inside xlu_pci_parse_bdf().

Perhaps comparing to the CODING_STYLE document would help?

>  So I have to paste this entirely,
> 
>                  }else if ( !strcmp(optkey, "rdm_policy") ) {

Should be:
                 } else if (!strcmp(optkey, "rdm_policy")) {

i.e. space after } before "else" and no extra spaces inside the if
condition.

>                      if ( !strcmp(tok, "strict") ) {

                     if (!strcmp(tok, "strict")) {

Again no spaces within the if.

>                          pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
>                      }else if ( !strcmp(tok, "relaxed") ) {

Again add a space after } and remove those inside the if condition.

>                          pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
>                      }else{

Should be:
                     } else {

>                          XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
>                                            " policy: 'strict' or 'relaxed'.",
>                                       tok);
>                          goto parse_error;
>                      }
>                  }else{

and again "} else {"

Ian.
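
Putting the corrections above together, the branch would read roughly as
follows. This is a standalone sketch rather than the actual libxlu code:
the enum and the function parse_rdm_policy() are hypothetical names used
only to make the fragment compile on its own.

```c
#include <string.h>

/* Hypothetical stand-ins for the LIBXL_RDM_RESERVE_POLICY_* values. */
enum rdm_policy {
    RDM_POLICY_INVALID = -1,
    RDM_POLICY_STRICT,
    RDM_POLICY_RELAXED
};

/* Space after "}" before "else", no spaces just inside the parentheses. */
int parse_rdm_policy(const char *tok)
{
    if (!strcmp(tok, "strict")) {
        return RDM_POLICY_STRICT;
    } else if (!strcmp(tok, "relaxed")) {
        return RDM_POLICY_RELAXED;
    } else {
        return RDM_POLICY_INVALID;
    }
}
```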

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-10  9:18       ` Ian Campbell
@ 2015-07-13  9:47         ` Chen, Tiejun
  2015-07-13 10:15           ` Ian Campbell
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-13  9:47 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, Stefano Stabellini, Ian Jackson, xen-devel

> This approach looks like it should work, and I think given the point in
> the release it would be acceptable for 4.6.
>
> However long term I think it might make sense to try and reuse one of
> the existing libxl__arch hooks, i.e.
> libxl__arch_domain_init_hw_description or
> libxl__arch_domain_finalise_hw_description. On ARM these are to do with
> setting the Device Tree Blob, which included the memory map, so it is
> somewhat morally equivalent to configuring the e820 on x86, I think.
>
> Those hooks are only called from libxl__build_pv today, but calling them
> from libxl__build_hvm seems like it would be good too.

But it seems this raises some potential risks, doesn't it? Although 
libxl__arch_domain_init_hw_description() and 
libxl__arch_domain_finalise_hw_description() are NOPs on x86, they 
do real work on the ARM side. So if we call them inside 
libxl__build_hvm(), would that affect ARM? I'm not sure at this point 
unless someone can validate this change on ARM, or you can confirm my 
concern is unfounded.

>
> In particular I think a call to
> libxl__arch_domain_finalise_hw_description could be inserted just before
> xc_hvm_build, which is similar to PV where it precedes
> xc_dom_build_image, and is where you would want to setup the e820.
>
> libxl__arch_domain_init_hw_description I think would still be a NOP on
> x86, but it should probably go either just after the call to
> libxl__domain_firmware.
>
> Tiejun, would you be willing to commit to refactoring this and the
> issues which Ian raised in response to #11 and #16 a subsequent clean up
> series? I don't think it would even need to wait for the freeze to be
> over to be posted (although it may need to wait to be applied).
>

Yes, I'd like to follow this once my concerns above can be eliminated.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-13  9:40           ` Ian Campbell
@ 2015-07-13  9:55             ` Chen, Tiejun
  2015-07-13 10:17               ` Ian Campbell
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-13  9:55 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, xen-devel, Ian Jackson, Stefano Stabellini

On 2015/7/13 17:40, Ian Campbell wrote:
> On Mon, 2015-07-13 at 17:31 +0800, Chen, Tiejun wrote:
>> I still can't understand what I'm missing here after compared to other
>> contexts inside xlu_pci_parse_bdf().
>
> Perhaps comparing to the CODING_STYLE document would help?

It looks like the whole of xlu_pci_parse_bdf() doesn't follow that:

                 if ( !strcmp(optkey, "msitranslate") ) {
                     pcidev->msitranslate = atoi(tok);
                 }else if ( !strcmp(optkey, "power_mgmt") ) {
                     pcidev->power_mgmt = atoi(tok);
                 }else if ( !strcmp(optkey, "permissive") ) {
                     pcidev->permissive = atoi(tok);
                 }else if ( !strcmp(optkey, "seize") ) {
                     pcidev->seize = atoi(tok);
                 }else if ( !strcmp(optkey, "rdm_policy") ) {

So I can do this as you're expecting now, but it seems our change would 
make the code style very inconsistent inside this function.

Thanks
Tiejun


>
>>   So I have to paste this entirely,
>>
>>                   }else if ( !strcmp(optkey, "rdm_policy") ) {
>
> Should be:
>                   } else if (!strcmp(optkey, "rdm_policy")) {
>
> i.e. space after } before "else" and no extra spaces inside the if
> condition.
>
>>                       if ( !strcmp(tok, "strict") ) {
>
>                       if (!strcmp(tok, "strict")) {
>
> Again no spaces within the if.
>
>>                           pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
>>                       }else if ( !strcmp(tok, "relaxed") ) {
>
> Again add a space after } and remove those inside the if condition.
>
>>                           pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
>>                       }else{
>
> Should be:
>                       } else {
>
>>                           XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
>>                                             " policy: 'strict' or 'relaxed'.",
>>                                        tok);
>>                           goto parse_error;
>>                       }
>>                   }else{
>
> and again "} else {"
>
> Ian.
>
>
>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-13  9:47         ` Chen, Tiejun
@ 2015-07-13 10:15           ` Ian Campbell
  2015-07-14  5:44             ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Ian Campbell @ 2015-07-13 10:15 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Ian Jackson, xen-devel, Wei Liu, Stefano Stabellini

On Mon, 2015-07-13 at 17:47 +0800, Chen, Tiejun wrote:
> > This approach looks like it should work, and I think given the point in
> > the release it would be acceptable for 4.6.
> >
> > However long term I think it might make sense to try and reuse one of
> > the existing libxl__arch hooks, i.e.
> > libxl__arch_domain_init_hw_description or
> > libxl__arch_domain_finalise_hw_description. On ARM these are to do with
> > setting the Device Tree Blob, which included the memory map, so it is
> > somewhat morally equivalent to configuring the e820 on x86, I think.
> >
> > Those hooks are only called from libxl__build_pv today, but calling them
> > from libxl__build_hvm seems like it would be good too.
> 
> But seems this is raising some potential risks, isn't this? Although 
> libxl__arch_domain_init_hw_description() and 
> libxl__arch_domain_finalise_hw_description() are NOP to x86, they're 
> really working on ARM side. So if we call them inside 
> libxl__build_hvm(), any affects to ARM? I'm not very sure at this point 
> unless anyone can validate this change on ARM, or you really ensure my 
> concerns is unnecessary.

All ARM guests use the PV code path so there is no risk.

If there was some change to ARM to introduce an HVM style guest then it
would want those hooks called in this place too (and they would need
fixing as part of implementing such a thing).

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-13  9:55             ` Chen, Tiejun
@ 2015-07-13 10:17               ` Ian Campbell
  2015-07-13 17:08                 ` Ian Jackson
  0 siblings, 1 reply; 119+ messages in thread
From: Ian Campbell @ 2015-07-13 10:17 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Wei Liu, xen-devel, Ian Jackson, Stefano Stabellini

On Mon, 2015-07-13 at 17:55 +0800, Chen, Tiejun wrote:
> On 2015/7/13 17:40, Ian Campbell wrote:
> > On Mon, 2015-07-13 at 17:31 +0800, Chen, Tiejun wrote:
> >> I still can't understand what I'm missing here after compared to other
> >> contexts inside xlu_pci_parse_bdf().
> >
> > Perhaps comparing to the CODING_STYLE document would help?
> 
> Looks the whole xlu_pci_parse_bdf() doesn't follow that,
> 
>                  if ( !strcmp(optkey, "msitranslate") ) {
>                      pcidev->msitranslate = atoi(tok);
>                  }else if ( !strcmp(optkey, "power_mgmt") ) {
>                      pcidev->power_mgmt = atoi(tok);
>                  }else if ( !strcmp(optkey, "permissive") ) {
>                      pcidev->permissive = atoi(tok);
>                  }else if ( !strcmp(optkey, "seize") ) {
>                      pcidev->seize = atoi(tok);
>                  }else if ( !strcmp(optkey, "rdm_policy") ) {
> 
> So I can do this as you're expecting now, but seems our change would 
> make the code style very inconsistent inside this function.

I think one could make an argument that the exception described in the
first section of tools/libxl/CODING_STYLE applies here for the
whitespace issues, but not for the long lines I think.

Ian.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-09  5:33 ` [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges Tiejun Chen
@ 2015-07-13 13:12   ` Jan Beulich
  2015-07-14  6:39     ` Chen, Tiejun
  2015-07-15 13:40     ` George Dunlap
  0 siblings, 2 replies; 119+ messages in thread
From: Jan Beulich @ 2015-07-13 13:12 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 09.07.15 at 07:33, <tiejun.chen@intel.com> wrote:
> @@ -50,17 +75,22 @@ void pci_setup(void)
>      /* Resources assignable to PCI devices via BARs. */
>      struct resource {
>          uint64_t base, max;
> -    } *resource, mem_resource, high_mem_resource, io_resource;
> +    } *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource;

Despite having gone through description and the rest of the patch I
can't seem to be able to guess what "exp_mem" stands for.
Meaningful variable names are quite helpful though, often avoiding
the need for comments.

>      /* Create a list of device BARs in descending order of size. */
>      struct bars {
> -        uint32_t is_64bar;
> +#define PCI_BAR_IS_64BIT        0x1
> +#define PCI_BAR_IS_ALLOCATED    0x2
> +        uint32_t flag;

flags (you already have two)

>          uint32_t devfn;
>          uint32_t bar_reg;
>          uint64_t bar_sz;
>      } *bars = (struct bars *)scratch_start;
> -    unsigned int i, nr_bars = 0;
> -    uint64_t mmio_hole_size = 0;
> +    unsigned int i, j, n, nr_bars = 0;
> +    uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size;
> +    bool bar32_allocating = 0;
> +    uint64_t mmio32_unallocated_total = 0;
> +    unsigned long cur_pci_mem_start = 0;
>  
>      const char *s;
>      /*
> @@ -222,7 +252,7 @@ void pci_setup(void)
>              if ( i != nr_bars )
>                  memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
>  
> -            bars[i].is_64bar = is_64bar;
> +            bars[i].flag = is_64bar ? PCI_BAR_IS_64BIT : 0;
>              bars[i].devfn   = devfn;
>              bars[i].bar_reg = bar_reg;
>              bars[i].bar_sz  = bar_sz;
> @@ -309,29 +339,31 @@ void pci_setup(void)
>      }
>  
>      /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
> +    cur_pci_mem_start = pci_mem_start;
>      while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
> +        relocate_ram_for_pci_memory(cur_pci_mem_start);

Please be consistent about which variable you want to use in the loop
(pci_mem_start vs cur_pci_mem_start).

Also, this being the first substantial change to the function makes
clear that you _still_ leave the sizing loop untouched, and instead
make the allocation logic below more complicated. I said before a
number of times that I don't think this helps maintainability of this
already convoluted code. Among other things this manifests itself
in your second call to relocate_ram_for_pci_memory() in no way
playing by the constraints explained a few lines up from here in an
extensive comment.

Therefore I'll not make any further comments on the rest of the
patch, but instead outline an allocation model that I think would
fit our needs: Subject to the constraints mentioned above, set up
a bitmap (maximum size 64k [2GB = 2^19 pages needing 2^19
bits], i.e. a reasonably small memory block). Each bit represents a
page usable for MMIO: First of all you remove the range from
PCI_MEM_END upwards. Then remove all RDM pages. Now do a
first pass over all devices, allocating (in the bitmap) space for only
the 32-bit MMIO BARs, starting with the biggest one(s), by finding
a best fit (i.e. preferably a range not usable by any bigger BAR)
from top down. For example, if you have available

[f0000000,f8000000)
[f9000000,f9001000)
[fa000000,fa003000)
[fa010000,fa012000)

and you're looking for a single page slot, you should end up
picking fa002000.
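
One possible reading of "best fit from top down" is sketched here, purely
as an illustration. It assumes BARs are power-of-two sized and naturally
aligned, and it treats a slot as "not usable by any bigger BAR" when the
enclosing aligned block of twice the size is not entirely free. To keep it
short it walks a sorted list of free ranges instead of the bitmap; struct
range, block_is_free() and best_fit_top_down() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* half-open: [start, end) */

/* True if [base, base + size) lies entirely inside one free range. */
static bool block_is_free(const struct range *r, size_t n,
                          uint64_t base, uint64_t size)
{
    for ( size_t i = 0; i < n; i++ )
        if ( base >= r[i].start && base + size <= r[i].end )
            return true;
    return false;
}

/*
 * r[] must be sorted by ascending address; size must be a power of two.
 * Returns the chosen address, or 0 if nothing fits.
 */
uint64_t best_fit_top_down(const struct range *r, size_t n, uint64_t size)
{
    uint64_t fallback = 0;

    for ( size_t i = n; i-- > 0; )
    {
        uint64_t addr;

        if ( r[i].end - r[i].start < size )
            continue;

        /* Highest naturally aligned slot that still ends inside the range. */
        addr = (r[i].end - size) & ~(size - 1);
        while ( addr >= r[i].start )
        {
            uint64_t parent = addr & ~(2 * size - 1);

            /* A bigger BAR could not use this slot anyway: take it. */
            if ( !block_is_free(r, n, parent, 2 * size) )
                return addr;
            /* Otherwise remember the highest usable slot as a last resort. */
            if ( !fallback )
                fallback = addr;
            if ( addr < size )
                break;
            addr -= size;
        }
    }

    return fallback;
}
```

With the four ranges listed above and a single 4k page requested, this
skips both pages of [fa010000,fa012000), since an 8k BAR could still use
them, and returns fa002000.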

After this pass you should be able to do RAM relocation in a
single attempt just like we do today (you may still grow the MMIO
window if you know you need to and can fit some of the 64-bit
BARs in there, subject to said constraints; this is in an attempt
to help OSes not comfortable with 64-bit resources).

In a 2nd pass you'd then assign 64-bit resources: If you can fit
them below 4G (you still have the bitmap left of what you've got
available), put them there. Allocation strategy could be the same
as above (biggest first), perhaps allowing for some factoring out
of logic, but here smallest first probably could work equally well.
The main thought to decide between the two is whether it is
better to fit as many (small) or as big (in total) as possible a set
under 4G. I'd generally expect the former (as many as possible,
leaving only a few huge ones to go above 4G) to be the better
approach, but that's more a gut feeling than based on hard data.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-09  5:33 ` [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-07-13 13:35   ` Jan Beulich
  2015-07-14  5:22     ` Chen, Tiejun
  2015-07-15 16:00   ` George Dunlap
  1 sibling, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-13 13:35 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 09.07.15 at 07:33, <tiejun.chen@intel.com> wrote:
> Now we can use that memory map to build our final
> e820 table but it may need to reorder all e820
> entries.

"it" being what? I'm afraid I can't really make sense of the second
half of the sentence...

> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
>                       unsigned int lowmem_reserved_base,
>                       unsigned int bios_image_base)
>  {
> -    unsigned int nr = 0;
> +    unsigned int nr = 0, i, j;
> +    uint64_t add_high_mem = 0;
> +    uint64_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;

For the last one I don't see why uint64_t; uint32_t should be just fine
and less (binary) code.

> @@ -194,16 +189,73 @@ int build_e820_table(struct e820entry *e820,
>          nr++;
>      }
>  
> -
> -    if ( hvm_info->high_mem_pgend )
> +    /*
> +     * Construct E820 table according to recorded memory map.
> +     *
> +     * The memory map created by toolstack may include,
> +     *
> +     * #1. Low memory region
> +     *
> +     * Low RAM starts at least from 1M to make sure all standard regions
> +     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
> +     * have enough space.
> +     *
> +     * #2. Reserved regions if they exist
> +     *
> +     * #3. High memory region if it exists
> +     */
> +    for ( i = 0; i < memory_map.nr_map; i++ )
>      {
> -        e820[nr].addr = ((uint64_t)1 << 32);
> -        e820[nr].size =
> -            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
> -        e820[nr].type = E820_RAM;
> +        e820[nr] = memory_map.map[i];
>          nr++;
>      }
>  
> +    /* Low RAM goes here. Reserve space for special pages. */
> +    BUG_ON(low_mem_end < (2u << 20));
> +
> +    /*
> +     * We may need to adjust real lowmem end since we may
> +     * populate RAM to get enough MMIO previously.

"populate"? Don't you mean "relocate"?

> +     */
> +    for ( i = 0; i < memory_map.nr_map; i++ )
> +    {
> +        uint64_t end = e820[i].addr + e820[i].size;

Either loop index/boundary or used array are wrong here: In the
earlier loop you copied memory_map[0...nr_map-1] to
e820[n...n+nr_map-1], but here you're looping looking at
e820[0...nr_map-1]
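
To make the index mismatch concrete, here is a standalone sketch of the
copy followed by a scan over the same slots that were just filled. It
illustrates only the indexing point; the condition inside the loop is kept
as in the patch and is still subject to the next comment. The struct is a
simplified, unpacked stand-in for hvmloader's e820entry, and
append_and_trim() is a hypothetical name.

```c
#include <stdint.h>

#define E820_RAM 1

/* Simplified stand-in for hvmloader's struct e820entry (not packed). */
struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Append map[0..nr_map-1] at e820[*nr..], then trim the RAM entry that
 * straddles low_mem_end. Returns the number of bytes trimmed.
 */
uint64_t append_and_trim(struct e820entry *e820, unsigned int *nr,
                         const struct e820entry *map, unsigned int nr_map,
                         uint64_t low_mem_end)
{
    unsigned int first = *nr, i;
    uint64_t trimmed = 0;

    for ( i = 0; i < nr_map; i++ )
        e820[(*nr)++] = map[i];

    /* Scan e820[first..*nr-1], i.e. exactly the entries copied above. */
    for ( i = first; i < *nr; i++ )
    {
        uint64_t end = e820[i].addr + e820[i].size;

        if ( e820[i].type == E820_RAM &&
             low_mem_end > e820[i].addr && low_mem_end < end )
        {
            trimmed = end - low_mem_end;
            e820[i].size = low_mem_end - e820[i].addr;
        }
    }

    return trimmed;
}
```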

> +        if ( e820[i].type == E820_RAM &&
> +             low_mem_end > e820[i].addr && low_mem_end < end )

Assuming you mean to look at the RDM e820[] entries here, this
is not a correct check: You don't care about partly or fully
contained, all you care about is whether low_mem_end extends
beyond the start of the region.

> +        {
> +            add_high_mem = end - low_mem_end;
> +            e820[i].size = low_mem_end - e820[i].addr;
> +        }
> +    }
> +
> +    /*
> +     * And then we also need to adjust highmem.
> +     */

A single line comment should use the respective comment style.

> +    if ( add_high_mem )
> +    {
> +        for ( i = 0; i < memory_map.nr_map; i++ )
> +        {
> +            if ( e820[i].type == E820_RAM &&
> +                 e820[i].addr > (1ull << 32))
> +                e820[i].size += add_high_mem;
> +        }
> +    }

But looking at the code I think the comment should be extended to
state that we currently expect there to be exactly one such RAM
region.

> +    /* Finally we need to reorder all e820 entries. */

"reorder"? Perhaps "sort"?

But despite the many comments - the patch looks a lot better now
than earlier versions.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-07-09  5:34 ` [v7][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
@ 2015-07-13 13:41   ` Jan Beulich
  2015-07-14  1:42     ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-13 13:41 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Yang Zhang, Kevin Tian, xen-devel

>>> On 09.07.15 at 07:34, <tiejun.chen@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2297,13 +2297,39 @@ static int intel_iommu_assign_device(
>      if ( list_empty(&acpi_drhd_units) )
>          return -ENODEV;
>  
> +    seg = pdev->seg;
> +    bus = pdev->bus;
> +    /*
> +     * In rare cases one given rmrr is shared by multiple devices but
> +     * obviously this would put the security of a system at risk. So
> +     * we should prevent from this sort of device assignment.
> +     *
> +     * TODO: in the future we can introduce group device assignment
> +     * interface to make sure devices sharing RMRR are assigned to the
> +     * same domain together.
> +     */
> +    for_each_rmrr_device( rmrr, bdf, i )
> +    {
> +        if ( rmrr->segment == seg &&
> +             PCI_BUS(bdf) == bus &&
> +             PCI_DEVFN2(bdf) == devfn )
> +        {
> +            if ( rmrr->scope.devices_cnt > 1 )
> +            {
> +                printk(XENLOG_G_ERR VTDPREFIX
> +                       " cannot assign %04x:%02x:%02x.%u"
> +                       " with shared RMRR for Dom%d.\n",
> +                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                       d->domain_id);
> +                return -EPERM;
> +            }
> +        }

Two if()-s like these should be folded into one.

In your place I'd also consider printing the RMRR base address
for easier analysis of the issue.
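
A minimal, self-contained sketch of both suggestions (one folded condition, and the RMRR base address in the message). The struct rmrr layout, the flat array walk and the name check_shared_rmrr() are simplified stand-ins assumed for illustration; the real code iterates with for_each_rmrr_device() and logs via printk().

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_BUS(bdf)    (((bdf) >> 8) & 0xff)
#define PCI_DEVFN2(bdf) ((bdf) & 0xff)
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/* Hypothetical, simplified stand-in for an RMRR and its device scope. */
struct rmrr {
    uint16_t segment;
    uint64_t base_address;
    unsigned int devices_cnt;
    const uint16_t *devices;    /* BDF of each device in scope */
};

/* Returns -EPERM if the device sits in an RMRR shared with other devices. */
int check_shared_rmrr(const struct rmrr *rmrrs, unsigned int nr_rmrr,
                      uint16_t seg, uint8_t bus, uint8_t devfn, int domid)
{
    unsigned int r, i;

    assert(rmrrs != NULL || nr_rmrr == 0);

    for ( r = 0; r < nr_rmrr; r++ )
    {
        const struct rmrr *rmrr = &rmrrs[r];

        for ( i = 0; i < rmrr->devices_cnt; i++ )
        {
            uint16_t bdf = rmrr->devices[i];

            /* One folded condition instead of two nested if()-s. */
            if ( rmrr->segment == seg &&
                 PCI_BUS(bdf) == bus &&
                 PCI_DEVFN2(bdf) == devfn &&
                 rmrr->devices_cnt > 1 )
            {
                fprintf(stderr,
                        "cannot assign %04x:%02x:%02x.%u"
                        " with shared RMRR at %" PRIx64 " for Dom%d.\n",
                        (unsigned int)seg, (unsigned int)bus,
                        (unsigned int)PCI_SLOT(devfn),
                        (unsigned int)PCI_FUNC(devfn),
                        rmrr->base_address, domid);
                return -EPERM;
            }
        }
    }

    return 0;
}
```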

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-13 10:17               ` Ian Campbell
@ 2015-07-13 17:08                 ` Ian Jackson
  2015-07-14  1:29                   ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Ian Jackson @ 2015-07-13 17:08 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Chen, Tiejun, xen-devel, Wei Liu, Stefano Stabellini

Ian Campbell writes ("Re: [Xen-devel] [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters"):
> On Mon, 2015-07-13 at 17:55 +0800, Chen, Tiejun wrote:
> > So I can do this as you're expecting now, but seems our change would 
> > make the code style very inconsistent inside this function.

You're right, it would, but I think that is what is called for.

> I think one could make an argument that the exception described in the
> first section of tools/libxl/CODING_STYLE applies here for the
> whitespace issues, but not for the long lines I think.

The wording of the exception is that:

  If it is not feasible to conform fully to the style while patching old
  code, without doing substantial style reengineering first, we may
  accept patches which contain nonconformant elements, provided that
  they don't make the coding style problem worse overall.

  In this case, the new code should conform to the prevailing style in
  the area being touched.

In this case it is indeed feasible to conform fully to the new
whitespace style for these added lines.  It leaves the code in this
function in a mixture of styles, but that is not "infeasible".  It is
merely undesirable, but so is adding more code in the wrong style.

The sentence about new code conforming to the prevailing style applies
only "in this case", ie, only if "it is not feasible ... to conform to
the new style".

Ian.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-13 17:08                 ` Ian Jackson
@ 2015-07-14  1:29                   ` Chen, Tiejun
  0 siblings, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-14  1:29 UTC (permalink / raw)
  To: Ian Jackson, Ian Campbell; +Cc: xen-devel, Wei Liu, Stefano Stabellini

On 2015/7/14 1:08, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters"):
>> On Mon, 2015-07-13 at 17:55 +0800, Chen, Tiejun wrote:
>>> So I can do this as you're expecting now, but seems our change would
>>> make the code style very inconsistent inside this function.
>
> You're right, it would, but I think that is what is called for.
>
>> I think one could make an argument that the exception described in the
>> first section of tools/libxl/CODING_STYLE applies here for the
>> whitespace issues, but not for the long lines I think.
>
> The wording of the exception is that:
>
>    If it is not feasible to conform fully to the style while patching old
>    code, without doing substantial style reengineering first, we may
>    accept patches which contain nonconformant elements, provided that
>    they don't make the coding style problem worse overall.
>
>    In this case, the new code should conform to the prevailing style in
>    the area being touched.
>
> In this case it is indeed feasible to conform fully to the new
> whitespace style for these added lines.  It leaves the code in this
> function in a mixture of styles, but that is not "infeasible".  It is

Okay. I'll follow the new code style.

Thanks
Tiejun

> merely undesirable, but so is adding more code in the wrong style.
>
> The sentence about new code conforming to the prevailing style applies
> only "in this case", ie, only if "it is not feasible ... to conform to
> the new style".
>
> Ian.
>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-07-13 13:41   ` Jan Beulich
@ 2015-07-14  1:42     ` Chen, Tiejun
  2015-07-14  9:19       ` Jan Beulich
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-14  1:42 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Yang Zhang, Kevin Tian, xen-devel

>> +            {
>> +                printk(XENLOG_G_ERR VTDPREFIX
>> +                       " cannot assign %04x:%02x:%02x.%u"
>> +                       " with shared RMRR for Dom%d.\n",
>> +                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>> +                       d->domain_id);
>> +                return -EPERM;
>> +            }
>> +        }
>
> Two if()-s like these should be folded into one.
>
> In your place I'd also consider printing the RMRR base address
> for easier analysis of the issue.
>

I agree but I think the whole range info should be better,

" with shared RMRR [%"PRIx64",%"PRIx64"] for Dom%d.\n",

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-13 13:35   ` Jan Beulich
@ 2015-07-14  5:22     ` Chen, Tiejun
  2015-07-14  9:32       ` Jan Beulich
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-14  5:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>> Now we can use that memory map to build our final
>> e820 table but it may need to reorder all e820
>> entries.
>
> "it" being what? I'm afraid I can't really make sense of the second
> half of the sentence...

I hope the following can work for you,

...
but finally we should sort them into an increasing order since
we shouldn't assume the original order is always good.

>
>> --- a/tools/firmware/hvmloader/e820.c
>> +++ b/tools/firmware/hvmloader/e820.c
>> @@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
>>                        unsigned int lowmem_reserved_base,
>>                        unsigned int bios_image_base)

[snip]

>
>> +     */
>> +    for ( i = 0; i < memory_map.nr_map; i++ )
>> +    {
>> +        uint64_t end = e820[i].addr + e820[i].size;
>
> Either loop index/boundary or used array are wrong here: In the
> earlier loop you copied memory_map[0...nr_map-1] to
> e820[n...n+nr_map-1], but here you're looping looking at
> e820[0...nr_map-1]

You're right. I should look up all of e820[] like this,

for ( i = 0; i < nr; i++ )

>
>> +        if ( e820[i].type == E820_RAM &&
>> +             low_mem_end > e820[i].addr && low_mem_end < end )
>
> Assuming you mean to look at the RDM e820[] entries here, this
> is not a correct check: You don't care about partly or fully
> contained, all you care about is whether low_mem_end extends
> beyond the start of the region.

Here I'm looking at the e820 entry indicating low memory. Because

low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;

and when we allocate MMIO in pci.c, it's possible to populate RAM, so
hvm_info->low_mem_pgend may be changed over there. So we need to
compensate for this loss with high memory. Here memory_map[] also records
the original low/high memory, so if low_mem_end is less than the
original we need this compensation.

So here we have two steps to address this issue,

#1. Calculate the loss

>
>> +        {
>> +            add_high_mem = end - low_mem_end;
>> +            e820[i].size = low_mem_end - e820[i].addr;
>> +        }
>> +    }
>> +
>> +    /*
>> +     * And then we also need to adjust highmem.
>> +     */
>
> A single line comment should use the respective comment style.
>

#2. Compensate the loss

>> +    if ( add_high_mem )
>> +    {
>> +        for ( i = 0; i < memory_map.nr_map; i++ )

s/memory_map.nr_map/nr

>> +        {
>> +            if ( e820[i].type == E820_RAM &&
>> +                 e820[i].addr > (1ull << 32))
>> +                e820[i].size += add_high_mem;
>> +        }
>> +    }
>
> But looking at the code I think the comment should be extended to
> state that we currently expect there to be exactly one such RAM
> region.
>

I can add this at the beginning of #1 loop,

It's possible that RAM was relocated earlier to allocate sufficient MMIO, so
low_mem_pgend may have been changed over there. Here memory_map[] records
the original low/high memory, so if low_mem_end is less than the
original we need to revise the low/high memory ranges in e820.
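
The two steps can be sketched in isolation as follows. This is only an illustration under stated assumptions: the struct layout and the helper name adjust_low_high_mem() are hypothetical, the loop runs over all nr entries, and the high-memory test uses >= so that a region starting exactly at 4GiB is matched (the quoted hunk uses >).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM 1

/* Simplified stand-in for an e820 entry (layout is an assumption). */
struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Hypothetical helper: shrink the low RAM entry that contains low_mem_end
 * and hand the removed amount to the RAM entry at or above 4GiB.
 * Returns the number of bytes moved to high memory.
 */
uint64_t adjust_low_high_mem(struct e820entry *e820, unsigned int nr,
                             uint64_t low_mem_end)
{
    uint64_t add_high_mem = 0;
    unsigned int i;

    assert(e820 != NULL || nr == 0);

    /* #1. Calculate the loss. */
    for ( i = 0; i < nr; i++ )
    {
        uint64_t end = e820[i].addr + e820[i].size;

        if ( e820[i].type == E820_RAM &&
             low_mem_end > e820[i].addr && low_mem_end < end )
        {
            add_high_mem = end - low_mem_end;
            e820[i].size = low_mem_end - e820[i].addr;
        }
    }

    /* #2. Compensate the loss; exactly one high RAM region is expected. */
    if ( add_high_mem )
    {
        for ( i = 0; i < nr; i++ )
        {
            if ( e820[i].type == E820_RAM &&
                 e820[i].addr >= (1ull << 32) )
            {
                e820[i].size += add_high_mem;
                break;
            }
        }
    }

    return add_high_mem;
}
```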


Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-13 10:15           ` Ian Campbell
@ 2015-07-14  5:44             ` Chen, Tiejun
  2015-07-14  7:42               ` Ian Campbell
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-14  5:44 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, xen-devel, Wei Liu, Stefano Stabellini


On 2015/7/13 18:15, Ian Campbell wrote:
> On Mon, 2015-07-13 at 17:47 +0800, Chen, Tiejun wrote:
>>> This approach looks like it should work, and I think given the point in
>>> the release it would be acceptable for 4.6.
>>>
>>> However long term I think it might make sense to try and reuse one of
>>> the existing libxl__arch hooks, i.e.
>>> libxl__arch_domain_init_hw_description or
>>> libxl__arch_domain_finalise_hw_description. On ARM these are to do with
>>> setting the Device Tree Blob, which included the memory map, so it is
>>> somewhat morally equivalent to configuring the e820 on x86, I think.
>>>
>>> Those hooks are only called from libxl__build_pv today, but calling them
>>> from libxl__build_hvm seems like it would be good too.
>>
>> But seems this is raising some potential risks, isn't this? Although
>> libxl__arch_domain_init_hw_description() and
>> libxl__arch_domain_finalise_hw_description() are NOP to x86, they're
>> really working on ARM side. So if we call them inside
>> libxl__build_hvm(), any affects to ARM? I'm not very sure at this point
>> unless anyone can validate this change on ARM, or you really ensure my
>> concerns is unnecessary.
>
> All ARM guests use the PV code path so there is no risk.

Okay but please take a close look at this,

libxl__build_pv(gc, domid, info, state)
     |
     + libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
                                       libxl_domain_build_info *info,
                                       struct xc_dom_image *dom)

But in our case we need this parameter, struct xc_hvm_build_args *args,
so how can we handle this conflict? It's not easy to add this, and it
doesn't make sense in the PV case either.

Thanks
Tiejun	

>
> If there was some change to ARM to introduce an HVM style guest then it
> would want those hooks called in this place too (and they would need
> fixing as part of implementing such a thing).
>
>
>
>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-13 13:12   ` Jan Beulich
@ 2015-07-14  6:39     ` Chen, Tiejun
  2015-07-14  9:27       ` Jan Beulich
  2015-07-15 13:40     ` George Dunlap
  1 sibling, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-14  6:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>> -    } *resource, mem_resource, high_mem_resource, io_resource;
>> +    } *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource;
>
> Despite having gone through description and the rest of the patch I
> can't seem to be able to guess what "exp_mem" stands for.
> Meaningful variable names are quite helpful though, often avoiding
> the need for comments.

exp_mem_resource is the expanded mem_resource in the case of
populating RAM.

Maybe I should use the whole word, expand_mem_resource.

>
>>       /* Create a list of device BARs in descending order of size. */

[snip]

>> @@ -309,29 +339,31 @@ void pci_setup(void)
>>       }
>>
>>       /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
>> +    cur_pci_mem_start = pci_mem_start;
>>       while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
>> +        relocate_ram_for_pci_memory(cur_pci_mem_start);
>
> Please be consistent which variable to want to use in the loop
> (pci_mem_start vs cur_pci_mem_start).

Overall I just call relocate_ram_for_pci_memory() twice and each time I
always pass cur_pci_mem_start. Is any place inconsistent?

>
> Also, this being the first substantial change to the function makes
> clear that you _still_ leave the sizing loop untouched, and instead
> make the allocation logic below more complicated. I said before a

But this may be more reasonable than what it used to do. From my point of view
we always need to first allocate 32bit mmio and then allocate 64bit mmio,
since, as you said, we don't want to expand high memory if we can avoid it.

> number of times that I don't think this helps maintainability of this
> already convoluted code. Among other things this manifests itself
> in your second call to relocate_ram_for_pci_memory() in no way
> playing by the constraints explained a few lines up from here in an
> extensive comment.

Can't all variables/comments express what I intend to do here? Except
for that exp_mem_resource.

             /*
              * We have to populate more RAM to further allocate
              * the remaining 32bars.
              */
             if ( mmio32_unallocated_total )
             {
                 cur_pci_mem_start = pci_mem_start - mmio32_unallocated_total;
                 relocate_ram_for_pci_memory(cur_pci_mem_start);
                 exp_mem_resource.base = cur_pci_mem_start;
                 exp_mem_resource.max = pci_mem_start;
             }

>
> Therefore I'll not make any further comments on the rest of the
> patch, but instead outline an allocation model that I think would
> fit our needs: Subject to the constraints mentioned above, set up
> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
> bits], i.e. reasonably small a memory block). Each bit represents a
> page usable for MMIO: First of all you remove the range from
> PCI_MEM_END upwards. Then remove all RDM pages. Now do a
> first pass over all devices, allocating (in the bitmap) space for only
> the 32-bit MMIO BARs, starting with the biggest one(s), by finding
> a best fit (i.e. preferably a range not usable by any bigger BAR)
> from top down. For example, if you have available
>
> [f0000000,f8000000)
> [f9000000,f9001000)
> [fa000000,fa003000)
> [fa010000,fa012000)
>
> and you're looking for a single page slot, you should end up
> picking fa002000.

Why is this [f9000000,f9001000]? Just one page in this slot.

>
> After this pass you should be able to do RAM relocation in a
> single attempt just like we do today (you may still grow the MMIO
> window if you know you need to and can fit some of the 64-bit
> BARs in there, subject to said constraints; this is in an attempt
> to help OSes not comfortable with 64-bit resources).
>
> In a 2nd pass you'd then assign 64-bit resources: If you can fit
> them below 4G (you still have the bitmap left of what you've got
> available), put them there. Allocation strategy could be the same

I think basically, your logic is similar to what I did, as I described in
the changelog,

   1>. The first allocation round just to 32bit-bar

   If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
   with all remaining resources including low pci memory.

   If not, we need to calculate how much RAM should be populated to allocate the
   remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
   to the second allocation round 2>.

   2>. The second allocation round to the remaining 32bit-bar

   We should be able to finish allocating all 32bit-bar in theory, then go to
   the third allocation round 3>.

   3>. The third allocation round to 64bit-bar

   We'll try to first allocate from the remaining low memory resource. If that
   isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
   should be same as the original.

> as above (biggest first), perhaps allowing for some factoring out
> of logic, but here smallest first probably could work equally well.
> The main thought to decide between the two is whether it is
> better to fit as many (small) or as big (in total) as possible a set
> under 4G. I'd generally expect the former (as many as possible,
> leaving only a few huge ones to go above 4G) to be the better
> approach, but that's more a gut feeling than based on hard data.
>

I think the bitmap mechanism is a good idea, but honestly it's not easy to
cover all requirements here. It's just like bootmem on the Linux side, so it's
a little complicated to implement this entirely. So I prefer not to
introduce this approach in the current phase.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-14  5:44             ` Chen, Tiejun
@ 2015-07-14  7:42               ` Ian Campbell
  2015-07-14  8:03                 ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Ian Campbell @ 2015-07-14  7:42 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Ian Jackson, xen-devel, Wei Liu, Stefano Stabellini

On Tue, 2015-07-14 at 13:44 +0800, Chen, Tiejun wrote:
> On 2015/7/13 18:15, Ian Campbell wrote:
> > On Mon, 2015-07-13 at 17:47 +0800, Chen, Tiejun wrote:
> >>> This approach looks like it should work, and I think given the point in
> >>> the release it would be acceptable for 4.6.
> >>>
> >>> However long term I think it might make sense to try and reuse one of
> >>> the existing libxl__arch hooks, i.e.
> >>> libxl__arch_domain_init_hw_description or
> >>> libxl__arch_domain_finalise_hw_description. On ARM these are to do with
> >>> setting the Device Tree Blob, which included the memory map, so it is
> >>> somewhat morally equivalent to configuring the e820 on x86, I think.
> >>>
> >>> Those hooks are only called from libxl__build_pv today, but calling them
> >>> from libxl__build_hvm seems like it would be good too.
> >>
> >> But seems this is raising some potential risks, isn't this? Although
> >> libxl__arch_domain_init_hw_description() and
> >> libxl__arch_domain_finalise_hw_description() are NOP to x86, they're
> >> really working on ARM side. So if we call them inside
> >> libxl__build_hvm(), any affects to ARM? I'm not very sure at this point
> >> unless anyone can validate this change on ARM, or you really ensure my
> >> concerns is unnecessary.
> >
> > All ARM guests use the PV code path so there is no risk.
> 
> Okay but please take a close look at this,
> 
> libxl__build_pv(gc, domid, info, state)
>      |
>      + libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
>                                        libxl_domain_build_info *info,
>                                        struct xc_dom_image *dom)
> 
> But in our case we need this parameter, struct xc_hvm_build_args *args, 
> so how can we handle this conflict? Its not easy to add this, and it 
> doesn't make sense as well in pv case.

This is an internal API, you can feel free to modify it as necessary.

Please note that I started this subthread with "However long term I
think it might make sense ...", This was not a request to redo this
patch now.

Ian.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-14  7:42               ` Ian Campbell
@ 2015-07-14  8:03                 ` Chen, Tiejun
  0 siblings, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-14  8:03 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, xen-devel, Wei Liu, Stefano Stabellini

>>>>> However long term I think it might make sense to try and reuse one of
>>>>> the existing libxl__arch hooks, i.e.
>>>>> libxl__arch_domain_init_hw_description or
>>>>> libxl__arch_domain_finalise_hw_description. On ARM these are to do with
>>>>> setting the Device Tree Blob, which included the memory map, so it is
>>>>> somewhat morally equivalent to configuring the e820 on x86, I think.
>>>>>
>>>>> Those hooks are only called from libxl__build_pv today, but calling them
>>>>> from libxl__build_hvm seems like it would be good too.
>>>>
>>>> But seems this is raising some potential risks, isn't this? Although
>>>> libxl__arch_domain_init_hw_description() and
>>>> libxl__arch_domain_finalise_hw_description() are NOP to x86, they're
>>>> really working on ARM side. So if we call them inside
>>>> libxl__build_hvm(), any affects to ARM? I'm not very sure at this point
>>>> unless anyone can validate this change on ARM, or you really ensure my
>>>> concerns is unnecessary.
>>>
>>> All ARM guests use the PV code path so there is no risk.
>>
>> Okay but please take a close look at this,
>>
>> libxl__build_pv(gc, domid, info, state)
>>       |
>>       + libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
>>                                         libxl_domain_build_info *info,
>>                                         struct xc_dom_image *dom)
>>
>> But in our case we need this parameter, struct xc_hvm_build_args *args,
>> so how can we handle this conflict? Its not easy to add this, and it
>> doesn't make sense as well in pv case.
>
> This is an internal API, you can feel free to modify it as necessary.

I mean struct xc_hvm_build_args is a parameter specific to HVM, so it's
weird to pass this in the PV case. If we wrap this again, it's not
worth going this way.

>
> Please note that I started this subthread with "However long term I
> think it might make sense ...", This was not a request to redo this
> patch now.
>

Okay, let's record this and for now just keep moving forward with the original.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-07-14  1:42     ` Chen, Tiejun
@ 2015-07-14  9:19       ` Jan Beulich
  0 siblings, 0 replies; 119+ messages in thread
From: Jan Beulich @ 2015-07-14  9:19 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Yang Zhang, Kevin Tian, xen-devel

>>> On 14.07.15 at 03:42, <tiejun.chen@intel.com> wrote:
>>> +            {
>>> +                printk(XENLOG_G_ERR VTDPREFIX
>>> +                       " cannot assign %04x:%02x:%02x.%u"
>>> +                       " with shared RMRR for Dom%d.\n",
>>> +                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>>> +                       d->domain_id);
>>> +                return -EPERM;
>>> +            }
>>> +        }
>>
>> Two if()-s like these should be folded into one.
>>
>> In your place I'd also consider printing the RMRR base address
>> for easier analysis of the issue.
>>
> 
> I agree but I think the whole range info should be better,
> 
> " with shared RMRR [%"PRIx64",%"PRIx64"] for Dom%d.\n",

Perhaps, albeit due to there not being overlapping RMRRs the
base address is sufficient for uniquely identifying the one in
question.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-14  6:39     ` Chen, Tiejun
@ 2015-07-14  9:27       ` Jan Beulich
  2015-07-14 10:54         ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-14  9:27 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 14.07.15 at 08:39, <tiejun.chen@intel.com> wrote:
>>> -    } *resource, mem_resource, high_mem_resource, io_resource;
>>> +    } *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource;
>>
>> Despite having gone through description and the rest of the patch I
>> can't seem to be able to guess what "exp_mem" stands for.
>> Meaningful variable names are quite helpful though, often avoiding
>> the need for comments.
> 
> exp_mem_resource() is the expanded mem_resource in the case of 
> populating RAM.
> 
> Maybe I should use the whole word, expand_mem_resource.

And what does "expand" here mean then?

>>> @@ -309,29 +339,31 @@ void pci_setup(void)
>>>       }
>>>
>>>       /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
>>> +    cur_pci_mem_start = pci_mem_start;
>>>       while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
>>> +        relocate_ram_for_pci_memory(cur_pci_mem_start);
>>
>> Please be consistent which variable to want to use in the loop
>> (pci_mem_start vs cur_pci_mem_start).
> 
> Overall I just call relocate_ram_for_pci_memory() twice and each I 
> always pass cur_pci_mem_start. Any inconsistent place?

In the quoted code you use pci_mem_start in the while()
condition and cur_pci_mem_start in that same while()'s body.

>> Also, this being the first substantial change to the function makes
>> clear that you _still_ leave the sizing loop untouched, and instead
>> make the allocation logic below more complicated. I said before a
> 
> But this may be more reasonable than it used to do. In my point of view 
> we always need to first allocate 32bit mmio and then allocate 64bit mmio 
> since as you said we don't want to expand high memory if possible.
> 
>> number of times that I don't think this helps maintainability of this
>> already convoluted code. Among other things this manifests itself
>> in your second call to relocate_ram_for_pci_memory() in no way
>> playing by the constraints explained a few lines up from here in an
>> extensive comment.
> 
> Can't all variables/comments express what I intend to do here? Except 
> for that exp_mem_resource.

I'm not talking about a lack of comments, but about your added
use of the function not being in line with what is being explained
in an earlier (pre-existing) comment.

>> Therefore I'll not make any further comments on the rest of the
>> patch, but instead outline an allocation model that I think would
>> fit our needs: Subject to the constraints mentioned above, set up
>> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
>> bits], i.e. reasonably small a memory block). Each bit represents a
>> page usable for MMIO: First of all you remove the range from
>> PCI_MEM_END upwards. Then remove all RDM pages. Now do a
>> first pass over all devices, allocating (in the bitmap) space for only
>> the 32-bit MMIO BARs, starting with the biggest one(s), by finding
>> a best fit (i.e. preferably a range not usable by any bigger BAR)
>> from top down. For example, if you have available
>>
>> [f0000000,f8000000)
>> [f9000000,f9001000)
>> [fa000000,fa003000)
>> [fa010000,fa012000)
>>
>> and you're looking for a single page slot, you should end up
>> picking fa002000.
> 
> Why is this [f9000000,f9001000]? Just one page in this slot.

This was just a simple example I gave. Or maybe I don't understand
your question...

>> After this pass you should be able to do RAM relocation in a
>> single attempt just like we do today (you may still grow the MMIO
>> window if you know you need to and can fit some of the 64-bit
>> BARs in there, subject to said constraints; this is in an attempt
>> to help OSes not comfortable with 64-bit resources).
>>
>> In a 2nd pass you'd then assign 64-bit resources: If you can fit
>> them below 4G (you still have the bitmap left of what you've got
>> available), put them there. Allocation strategy could be the same
> 
> I think basically, your logic is similar to what I did as I described in 
> changelog,

The goal is the same, but the approaches look quite different to
me. In particular my approach avoids calculating mmio_total up
front, then basing RAM relocation on it, only to find subsequently
that more RAM may need to be relocated.
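
To make the quoted example concrete, here is a minimal sketch of one possible reading of the best-fit rule, not a definitive implementation. Assumptions: BARs are naturally aligned powers of two, a slot counts as "not usable by any bigger BAR" when its enclosing slot of twice the size is not entirely free, and the window bounds WIN_BASE/WIN_END and the names mark_range() and alloc_bar() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
/* Assumed window covered by the bitmap; the real bounds would differ. */
#define WIN_BASE   0xf0000000u
#define WIN_END    0xfc000000u
#define WIN_PAGES  ((WIN_END - WIN_BASE) >> PAGE_SHIFT)

/* One bit per page; a set bit means "usable for MMIO". */
static uint8_t mmio_map[WIN_PAGES / 8];

static int page_usable(uint32_t pfn)
{
    return (mmio_map[pfn / 8] >> (pfn % 8)) & 1;
}

/* Mark [start, end) as usable (1) or unusable (0). */
void mark_range(uint32_t start, uint32_t end, int usable)
{
    uint32_t pfn;

    assert(start >= WIN_BASE && end <= WIN_END && start <= end);
    for ( pfn = (start - WIN_BASE) >> PAGE_SHIFT;
          pfn < ((end - WIN_BASE) >> PAGE_SHIFT); pfn++ )
    {
        if ( usable )
            mmio_map[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
        else
            mmio_map[pfn / 8] &= (uint8_t)~(1u << (pfn % 8));
    }
}

/* Are all nr pages starting at pfn inside the window and usable? */
static int block_usable(uint32_t pfn, uint32_t nr)
{
    uint32_t i;

    if ( pfn + nr > WIN_PAGES )
        return 0;
    for ( i = 0; i < nr; i++ )
        if ( !page_usable(pfn + i) )
            return 0;
    return 1;
}

/*
 * Allocate a naturally aligned BAR of `size` bytes (power of two, >= 4k),
 * scanning from the top down. Prefer a slot no bigger BAR could use;
 * otherwise fall back to the topmost free slot. Returns 0 on failure.
 */
uint32_t alloc_bar(uint32_t size)
{
    uint32_t nr = size >> PAGE_SHIFT;
    uint32_t pfn, addr, pick = WIN_PAGES;   /* WIN_PAGES: nothing found */

    if ( size < (1u << PAGE_SHIFT) || (size & (size - 1)) || nr > WIN_PAGES )
        return 0;

    for ( pfn = (WIN_PAGES - nr) & ~(nr - 1); ; pfn -= nr )
    {
        if ( block_usable(pfn, nr) )
        {
            if ( !block_usable(pfn & ~(2 * nr - 1), 2 * nr) )
            {
                pick = pfn;                 /* tight fit: take it */
                break;
            }
            if ( pick == WIN_PAGES )
                pick = pfn;                 /* remember topmost fallback */
        }
        if ( pfn == 0 )
            break;
    }

    if ( pick == WIN_PAGES )
        return 0;

    addr = WIN_BASE + (pick << PAGE_SHIFT);
    mark_range(addr, addr + size, 0);
    return addr;
}
```

With the four ranges from the example marked usable, a single-page request skips fa011000 and fa010000 (they still form a free 8k slot) and lands on fa002000, whose 8k neighbourhood is already partly unavailable.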

> I think bitmap mechanism is a good idea but honestly, its not easy to 
> cover all requirements here. And just like bootmem on Linux side, so its 
> a little complicated to implement this entirely. So I prefer not to 
> introduce this way in current phase.

I'm afraid it's going to be hard to convince me of any approaches
further complicating the current mechanism instead of overhauling
it.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-14  5:22     ` Chen, Tiejun
@ 2015-07-14  9:32       ` Jan Beulich
  2015-07-14 10:22         ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-14  9:32 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 14.07.15 at 07:22, <tiejun.chen@intel.com> wrote:
>>> +    for ( i = 0; i < memory_map.nr_map; i++ )
>>> +    {
>>> +        uint64_t end = e820[i].addr + e820[i].size;
>>
>> Either loop index/boundary or used array are wrong here: In the
>> earlier loop you copied memory_map[0...nr_map-1] to
>> e820[n...n+nr_map-1], but here you're looping looking at
>> e820[0...nr_map-1]
> 
> You're right. I should lookup all e820[] like this,
> 
> for ( i = 0; i < nr; i++ )

Hmm, I would have thought you only care about either part of
the just glued together map.

>>> +        if ( e820[i].type == E820_RAM &&
>>> +             low_mem_end > e820[i].addr && low_mem_end < end )
>>
>> Assuming you mean to look at the RDM e820[] entries here, this
>> is not a correct check: You don't care about partly or fully
>> contained, all you care about is whether low_mem_end extends
>> beyond the start of the region.
> 
> Here I'm looking at the e820 entry indicating low memory. Because
> 
> low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
> 
> and when we allocate MMIO in pci.c, its possible to populate RAM so 
> hvm_info->low_mem_pgend would be changed over there. So we need to 
> compensate this loss with high memory. Here memory_map[] also records 
> the original low/high memory, so if low_mem_end is less-than the 
> original we need this compensation.

And I'm not disputing your intentions - I'm merely pointing out that
afaics the code above doesn't match these intentions. In particular
(as said) I don't see why you need to check low_mem_end < end.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-14  9:32       ` Jan Beulich
@ 2015-07-14 10:22         ` Chen, Tiejun
  2015-07-14 10:48           ` Jan Beulich
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-14 10:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

On 2015/7/14 17:32, Jan Beulich wrote:
>>>> On 14.07.15 at 07:22, <tiejun.chen@intel.com> wrote:
>>>> +    for ( i = 0; i < memory_map.nr_map; i++ )
>>>> +    {
>>>> +        uint64_t end = e820[i].addr + e820[i].size;
>>>
>>> Either loop index/boundary or used array are wrong here: In the
>>> earlier loop you copied memory_map[0...nr_map-1] to
>>> e820[n...n+nr_map-1], but here you're looping looking at
>>> e820[0...nr_map-1]
>>
>> You're right. I should lookup all e820[] like this,
>>
>> for ( i = 0; i < nr; i++ )
>
> Hmm, I would have thought you only care about either part of
> the just glued together map.
>
>>>> +        if ( e820[i].type == E820_RAM &&
>>>> +             low_mem_end > e820[i].addr && low_mem_end < end )
>>>
>>> Assuming you mean to look at the RDM e820[] entries here, this
>>> is not a correct check: You don't care about partly or fully
>>> contained, all you care about is whether low_mem_end extends
>>> beyond the start of the region.
>>
>> Here I'm looking at the e820 entry indicating low memory. Because
>>
>> low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
>>
>> and when we allocate MMIO in pci.c, its possible to populate RAM so
>> hvm_info->low_mem_pgend would be changed over there. So we need to
>> compensate this loss with high memory. Here memory_map[] also records
>> the original low/high memory, so if low_mem_end is less-than the
>> original we need this compensation.
>
> And I'm not disputing your intentions - I'm merely pointing out that
> afaics the code above doesn't match these intentions. In particular
> (as said) I don't see why you need to check low_mem_end < end.
>

Before we possibly relocate RAM,

low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT

and the e820 entry specific to low memory is

[e820[X].addr, end]

where end = e820[X].addr + e820[X].size.

At this point low_mem_end == end.

After relocation, low_mem_end < end. So if

(low_mem_end > e820[X].addr && low_mem_end < end) is true, this means
we have hit the associated RAM entry, right? Then we need to revise this
entry to [e820[X].addr, low_mem_end], and compensate by adding
(end - low_mem_end) to high memory. Am I still missing something here?

Thanks
Tiejun
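
The trimming rule described in the message above can be sketched in
isolation. This is only an illustration under stated assumptions: the
struct, the DEMO_E820_RAM constant and demo_trim_low_ram() are
hypothetical stand-ins, not the hvmloader code from the patch.

```c
#include <stdint.h>

#define DEMO_E820_RAM 1  /* hypothetical stand-in for E820_RAM */

/* Hypothetical, simplified e820 entry. */
struct demo_e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * If low_mem_end falls strictly inside a RAM entry, shrink that entry to
 * [addr, low_mem_end) and return the number of bytes cut off, which the
 * caller would then add to high memory. Returns 0 if no entry is trimmed.
 */
static uint64_t demo_trim_low_ram(struct demo_e820entry *e820,
                                  unsigned int nr, uint64_t low_mem_end)
{
    unsigned int i;

    for ( i = 0; i < nr; i++ )
    {
        uint64_t end = e820[i].addr + e820[i].size;

        if ( e820[i].type == DEMO_E820_RAM &&
             low_mem_end > e820[i].addr && low_mem_end < end )
        {
            e820[i].size = low_mem_end - e820[i].addr;
            return end - low_mem_end;
        }
    }

    return 0;
}
```

If low_mem_end still equals the end of the low RAM entry (nothing was
relocated), neither comparison pair matches and the function returns 0,
which is the role the low_mem_end < end test plays in the condition
under discussion.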

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-13  6:47     ` Chen, Tiejun
  2015-07-13  8:57       ` Jan Beulich
@ 2015-07-14 10:46       ` George Dunlap
  2015-07-14 10:53         ` Chen, Tiejun
  1 sibling, 1 reply; 119+ messages in thread
From: George Dunlap @ 2015-07-14 10:46 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

On 07/13/2015 07:47 AM, Chen, Tiejun wrote:
>> Thanks for this; a few more comments...
>>
> 
> Thanks for your time.
> 
>>> @@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
>>>           seg = machine_sbdf >> 16;
>>>           bus = PCI_BUS(machine_sbdf);
>>>           devfn = PCI_DEVFN2(machine_sbdf);
>>> +        flag = domctl->u.assign_device.flag;
>>> +        if ( flag > XEN_DOMCTL_DEV_RDM_RELAXED )
>>
>> This is not a blocker, but a stylistic comment: I would have inverted
>> the bitmask here, as that's conceptually what you're checking.  I
>> won't make this a blocker for going in.
> 
> What about this?
> 
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 6e23fc6..17a4206 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1579,7 +1579,7 @@ int iommu_do_pci_domctl(
>          bus = PCI_BUS(machine_sbdf);
>          devfn = PCI_DEVFN2(machine_sbdf);
>          flag = domctl->u.assign_device.flag;
> -        if ( flag > XEN_DOMCTL_DEV_RDM_RELAXED )
> +        if ( flag & ~XEN_DOMCTL_DEV_RDM_MASK )
>          {
>              ret = -EINVAL;
>              break;
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index bca25c9..07549a4 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -480,6 +480,7 @@ struct xen_domctl_assign_device {
>      } u;
>      /* IN */
>  #define XEN_DOMCTL_DEV_RDM_RELAXED      1
> +#define XEN_DOMCTL_DEV_RDM_MASK         0x1

The way this sort of thing is defined in the rest of domctl.h is like this:

#define _XEN_DOMCTL_CDF_hvm_guest     0
#define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)

So the above should be

#define _XEN_DOMCTL_DEV_RDM_RELAXED 0
#define XEN_DOMCTL_DEV_RDM_RELAXED (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)

And then your check in iommu_do_pci_domctl() would look like

if (flag & ~XEN_DOMCTL_DEV_RDM_RELAXED)

And if we end up adding any extra flags, we just | them into the above
conditional, as is done in, for example, the XEN_DOMCTL_createdomain
case in xen/common/domctl.c:do_domctl().

Thanks,
 -George
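
For readers following along, the bit-position/bit-mask pattern and the
inverted-mask check suggested above might look like the sketch below.
All DEMO_* names and demo_check_flags() are hypothetical and chosen only
for illustration; they are not definitions from domctl.h.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag: bit position first, then the mask derived from it. */
#define DEMO_FLAG_RELAXED_BIT 0
#define DEMO_FLAG_RELAXED     (1U << DEMO_FLAG_RELAXED_BIT)

/*
 * Reject any bit that is not a known flag. A future flag would simply
 * be OR-ed into the parenthesised expression below.
 */
static int demo_check_flags(uint32_t flag)
{
    if ( flag & ~(DEMO_FLAG_RELAXED) )
        return -EINVAL;

    return 0;
}
```

With a single flag this accepts the same values as a "flag > RELAXED"
comparison; the two only diverge once more bits are defined, which is
why the mask form states the intent more directly.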

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-14 10:22         ` Chen, Tiejun
@ 2015-07-14 10:48           ` Jan Beulich
  0 siblings, 0 replies; 119+ messages in thread
From: Jan Beulich @ 2015-07-14 10:48 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 14.07.15 at 12:22, <tiejun.chen@intel.com> wrote:
> On 2015/7/14 17:32, Jan Beulich wrote:
>>>>> On 14.07.15 at 07:22, <tiejun.chen@intel.com> wrote:
>>>>> +    for ( i = 0; i < memory_map.nr_map; i++ )
>>>>> +    {
>>>>> +        uint64_t end = e820[i].addr + e820[i].size;
>>>>
>>>> Either loop index/boundary or used array are wrong here: In the
>>>> earlier loop you copied memory_map[0...nr_map-1] to
>>>> e820[n...n+nr_map-1], but here you're looping looking at
>>>> e820[0...nr_map-1]
>>>
>>> You're right. I should lookup all e820[] like this,
>>>
>>> for ( i = 0; i < nr; i++ )
>>
>> Hmm, I would have thought you only care about either part of
>> the just glued together map.
>>
>>>>> +        if ( e820[i].type == E820_RAM &&
>>>>> +             low_mem_end > e820[i].addr && low_mem_end < end )
>>>>
>>>> Assuming you mean to look at the RDM e820[] entries here, this
>>>> is not a correct check: You don't care about partly or fully
>>>> contained, all you care about is whether low_mem_end extends
>>>> beyond the start of the region.
>>>
>>> Here I'm looking at the e820 entry indicating low memory. Because
>>>
>>> low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
>>>
>>> and when we allocate MMIO in pci.c, its possible to populate RAM so
>>> hvm_info->low_mem_pgend would be changed over there. So we need to
>>> compensate this loss with high memory. Here memory_map[] also records
>>> the original low/high memory, so if low_mem_end is less-than the
>>> original we need this compensation.
>>
>> And I'm not disputing your intentions - I'm merely pointing out that
>> afaics the code above doesn't match these intentions. In particular
>> (as said) I don't see why you need to check low_mem_end < end.
>>
> 
> Before we probably relocate RAM,
> 
> low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT
> 
> and the e820 entry specific to low memory,
> 
> [e820[X].addr, end]
> 
> Here, end = e820[X].addr + e820[X].size;
> 
> Now low_mem_end = end.
> 
> After that, low_mem_end < end. so if
> 
> (low_mem_end > e820[X].addr && low_mem_end < end) is true, this means 
> that associated RAM entry is hitting, right? Then we need to revise this 
> entry as [e820[X].addr, low_mem_end], and compensate [end - low_mem_end] 
> to high memory. Anything I'm still wrong here?

Ah, I think I see now what I misunderstood.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-14 10:46       ` George Dunlap
@ 2015-07-14 10:53         ` Chen, Tiejun
  2015-07-14 11:30           ` George Dunlap
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-14 10:53 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

> The way this sort of thing is defined in the rest of domctl.h is like this:
>
> #define _XEN_DOMCTL_CDF_hvm_guest     0
> #define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)
>
> So the above should be
>
> #define _XEN_DOMCTL_DEV_RDM_RELAXED 0
> #define XEN_DOMCTL_DEV_RDM_RELAXED (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)
>
> And then your check in iommu_do_pci_domctl() would look like
>
> if (flag & ~XEN_DOMCTL_DEV_RDM_RELAXED)
>
> And if we end up adding any extra flags, we just | them into the above
> conditional, as is done in, for example, the XEN_DOMCTL_createdomain
> case in xen/common/domctl.c:do_domctl().
>

IIRC Jan didn't like this approach, so I hope Jan can also have a look
at this beforehand :)

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-14  9:27       ` Jan Beulich
@ 2015-07-14 10:54         ` Chen, Tiejun
  2015-07-14 11:50           ` Jan Beulich
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-14 10:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser


Note that I don't address your comments above here, since I think we
should reach an agreement first.

>> I think bitmap mechanism is a good idea but honestly, its not easy to
>> cover all requirements here. And just like bootmem on Linux side, so its
>> a little complicated to implement this entirely. So I prefer not to
>> introduce this way in current phase.
>
> I'm afraid it's going to be hard to convince me of any approaches
> further complicating the current mechanism instead of overhauling
> it.

I agree we'd better overhaul this, since we already found something
unreasonable here. But one or two weeks is really not enough to fix this
with a bitmap framework. And although a bitmap can make mmio allocation
better, it's more complicated if we just want to allocate PCI mmio.

So could we do this as a next step? I just feel that if you could spend
a bit more time helping me refine our current solution, that's relatively
realistic for this case :) Then we can go into the bitmap in detail, or
work out a better solution, once we have sufficient time.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-14 10:53         ` Chen, Tiejun
@ 2015-07-14 11:30           ` George Dunlap
  2015-07-14 11:45             ` Jan Beulich
  0 siblings, 1 reply; 119+ messages in thread
From: George Dunlap @ 2015-07-14 11:30 UTC (permalink / raw)
  To: Chen, Tiejun, Jan Beulich
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Aravind Gopalakrishnan, Suravee Suthikulpanit,
	Yang Zhang, Stefano Stabellini

On 07/14/2015 11:53 AM, Chen, Tiejun wrote:
>> The way this sort of thing is defined in the rest of domctl.h is like
>> this:
>>
>> #define _XEN_DOMCTL_CDF_hvm_guest     0
>> #define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)
>>
>> So the above should be
>>
>> #define _XEN_DOMCTL_DEV_RDM_RELAXED 0
>> #define XEN_DOMCTL_DEV_RDM_RELAXED (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)
>>
>> And then your check in iommu_do_pci_domctl() would look like
>>
>> if (flag & ~XEN_DOMCTL_DEV_RDM_RELAXED)
>>
>> And if we end up adding any extra flags, we just | them into the above
>> conditional, as is done in, for example, the XEN_DOMCTL_createdomain
>> case in xen/common/domctl.c:do_domctl().
>>
> 
> Seems Jan didn't like this way IIRC, so I hope Jan also can have a look
> at this beforehand :)

I think Jan thought that the MASK value you defined wasn't meant to be a
single flag, but for all the flags; i.e., that if we added flags in bits
1 and 2, that MASK would become 0x7 rather than 0x1.  And I agree that
there's not much point to having such a mask defined in the public header.

But what I'm doing above is making explicit what you have already; i.e.,
you just set XEN_DOMCTL_DEV_RDM_RELAXED to '1'; the reader has to sort
of infer that the reason '1' is chosen is that it's setting bit 0.
Doing it the way I suggest makes it more clear that this is meant to be
a bitfield, and '0' has been allocated.

Please correct me if I'm wrong, Jan.

 -George

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-14 11:30           ` George Dunlap
@ 2015-07-14 11:45             ` Jan Beulich
  2015-07-14 13:25               ` George Dunlap
  0 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-14 11:45 UTC (permalink / raw)
  To: George Dunlap, Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Stefano Stabellini, Suravee Suthikulpanit, Yang Zhang,
	Aravind Gopalakrishnan

>>> On 14.07.15 at 13:30, <george.dunlap@eu.citrix.com> wrote:
> On 07/14/2015 11:53 AM, Chen, Tiejun wrote:
>>> The way this sort of thing is defined in the rest of domctl.h is like
>>> this:
>>>
>>> #define _XEN_DOMCTL_CDF_hvm_guest     0
>>> #define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)
>>>
>>> So the above should be
>>>
>>> #define _XEN_DOMCTL_DEV_RDM_RELAXED 0
>>> #define XEN_DOMCTL_DEV_RDM_RELAXED (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)
>>>
>>> And then your check in iommu_do_pci_domctl() would look like
>>>
>>> if (flag & ~XEN_DOMCTL_DEV_RDM_RELAXED)
>>>
>>> And if we end up adding any extra flags, we just | them into the above
>>> conditional, as is done in, for example, the XEN_DOMCTL_createdomain
>>> case in xen/common/domctl.c:do_domctl().
>>>
>> 
>> Seems Jan didn't like this way IIRC, so I hope Jan also can have a look
>> at this beforehand :)
> 
> I think Jan thought that the MASK value you defined wasn't meant to be a
> single flag, but for all the flags; i.e., that if we added flags in bits
> 1 and 2, that MASK would become 0x7 rather than 0x1.  And I agree that
> there's not much point to having such a mask defined in the public header.
> 
> But what I'm doing above is making explicit what you have already; i.e.,
> you just set XEN_DOMCTL_DEV_RDM_RELAXED to '1'; the reader has to sort
> of infer that the reason '1' is chosen is that it's setting bit 0.
> Doing it the way I suggest makes it more clear that this is meant to be
> a bitfield, and '0' has been allocated.
> 
> Please correct me if I'm wrong, Jan.

Indeed my primary objection was to what you describe. That said,
I'm not too happy with what you propose now either. Not only do
I view this (bit-pos,bit-mask) tuple as redundant unless one actually
needs both in certain places (which doesn't seem to be the case
here), but the naming also conflicts with the C standard (reserving
identifiers starting with underscore and an upper case letter). Just
like said in various other cases - we've got many examples of such
violations already, but I'd prefer not to make the situation worse.

IOW I'd prefer to go with just

#define XEN_DOMCTL_DEV_RDM_RELAXED 1

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-14 10:54         ` Chen, Tiejun
@ 2015-07-14 11:50           ` Jan Beulich
  2015-07-15  0:55             ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-14 11:50 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 14.07.15 at 12:54, <tiejun.chen@intel.com> wrote:
>>> I think bitmap mechanism is a good idea but honestly, its not easy to
>>> cover all requirements here. And just like bootmem on Linux side, so its
>>> a little complicated to implement this entirely. So I prefer not to
>>> introduce this way in current phase.
>>
>> I'm afraid it's going to be hard to convince me of any approaches
>> further complicating the current mechanism instead of overhauling
>> it.
> 
> I agree we'd better overhaul this since we already found something 
> unreasonable here. But one or two weeks is really not enough to fix this 
> with a bitmap framework, and although a bitmap can make mmio allocation 
> better, but more complicated if we just want to allocate PCI mmio.
> 
> So could we do this next? I just feel if you'd like to pay more time 
> help me refine our current solution, its relatively realistic to this 
> case :) And then we can go into bitmap in details or work out a better 
> solution in sufficient time slot.

Looking at how long it took to get here (wasn't this series originally
even meant to go into 4.5?) and how much time I already spent
reviewing all the previous versions, I don't see a point in wasting
even more time on working out details of an approach that's getting
too complex/ugly already anyway.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-14 11:45             ` Jan Beulich
@ 2015-07-14 13:25               ` George Dunlap
  0 siblings, 0 replies; 119+ messages in thread
From: George Dunlap @ 2015-07-14 13:25 UTC (permalink / raw)
  To: Jan Beulich, Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Stefano Stabellini, Suravee Suthikulpanit, Yang Zhang,
	Aravind Gopalakrishnan

On 07/14/2015 12:45 PM, Jan Beulich wrote:
>>>> On 14.07.15 at 13:30, <george.dunlap@eu.citrix.com> wrote:
>> On 07/14/2015 11:53 AM, Chen, Tiejun wrote:
>>>> The way this sort of thing is defined in the rest of domctl.h is like
>>>> this:
>>>>
>>>> #define _XEN_DOMCTL_CDF_hvm_guest     0
>>>> #define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)
>>>>
>>>> So the above should be
>>>>
>>>> #define _XEN_DOMCTL_DEV_RDM_RELAXED 0
>>>> #define XEN_DOMCTL_DEV_RDM_RELAXED (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)
>>>>
>>>> And then your check in iommu_do_pci_domctl() would look like
>>>>
>>>> if (flag & ~XEN_DOMCTL_DEV_RDM_RELAXED)
>>>>
>>>> And if we end up adding any extra flags, we just | them into the above
>>>> conditional, as is done in, for example, the XEN_DOMCTL_createdomain
>>>> case in xen/common/domctl.c:do_domctl().
>>>>
>>>
>>> Seems Jan didn't like this way IIRC, so I hope Jan also can have a look
>>> at this beforehand :)
>>
>> I think Jan thought that the MASK value you defined wasn't meant to be a
>> single flag, but for all the flags; i.e., that if we added flags in bits
>> 1 and 2, that MASK would become 0x7 rather than 0x1.  And I agree that
>> there's not much point to having such a mask defined in the public header.
>>
>> But what I'm doing above is making explicit what you have already; i.e.,
>> you just set XEN_DOMCTL_DEV_RDM_RELAXED to '1'; the reader has to sort
>> of infer that the reason '1' is chosen is that it's setting bit 0.
>> Doing it the way I suggest makes it more clear that this is meant to be
>> a bitfield, and '0' has been allocated.
>>
>> Please correct me if I'm wrong, Jan.
> 
> Indeed my primary objection was to what you describe. That said,
> I'm not too happy with what you propose now either. Not only do
> I view this (bit-pos,bit-mask) tuple as redundant unless one actually
> needs both in certain places (which doesn't seem to be the case
> here), but the naming also conflicts with the C standard (reserving
> identifiers starting with underscore and an upper case letter). Just
> like said in various other cases - we've got many examples of such
> violations already, but I'd prefer not to make the situation worse.
> 
> IOW I'd prefer to go with just
> 
> #define XEN_DOMCTL_DEV_RDM_RELAXED 1

Very well.

 -George

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-14 11:50           ` Jan Beulich
@ 2015-07-15  0:55             ` Chen, Tiejun
  2015-07-15  4:27               ` Chen, Tiejun
  2015-07-15  8:32               ` Jan Beulich
  0 siblings, 2 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-15  0:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>> I agree we'd better overhaul this since we already found something
>> unreasonable here. But one or two weeks is really not enough to fix this
>> with a bitmap framework, and although a bitmap can make mmio allocation
>> better, but more complicated if we just want to allocate PCI mmio.
>>
>> So could we do this next? I just feel if you'd like to pay more time
>> help me refine our current solution, its relatively realistic to this
>> case :) And then we can go into bitmap in details or work out a better
>> solution in sufficient time slot.
>
> Looking at how long it took to get here (wasn't this series originally
> even meant to go into 4.5?) and how much time I already spent

Certainly appreciate your time.

I didn't mean it's a waste of time at this point. I just want to say
that it's hard to implement that solution in the one or two weeks left
to get into 4.6 as an exception.

Note I know this feature has not yet been accepted as an exception for
4.6, so I'm making an assumption.

> reviewing all the previous versions, I don't see a point in wasting
> even more time on working out details of an approach that's getting
> too complex/ugly already anyway.

Here I'm trying to find a two-step approach of this kind, if possible.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15  0:55             ` Chen, Tiejun
@ 2015-07-15  4:27               ` Chen, Tiejun
  2015-07-15  8:34                 ` Jan Beulich
  2015-07-15  8:32               ` Jan Beulich
  1 sibling, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-15  4:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

On 2015/7/15 8:55, Chen, Tiejun wrote:
>>> I agree we'd better overhaul this since we already found
>>> something unreasonable here. But one or two weeks is really not
>>> enough to fix this with a bitmap framework, and although a bitmap
>>> can make mmio allocation better, but more complicated if we just
>>> want to allocate PCI mmio.
>>>
>>> So could we do this next? I just feel if you'd like to pay more
>>> time help me refine our current solution, its relatively
>>> realistic to this case :) And then we can go into bitmap in
>>> details or work out a better solution in sufficient time slot.
>>
>> Looking at how long it took to get here (wasn't this series
>> originally even meant to go into 4.5?) and how much time I already
>> spent
>
> Certainly appreciate your time.
>
> I didn't mean its wasting time at this point. I just want to express
>  that its hard to implement that solution in one or two weeks to
> walking into 4.6 as an exception.
>
> Note I know this feature is still not accepted as an exception to 4.6
>  right now so I'm making an assumption.
>
>> reviewing all the previous versions, I don't see a point in
>> wasting even more time on working out details of an approach that's
>> getting too complex/ugly already anyway.
>
> Here I'm trying to seek such a kind of two-steps approach if
> possible.
>

Furthermore, could we have this solution as follows?

@@ -407,8 +408,29 @@ void pci_setup(void)
          }

          base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+ reallocate_bar:
          bar_data |= (uint32_t)base;
          bar_data_upper = (uint32_t)(base >> 32);
+        /*
+         * We need to skip reserved regions such as RMRR, but this will
+         * probably cause the remaining allocations to fail.
+         */
+        for ( j = 0; j < memory_map.nr_map ; j++ )
+        {
+            if ( memory_map.map[j].type != E820_RAM )
+            {
+                reserved_end = memory_map.map[j].addr + memory_map.map[j].size;
+                if ( check_overlap(base, bar_sz,
+                                   memory_map.map[j].addr,
+                                   memory_map.map[j].size) )
+                {
+                    if ( !mmio_conflict )
+                        mmio_conflict = true;
+                    base = (reserved_end + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+                    goto reallocate_bar;
+                }
+            }
+        }
          base += bar_sz;

          if ( (base < resource->base) || (base > resource->max) )
@@ -416,6 +438,11 @@ void pci_setup(void)
              printf("pci dev %02x:%x bar %02x size "PRIllx": no space for "
                     "resource!\n", devfn>>3, devfn&7, bar_reg,
                     PRIllx_arg(bar_sz));
+            if ( mmio_conflict )
+            {
+                printf("MMIO conflicts with RDM.\n");
+                BUG();
+            }
              continue;
          }

Here we still check RDMs and skip them if any conflicts exist. But in the
end, if 1) there is not sufficient space to allocate all BARs for this VM
and 2) this shortage was triggered by RDM, we stop creating this VM.

This slightly changes the current mechanism and also guarantees we don't
ignore any conflicts. I admit this is not a good solution, because the
alignment issue still exists, but I think it may be acceptable if all
PCI devices assigned to this VM can work. And in the real world most RDMs
fall somewhere below 0xF0000000 (the current default pci memory start
address), so this kind of conflict is not a common case.

Thanks
Tiejun
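
The skip-and-realign idea in the hunk above can be tried out standalone.
The sketch below is a simplified illustration, not the pci_setup() code:
struct demo_region, demo_overlap(), demo_align_up() and demo_place_bar()
are hypothetical names, and bar_sz is assumed to be a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reserved region descriptor. */
struct demo_region {
    uint64_t addr;
    uint64_t size;
};

/* True if [start, start + size) intersects [r_start, r_start + r_size). */
static bool demo_overlap(uint64_t start, uint64_t size,
                         uint64_t r_start, uint64_t r_size)
{
    return (start + size > r_start) && (start < r_start + r_size);
}

/* Round x up to a multiple of bar_sz (assumed to be a power of two). */
static uint64_t demo_align_up(uint64_t x, uint64_t bar_sz)
{
    return (x + bar_sz - 1) & ~(bar_sz - 1);
}

/*
 * Pick a bar_sz-aligned base at or above start that avoids all reserved
 * regions. *conflict is set when at least one region had to be skipped.
 */
static uint64_t demo_place_bar(uint64_t start, uint64_t bar_sz,
                               const struct demo_region *rsv,
                               unsigned int nr, bool *conflict)
{
    uint64_t base = demo_align_up(start, bar_sz);
    unsigned int j = 0;

    while ( j < nr )
    {
        if ( demo_overlap(base, bar_sz, rsv[j].addr, rsv[j].size) )
        {
            *conflict = true;
            base = demo_align_up(rsv[j].addr + rsv[j].size, bar_sz);
            j = 0; /* rescan: the new base may hit a different region */
            continue;
        }
        j++;
    }

    return base;
}
```

A caller would still have to compare the returned base against the end
of the resource window, and the conflict flag is what would let it tell
an RDM-induced shortage apart from an ordinary one.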

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15  0:55             ` Chen, Tiejun
  2015-07-15  4:27               ` Chen, Tiejun
@ 2015-07-15  8:32               ` Jan Beulich
  2015-07-15  9:04                 ` Chen, Tiejun
  2015-07-15 12:57                 ` Wei Liu
  1 sibling, 2 replies; 119+ messages in thread
From: Jan Beulich @ 2015-07-15  8:32 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 15.07.15 at 02:55, <tiejun.chen@intel.com> wrote:
>>> I agree we'd better overhaul this since we already found something
>>> unreasonable here. But one or two weeks is really not enough to fix this
>>> with a bitmap framework, and although a bitmap can make mmio allocation
>>> better, but more complicated if we just want to allocate PCI mmio.
>>>
>>> So could we do this next? I just feel if you'd like to pay more time
>>> help me refine our current solution, its relatively realistic to this
>>> case :) And then we can go into bitmap in details or work out a better
>>> solution in sufficient time slot.
>>
>> Looking at how long it took to get here (wasn't this series originally
>> even meant to go into 4.5?) and how much time I already spent
> 
> Certainly appreciate your time.
> 
> I didn't mean its wasting time at this point. I just want to express 
> that its hard to implement that solution in one or two weeks to walking 
> into 4.6 as an exception.
> 
> Note I know this feature is still not accepted as an exception to 4.6 
> right now so I'm making an assumption.

After all this is a bug fix (and would have been allowed into 4.5 had
it been ready in time), so doesn't necessarily need a freeze
exception (but of course the bar raises the later it gets). Rather
than rushing in something that's cumbersome to maintain, I'd much
prefer this to be done properly.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15  4:27               ` Chen, Tiejun
@ 2015-07-15  8:34                 ` Jan Beulich
  2015-07-15  8:59                   ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-15  8:34 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 15.07.15 at 06:27, <tiejun.chen@intel.com> wrote:
> Furthermore, could we have this solution as follows?

Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject
to co-maintainers overriding me).

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15  8:34                 ` Jan Beulich
@ 2015-07-15  8:59                   ` Chen, Tiejun
  2015-07-15  9:10                     ` Chen, Tiejun
  2015-07-15  9:27                     ` Jan Beulich
  0 siblings, 2 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-15  8:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

On 2015/7/15 16:34, Jan Beulich wrote:
>>>> On 15.07.15 at 06:27, <tiejun.chen@intel.com> wrote:
>> Furthermore, could we have this solution as follows?
>
> Yet more special casing code you want to add. I said no to this
> model, and unless you can address the issue _without_ adding
> a lot of special casing code, the answer will remain no (subject

What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
              pci_mem_start <<= 1;
      }

+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }
+
      if ( mmio_total > (pci_mem_end - pci_mem_start) )
      {
          printf("Low MMIO hole not large enough for all devices,"

This is very similar to our current policy for
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6,
since this is also another rare possibility in the real world. I can even
do the same when we handle the conflict with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6.

Note it's not necessary to worry about high memory, since we already
handle that case in the hv code, and it's also not affected by memory
relocated later, since our previous policy makes sure RAM doesn't
overlap with RDM.

Thanks
Tiejun
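
As a rough illustration of the up-front check proposed above, the sketch
below tests the whole PCI hole against every map entry. Like the posted
hunk, it does not filter entries by type. The names are hypothetical,
and demo_check_overlap() is only an assumed implementation of the
check_overlap() helper the hunk calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified memory map entry. */
struct demo_map_entry {
    uint64_t addr;
    uint64_t size;
};

/* Assumed semantics: true if the two half-open ranges intersect. */
static bool demo_check_overlap(uint64_t start, uint64_t size,
                               uint64_t r_start, uint64_t r_size)
{
    return (start + size > r_start) && (start < r_start + r_size);
}

/* True if any map entry intersects [pci_mem_start, pci_mem_end). */
static bool demo_hole_conflicts(uint64_t pci_mem_start, uint64_t pci_mem_end,
                                const struct demo_map_entry *map,
                                unsigned int nr)
{
    unsigned int i;

    for ( i = 0; i < nr; i++ )
        if ( demo_check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
                                map[i].addr, map[i].size) )
            return true;

    return false;
}
```

An entry that ends exactly at pci_mem_start is not reported as a
conflict, because both ranges are treated as half-open.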

> to co-maintainers overriding me).
>
> Jan
>
>
>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15  8:32               ` Jan Beulich
@ 2015-07-15  9:04                 ` Chen, Tiejun
  2015-07-15 12:57                 ` Wei Liu
  1 sibling, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-15  9:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>> Certainly appreciate your time.
>>
>> I didn't mean its wasting time at this point. I just want to express
>> that its hard to implement that solution in one or two weeks to walking
>> into 4.6 as an exception.
>>
>> Note I know this feature is still not accepted as an exception to 4.6
>> right now so I'm making an assumption.
>
> After all this is a bug fix (and would have been allowed into 4.5 had
> it been ready in time), so doesn't necessarily need a freeze
> exception (but of course the bar raises the later it gets). Rather

Yes, this is not a bug fix again into 4.6.

> than rushing in something that's cumbersome to maintain, I'd much
> prefer this to be done properly.
>

Indeed, we'd like to finalize this properly, as you said. But apparently
there is not sufficient time to allow this to happen. So I just suggest
we seek the best solution in the next phase.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15  8:59                   ` Chen, Tiejun
@ 2015-07-15  9:10                     ` Chen, Tiejun
  2015-07-15  9:27                     ` Jan Beulich
  1 sibling, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-15  9:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

> This is very similar to our current policy to
> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6
> since actually this is also another rare possibility in real world. Even
> I can do this as well when we handle that conflict with
> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6.

Sorry, here is one typo, s/#6/#5

Thanks
Tiejun

>
> Note its not necessary to concern high memory since we already handle
> this case in the hv code previously, and its also not affected by those
> relocated memory later since our previous policy can make sure RAM isn't
> overlapping with RDM.
>
> Thanks
> Tiejun
>
>> to co-maintainers overriding me).
>>
>> Jan
>>
>>
>>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15  8:59                   ` Chen, Tiejun
  2015-07-15  9:10                     ` Chen, Tiejun
@ 2015-07-15  9:27                     ` Jan Beulich
  2015-07-15 10:34                       ` Chen, Tiejun
  2015-07-15 11:05                       ` George Dunlap
  1 sibling, 2 replies; 119+ messages in thread
From: Jan Beulich @ 2015-07-15  9:27 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 15.07.15 at 10:59, <tiejun.chen@intel.com> wrote:
> On 2015/7/15 16:34, Jan Beulich wrote:
>>>>> On 15.07.15 at 06:27, <tiejun.chen@intel.com> wrote:
>>> Furthermore, could we have this solution as follows?
>>
>> Yet more special casing code you want to add. I said no to this
>> model, and unless you can address the issue _without_ adding
>> a lot of special casing code, the answer will remain no (subject
> 
> What about this?
> 
> @@ -301,6 +301,19 @@ void pci_setup(void)
>               pci_mem_start <<= 1;
>       }
> 
> +    for ( i = 0; i < memory_map.nr_map ; i++ )
> +    {
> +        uint64_t reserved_start, reserved_size;
> +        reserved_start = memory_map.map[i].addr;
> +        reserved_size = memory_map.map[i].size;
> +        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
> +                           reserved_start, reserved_size) )
> +        {
> +            printf("Reserved device memory conflicts current PCI memory.\n");
> +            BUG();
> +        }
> +    }

So what would the cure be if someone ran into this BUG() (other
than removing the device associated with the conflicting RMRR)?
Afaics such a guest would remain permanently unbootable, which
of course is not an option.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15  9:27                     ` Jan Beulich
@ 2015-07-15 10:34                       ` Chen, Tiejun
  2015-07-15 11:25                         ` Jan Beulich
  2015-07-15 11:05                       ` George Dunlap
  1 sibling, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-15 10:34 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, Ian Campbell, Wei Liu, Ian Jackson,
	Stefano Stabellini, Keir Fraser
  Cc: xen-devel

>>> Yet more special casing code you want to add. I said no to this
>>> model, and unless you can address the issue _without_ adding
>>> a lot of special casing code, the answer will remain no (subject
>>
>> What about this?
>>
>> @@ -301,6 +301,19 @@ void pci_setup(void)
>>                pci_mem_start <<= 1;
>>        }
>>
>> +    for ( i = 0; i < memory_map.nr_map ; i++ )
>> +    {
>> +        uint64_t reserved_start, reserved_size;
>> +        reserved_start = memory_map.map[i].addr;
>> +        reserved_size = memory_map.map[i].size;
>> +        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>> +                           reserved_start, reserved_size) )
>> +        {
>> +            printf("Reserved device memory conflicts current PCI memory.\n");
>> +            BUG();
>> +        }
>> +    }
>
> So what would the cure be if someone ran into this BUG() (other
> than removing the device associated with the conflicting RMRR)?

Maybe I can move this chunk of code below the actual allocation, to 
check whether RDM conflicts with the final allocation, and then just 
disable the associated devices by writing PCI_COMMAND, without BUG(), 
as in this draft code:

     /* If pci bars conflict with RDM we need to disable this pci device. */
     for ( devfn = 0; devfn < 256; devfn++ )
     {
         bar_sz = pci_readl(devfn, bar_reg);
         bar_data = pci_readl(devfn, bar_reg);
         bar_data_upper = pci_readl(devfn, bar_reg + 4);
         /* Until here we don't conflict high memory. */
         if ( bar_data_upper )
             continue;

         for ( i = 0; i < memory_map.nr_map ; i++ )
         {
             uint64_t reserved_start, reserved_size;
             reserved_start = memory_map.map[i].addr;
             reserved_size = memory_map.map[i].size;
             if ( check_overlap(bar_data & ~(bar_sz - 1), bar_sz,
                                reserved_start, reserved_size) )
             {
                 printf("Reserved device memory conflicts with this pci bar,"
                        " so just disable this device.\n");
                 /* Now disable this device */
                 cmd = pci_readw(devfn, PCI_COMMAND);
                 pci_writew(devfn, PCI_COMMAND, ~cmd);
             }
         }
     }
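
As a side note on the draft above, here is a minimal sketch of the two 
PCI details it leans on, assuming standard PCI config-space semantics. 
The helper names bar_size_from_probe() and command_without_decode() are 
hypothetical and not part of hvmloader; only the two command register 
bit values are standard. The size of a 32-bit memory BAR is normally 
derived from the value read back after writing all ones to it, and 
decoding is normally switched off by clearing the I/O and memory enable 
bits while leaving the other command bits alone.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI command register bits. */
#define PCI_COMMAND_IO      0x1  /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY  0x2  /* respond to memory space accesses */

/* The low four bits of a memory BAR are type/prefetch flags, not address. */
#define MEM_BAR_FLAG_MASK   0xFu

/*
 * Hypothetical helper: 'probe' is the value read back from a 32-bit memory
 * BAR after writing 0xFFFFFFFF to it.  Returns the BAR size in bytes, or 0
 * if the BAR is not implemented.
 */
static uint32_t bar_size_from_probe(uint32_t probe)
{
    uint32_t mask = probe & ~MEM_BAR_FLAG_MASK;

    if ( mask == 0 )
        return 0;

    /* The writable address bits form the mask; its two's complement is the size. */
    return ~mask + 1;
}

/* Hypothetical helper: clear only the decode enables, keep every other bit. */
static uint16_t command_without_decode(uint16_t cmd)
{
    return (uint16_t)(cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));
}

/* Usage example with made-up register values. */
static void pci_helpers_example(void)
{
    /* A 1MB prefetchable BAR reads back 0xFFF00008 when probed. */
    assert(bar_size_from_probe(0xFFF00008u) == 0x00100000u);
    /* I/O + memory + bus master enabled: only bus master survives. */
    assert(command_without_decode(0x0007) == 0x0004);
}
```

In a real loop the probe value would come from the usual write-ones, 
read-back, restore sequence on the BAR, which is left out here to keep 
the sketch independent of the config-space accessors.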

If this is still not acceptable to you, I'll have to raise a request to 
the co-maintainers, since it's hard to move forward in practice.

Hi all, what do you think?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15  9:27                     ` Jan Beulich
  2015-07-15 10:34                       ` Chen, Tiejun
@ 2015-07-15 11:05                       ` George Dunlap
  2015-07-15 11:20                         ` Chen, Tiejun
                                           ` (2 more replies)
  1 sibling, 3 replies; 119+ messages in thread
From: George Dunlap @ 2015-07-15 11:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Tiejun Chen, Keir Fraser

On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 15.07.15 at 10:59, <tiejun.chen@intel.com> wrote:
>> On 2015/7/15 16:34, Jan Beulich wrote:
>>>>>> On 15.07.15 at 06:27, <tiejun.chen@intel.com> wrote:
>>>> Furthermore, could we have this solution as follows?
>>>
>>> Yet more special casing code you want to add. I said no to this
>>> model, and unless you can address the issue _without_ adding
>>> a lot of special casing code, the answer will remain no (subject
>>
>> What about this?
>>
>> @@ -301,6 +301,19 @@ void pci_setup(void)
>>               pci_mem_start <<= 1;
>>       }
>>
>> +    for ( i = 0; i < memory_map.nr_map ; i++ )
>> +    {
>> +        uint64_t reserved_start, reserved_size;
>> +        reserved_start = memory_map.map[i].addr;
>> +        reserved_size = memory_map.map[i].size;
>> +        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>> +                           reserved_start, reserved_size) )
>> +        {
>> +            printf("Reserved device memory conflicts current PCI memory.\n");
>> +            BUG();
>> +        }
>> +    }
>
> So what would the cure be if someone ran into this BUG() (other
> than removing the device associated with the conflicting RMRR)?
> Afaics such a guest would remain permanently unbootable, which
> of course is not an option.

Is not booting worse than what we have now -- which is, booting
successfully but (probably) having issues due to MMIO ranges
overlapping RMRRs?

This patch series as a whole represents a lot of work and a lot of
tangible improvements to the situation; and (unless the situation has
changed) it's almost entirely acked apart from the MMIO placement
part.  If there is a simple way that we can change hvmloader so that
most (or even many) VM/device combinations work properly for the 4.6
release, then I think it's worth considering.

 -George

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 11:05                       ` George Dunlap
@ 2015-07-15 11:20                         ` Chen, Tiejun
  2015-07-15 12:43                           ` George Dunlap
  2015-07-15 11:24                         ` Jan Beulich
  2015-07-15 11:27                         ` Jan Beulich
  2 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-15 11:20 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

On 2015/7/15 19:05, George Dunlap wrote:
> On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 15.07.15 at 10:59, <tiejun.chen@intel.com> wrote:
>>> On 2015/7/15 16:34, Jan Beulich wrote:
>>>>>>> On 15.07.15 at 06:27, <tiejun.chen@intel.com> wrote:
>>>>> Furthermore, could we have this solution as follows?
>>>>
>>>> Yet more special casing code you want to add. I said no to this
>>>> model, and unless you can address the issue _without_ adding
>>>> a lot of special casing code, the answer will remain no (subject
>>>
>>> What about this?
>>>
>>> @@ -301,6 +301,19 @@ void pci_setup(void)
>>>                pci_mem_start <<= 1;
>>>        }
>>>
>>> +    for ( i = 0; i < memory_map.nr_map ; i++ )
>>> +    {
>>> +        uint64_t reserved_start, reserved_size;
>>> +        reserved_start = memory_map.map[i].addr;
>>> +        reserved_size = memory_map.map[i].size;
>>> +        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>>> +                           reserved_start, reserved_size) )
>>> +        {
>>> +            printf("Reserved device memory conflicts current PCI memory.\n");
>>> +            BUG();
>>> +        }
>>> +    }
>>
>> So what would the cure be if someone ran into this BUG() (other
>> than removing the device associated with the conflicting RMRR)?
>> Afaics such a guest would remain permanently unbootable, which
>> of course is not an option.
>
> Is not booting worse than what we have now -- which is, booting
> successfully but (probably) having issues due to MMIO ranges
> overlapping RMRRs?

It's really a rare possibility here, since in the real world we haven't 
seen any RMRR regions >= 0xF0000000 (the default pci memory start). And I 
already sent out a slightly better revision in the ensuing email, so 
please also take a look at that if possible :)
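
To make the 0xF0000000 point concrete, here is a minimal sketch, not the 
actual hvmloader code, of how the quoted "pci_mem_start <<= 1" step 
lowers the start of the MMIO hole when the BARs need more room. The 
function name grow_mmio_hole() and the end address used in the example 
are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the hole-growing loop: keep doubling
 * pci_mem_start (which moves it lower in 32-bit arithmetic) until the hole
 * [pci_mem_start, pci_mem_end) can hold mmio_total bytes, or until another
 * shift would wrap the start to zero.
 */
static uint32_t grow_mmio_hole(uint32_t pci_mem_start, uint32_t pci_mem_end,
                               uint64_t mmio_total)
{
    while ( mmio_total > (uint64_t)(pci_mem_end - pci_mem_start) &&
            (uint32_t)(pci_mem_start << 1) != 0 )
        pci_mem_start <<= 1;

    return pci_mem_start;
}

/* Usage example; the end address is an assumed value. */
static void grow_mmio_hole_example(void)
{
    const uint32_t end = 0xFC000000u;

    /* 16MB of BARs fits in the default hole, so the start stays put. */
    assert(grow_mmio_hole(0xF0000000u, end, 0x01000000u) == 0xF0000000u);
    /* 256MB of BARs needs one doubling, moving the start to 0xE0000000. */
    assert(grow_mmio_hole(0xF0000000u, end, 0x10000000u) == 0xE0000000u);
}
```

Each doubling moves the start lower, so the range that has to be checked 
against RDM is the final hole, not only the default one.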

>
> This patch series as a whole represents a lot of work and a lot of
> tangible improvements to the situation; and (unless the situation has
> changed) it's almost entirely acked apart from the MMIO placement
> part.  If there is a simple way that we can change hvmloader so that
> most (or even many) VM/device combinations work properly for the 4.6
> release, then I think it's worth considering.
>

The current MMIO allocation mechanism is not good, so we really need to 
reshape it, but we'd better do that after some further discussion, in 
the next release :)

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 11:05                       ` George Dunlap
  2015-07-15 11:20                         ` Chen, Tiejun
@ 2015-07-15 11:24                         ` Jan Beulich
  2015-07-15 11:38                           ` George Dunlap
  2015-07-15 11:27                         ` Jan Beulich
  2 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-15 11:24 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Tiejun Chen, Keir Fraser

>>> On 15.07.15 at 13:05, <George.Dunlap@eu.citrix.com> wrote:
> On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 15.07.15 at 10:59, <tiejun.chen@intel.com> wrote:
>>> What about this?
>>>
>>> @@ -301,6 +301,19 @@ void pci_setup(void)
>>>               pci_mem_start <<= 1;
>>>       }
>>>
>>> +    for ( i = 0; i < memory_map.nr_map ; i++ )
>>> +    {
>>> +        uint64_t reserved_start, reserved_size;
>>> +        reserved_start = memory_map.map[i].addr;
>>> +        reserved_size = memory_map.map[i].size;
>>> +        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>>> +                           reserved_start, reserved_size) )
>>> +        {
>>> +            printf("Reserved device memory conflicts current PCI memory.\n");
>>> +            BUG();
>>> +        }
>>> +    }
>>
>> So what would the cure be if someone ran into this BUG() (other
>> than removing the device associated with the conflicting RMRR)?
>> Afaics such a guest would remain permanently unbootable, which
>> of course is not an option.
> 
> Is not booting worse than what we have now -- which is, booting
> successfully but (probably) having issues due to MMIO ranges
> overlapping RMRRs?

Again a matter of perspective: For devices (USB!) where the RMRR
exists solely for boot time (or outdated OS) use, this would be a
plain regression. For the graphics device Tiejun needs this for, it of
course would make little difference, I agree.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 10:34                       ` Chen, Tiejun
@ 2015-07-15 11:25                         ` Jan Beulich
  2015-07-15 11:34                           ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-15 11:25 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 15.07.15 at 12:34, <tiejun.chen@intel.com> wrote:
>>>> Yet more special casing code you want to add. I said no to this
>>>> model, and unless you can address the issue _without_ adding
>>>> a lot of special casing code, the answer will remain no (subject
>>>
>>> What about this?
>>>
>>> @@ -301,6 +301,19 @@ void pci_setup(void)
>>>                pci_mem_start <<= 1;
>>>        }
>>>
>>> +    for ( i = 0; i < memory_map.nr_map ; i++ )
>>> +    {
>>> +        uint64_t reserved_start, reserved_size;
>>> +        reserved_start = memory_map.map[i].addr;
>>> +        reserved_size = memory_map.map[i].size;
>>> +        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>>> +                           reserved_start, reserved_size) )
>>> +        {
>>> +            printf("Reserved device memory conflicts current PCI memory.\n");
>>> +            BUG();
>>> +        }
>>> +    }
>>
>> So what would the cure be if someone ran into this BUG() (other
>> than removing the device associated with the conflicting RMRR)?
> 
> Maybe I  can move this chunk of codes downward those actual allocation 
> to check if RDM conflicts with the final allocation, and then just 
> disable those associated devices by writing PCI_COMMAND without BUG() 
> like this draft code,

And what would keep the guest from re-enabling the device?

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 11:05                       ` George Dunlap
  2015-07-15 11:20                         ` Chen, Tiejun
  2015-07-15 11:24                         ` Jan Beulich
@ 2015-07-15 11:27                         ` Jan Beulich
  2015-07-15 11:40                           ` Chen, Tiejun
  2 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-15 11:27 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Tiejun Chen, Keir Fraser

>>> On 15.07.15 at 13:05, <George.Dunlap@eu.citrix.com> wrote:
> This patch series as a whole represents a lot of work and a lot of
> tangible improvements to the situation; and (unless the situation has
> changed) it's almost entirely acked apart from the MMIO placement
> part.  If there is a simple way that we can change hvmloader so that
> most (or even many) VM/device combinations work properly for the 4.6
> release, then I think it's worth considering.

And I think the simplest way is to replace the allocation mechanism
as outlined rather than trying to make the current model cope with
the new requirement.

Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 11:25                         ` Jan Beulich
@ 2015-07-15 11:34                           ` Chen, Tiejun
  2015-07-15 13:56                             ` George Dunlap
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-15 11:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>> Maybe I  can move this chunk of codes downward those actual allocation
>> to check if RDM conflicts with the final allocation, and then just
>> disable those associated devices by writing PCI_COMMAND without BUG()
>> like this draft code,
>
> And what would keep the guest from re-enabling the device?
>

We can't, but IMO:

#1. We're already printing that warning message.

#2. This is similar to the native case where, as you know, some broken 
devices are also disabled by the BIOS, right? So I think it's the OS's 
responsibility, and its own risk, to force-enable such a broken device.

#3. It's a really rare possibility in the real world.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 11:24                         ` Jan Beulich
@ 2015-07-15 11:38                           ` George Dunlap
  0 siblings, 0 replies; 119+ messages in thread
From: George Dunlap @ 2015-07-15 11:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Tiejun Chen, Keir Fraser

On 07/15/2015 12:24 PM, Jan Beulich wrote:
>>>> On 15.07.15 at 13:05, <George.Dunlap@eu.citrix.com> wrote:
>> On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 15.07.15 at 10:59, <tiejun.chen@intel.com> wrote:
>>>> What about this?
>>>>
>>>> @@ -301,6 +301,19 @@ void pci_setup(void)
>>>>               pci_mem_start <<= 1;
>>>>       }
>>>>
>>>> +    for ( i = 0; i < memory_map.nr_map ; i++ )
>>>> +    {
>>>> +        uint64_t reserved_start, reserved_size;
>>>> +        reserved_start = memory_map.map[i].addr;
>>>> +        reserved_size = memory_map.map[i].size;
>>>> +        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>>>> +                           reserved_start, reserved_size) )
>>>> +        {
>>>> +            printf("Reserved device memory conflicts current PCI memory.\n");
>>>> +            BUG();
>>>> +        }
>>>> +    }
>>>
>>> So what would the cure be if someone ran into this BUG() (other
>>> than removing the device associated with the conflicting RMRR)?
>>> Afaics such a guest would remain permanently unbootable, which
>>> of course is not an option.
>>
>> Is not booting worse than what we have now -- which is, booting
>> successfully but (probably) having issues due to MMIO ranges
>> overlapping RMRRs?
> 
> Again a matter of perspective: For devices (USB!) where the RMRR
> exists solely for boot time (or outdated OS) use, this would be a
> plain regression. For the graphics device Tiejun needs this for, it of
> course would make little difference, I agree.

So it's the case that, for some devices, not only is it functionally OK
to have RAM in the RMRR with no issues, it's also functionally OK to
have PCI BARs in the RMRR with no issues?

If so, then yes, I agree having a device fail to work with no ability to
work around it is an unacceptable regression.

If we're not targeting 4.6 for this, then the situation changes
somewhat.  One thing worth considering (which perhaps may be what tiejun
is looking at) is the cost of keeping a large number of
working-and-already-acked patches out of tree -- both the psychological
cost, the cost of rebasing, and the cost of re-reviewing rebases.  Given
how independent the hvmloader changes are to the rest of the series,
it's probably worth trying to see if we can check in the other patches
as soon as we branch.

If we can check in those patches with hvmloader still ignoring RMRRs,
then there's no problem.  But if we need the issue addressed *somehow*
in hvmloader before checking the rest of the series in, then I think it
makes sense to consider a minimal change that would make the series
somewhat functional, before moving on to overhauling the hvmloader MMIO
placement code.

 -George

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 11:27                         ` Jan Beulich
@ 2015-07-15 11:40                           ` Chen, Tiejun
  0 siblings, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-15 11:40 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

On 2015/7/15 19:27, Jan Beulich wrote:
>>>> On 15.07.15 at 13:05, <George.Dunlap@eu.citrix.com> wrote:
>> This patch series as a whole represents a lot of work and a lot of
>> tangible improvements to the situation; and (unless the situation has
>> changed) it's almost entirely acked apart from the MMIO placement
>> part.  If there is a simple way that we can change hvmloader so that
>> most (or even many) VM/device combinations work properly for the 4.6
>> release, then I think it's worth considering.
>
> And I think the simplest way is to replace the allocation mechanism

This is the best way, not the simplest way.

A bitmap approach that meets all of our requirements is not easy, and it 
isn't realistic to design/write/test/review it in a short time. Also, an 
entire replacement would bring more potential risk to the 4.6 release.

Thanks
Tiejun

> as outlined rather than trying to make the current model cope with
> the new requirement.
>
> Jan
>
>
>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 11:20                         ` Chen, Tiejun
@ 2015-07-15 12:43                           ` George Dunlap
  2015-07-15 13:23                             ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: George Dunlap @ 2015-07-15 12:43 UTC (permalink / raw)
  To: Chen, Tiejun, Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

On 07/15/2015 12:20 PM, Chen, Tiejun wrote:
> On 2015/7/15 19:05, George Dunlap wrote:
>> On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 15.07.15 at 10:59, <tiejun.chen@intel.com> wrote:
>>>> On 2015/7/15 16:34, Jan Beulich wrote:
>>>>>>>> On 15.07.15 at 06:27, <tiejun.chen@intel.com> wrote:
>>>>>> Furthermore, could we have this solution as follows?
>>>>>
>>>>> Yet more special casing code you want to add. I said no to this
>>>>> model, and unless you can address the issue _without_ adding
>>>>> a lot of special casing code, the answer will remain no (subject
>>>>
>>>> What about this?
>>>>
>>>> @@ -301,6 +301,19 @@ void pci_setup(void)
>>>>                pci_mem_start <<= 1;
>>>>        }
>>>>
>>>> +    for ( i = 0; i < memory_map.nr_map ; i++ )
>>>> +    {
>>>> +        uint64_t reserved_start, reserved_size;
>>>> +        reserved_start = memory_map.map[i].addr;
>>>> +        reserved_size = memory_map.map[i].size;
>>>> +        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>>>> +                           reserved_start, reserved_size) )
>>>> +        {
>>>> +            printf("Reserved device memory conflicts current PCI memory.\n");
>>>> +            BUG();
>>>> +        }
>>>> +    }
>>>
>>> So what would the cure be if someone ran into this BUG() (other
>>> than removing the device associated with the conflicting RMRR)?
>>> Afaics such a guest would remain permanently unbootable, which
>>> of course is not an option.
>>
>> Is not booting worse than what we have now -- which is, booting
>> successfully but (probably) having issues due to MMIO ranges
>> overlapping RMRRs?
> 
> Its really so rare possibility here since in the real world we didn't
> see any RMRR regions >= 0xF0000000 (the default pci memory start.) And I
> already sent out a little better revision in that ensuing email so also
> please take a review if possible :)

Do remember the context we're talking about. :-)  Jan said, *if* there
was a device that had an RMRR conflict with the "default" MMIO
placement, then the guest simply wouldn't boot.  I was saying, in that
case, we move from "silently ignore the conflict, possibly making the
device not work" to "guest refuses to boot".  Which, if it was
guaranteed that a conflict would cause the device to no longer work,
would be an improvement.

>> This patch series as a whole represents a lot of work and a lot of
>> tangible improvements to the situation; and (unless the situation has
>> changed) it's almost entirely acked apart from the MMIO placement
>> part.  If there is a simple way that we can change hvmloader so that
>> most (or even many) VM/device combinations work properly for the 4.6
>> release, then I think it's worth considering.
>>
> 
> Current MMIO allocation mechanism is not good. So we really need to
> reshape that, but we'd better do this with some further discussion in
> next release :)

Absolutely; I was saying, if we can put in a "good enough" measure for
this release, then we can get the rest of the patch series in with our
"good enough" hvmloader fix, and then work on fixing it properly next
release.

But if you're not aiming for this release anymore, then our aims are
something different.  (See my other e-mail.)

 -George

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15  8:32               ` Jan Beulich
  2015-07-15  9:04                 ` Chen, Tiejun
@ 2015-07-15 12:57                 ` Wei Liu
  1 sibling, 0 replies; 119+ messages in thread
From: Wei Liu @ 2015-07-15 12:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Tiejun Chen, Keir Fraser

On Wed, Jul 15, 2015 at 09:32:34AM +0100, Jan Beulich wrote:
> >>> On 15.07.15 at 02:55, <tiejun.chen@intel.com> wrote:
> >> > I agree we'd better overhaul this since we already found something
> >>> unreasonable here. But one or two weeks is really not enough to fix this
> >>> with a bitmap framework, and although a bitmap can make mmio allocation
> >>> better, but more complicated if we just want to allocate PCI mmio.
> >>>
> >>> So could we do this next? I just feel if you'd like to pay more time
> >>> help me refine our current solution, its relatively realistic to this
> >>> case :) And then we can go into bitmap in details or work out a better
> >>> solution in sufficient time slot.
> >>
> >> Looking at how long it took to get here (wasn't this series originally
> >> even meant to go into 4.5?) and how much time I already spent
> > 
> > Certainly appreciate your time.
> > 
> > I didn't mean its wasting time at this point. I just want to express 
> > that its hard to implement that solution in one or two weeks to walking 
> > into 4.6 as an exception.
> > 
> > Note I know this feature is still not accepted as an exception to 4.6 
> > right now so I'm making an assumption.
> 
> After all this is a bug fix (and would have been allowed into 4.5 had
> it been ready in time), so doesn't necessarily need a freeze
> exception (but of course the bar raises the later it gets). Rather
> than rushing in something that's cumbersome to maintain, I'd much
> prefer this to be done properly.
> 

This series is twofold. I consider the tools side change RDM (not
limited to RMRR) a new feature.  It introduces a new feature to fix a
bug. It would still be subject to freeze exception from my point of
view.

Wei.

> Jan

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 12:43                           ` George Dunlap
@ 2015-07-15 13:23                             ` Chen, Tiejun
  0 siblings, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-15 13:23 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> Is not booting worse than what we have now -- which is, booting
>>> successfully but (probably) having issues due to MMIO ranges
>>> overlapping RMRRs?
>>
>> Its really so rare possibility here since in the real world we didn't
>> see any RMRR regions >= 0xF0000000 (the default pci memory start.) And I
>> already sent out a little better revision in that ensuing email so also
>> please take a review if possible :)
>
> Do remember the context we're talking about. :-)  Jan said, *if* there
> was a device that had an RMRR conflict with the "default" MMIO
> placement, then the guest simply wouldn't boot.  I was saying, in that

I understand what you guys mean. Yes, "BUG" is not good in our case :)

> case, we move from "silently ignore the conflict, possibly making the
> device not work" to "guest refuses to boot".  Which, if it was
> guaranteed that a conflict would cause the device to no longer work,
> would be an improvement.

This is really what I did with the "slightly better revision" I 
mentioned above. Maybe you missed it, so I'd like to paste it below 
(but if you already saw it, please ignore this):

     /* If pci bars conflict with RDM we need to disable this pci device. */
     for ( devfn = 0; devfn < 256; devfn++ )
     {
         bar_sz = pci_readl(devfn, bar_reg);
         bar_data = pci_readl(devfn, bar_reg);
         bar_data_upper = pci_readl(devfn, bar_reg + 4);
         /* Until here we don't conflict high memory. */
         if ( bar_data_upper )
             continue;

         for ( i = 0; i < memory_map.nr_map ; i++ )
         {
             uint64_t reserved_start, reserved_size;
             reserved_start = memory_map.map[i].addr;
             reserved_size = memory_map.map[i].size;
             if ( check_overlap(bar_data & ~(bar_sz - 1), bar_sz,
                                reserved_start, reserved_size) )
             {
                 printf("Reserved device memory conflicts with this pci bar,"
                        " so just disable this device.\n");
                 /* Now disable this device */
                 cmd = pci_readw(devfn, PCI_COMMAND);
                 pci_writew(devfn, PCI_COMMAND, ~cmd);
             }
         }
     }

Here I don't break that original allocation mechanism. Instead, I just 
check if we really have that conflict case and then disable that 
associated device.
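
The overlap test that the draft relies on is not shown in this part of 
the thread. A minimal sketch of such a test, assuming half-open ranges 
[start, start + size) and using the hypothetical name ranges_overlap() 
rather than the real check_overlap(), could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for check_overlap(): returns true when
 * [start, start + size) and [reserved_start, reserved_start + reserved_size)
 * share at least one byte.  Written with subtractions so that a range ending
 * at the very top of the address space cannot overflow.
 */
static bool ranges_overlap(uint64_t start, uint64_t size,
                           uint64_t reserved_start, uint64_t reserved_size)
{
    /* An empty range overlaps nothing. */
    if ( size == 0 || reserved_size == 0 )
        return false;

    if ( start >= reserved_start )
        return start - reserved_start < reserved_size;

    return reserved_start - start < size;
}

/* Usage example with made-up addresses. */
static void ranges_overlap_example(void)
{
    /* A reserved page inside a 192MB hole starting at 0xF0000000 conflicts. */
    assert(ranges_overlap(0xF0000000u, 0x0C000000u, 0xF8000000u, 0x1000u));
    /* Adjacent ranges only touch, they do not overlap. */
    assert(!ranges_overlap(0xF0000000u, 0x1000u, 0xF0001000u, 0x1000u));
}
```

Treating the ranges as half-open keeps two regions that merely touch 
from being reported as a conflict.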

>
>>> This patch series as a whole represents a lot of work and a lot of
>>> tangible improvements to the situation; and (unless the situation has
>>> changed) it's almost entirely acked apart from the MMIO placement
>>> part.  If there is a simple way that we can change hvmloader so that
>>> most (or even many) VM/device combinations work properly for the 4.6
>>> release, then I think it's worth considering.
>>>
>>
>> Current MMIO allocation mechanism is not good. So we really need to
>> reshape that, but we'd better do this with some further discussion in
>> next release :)
>
> Absolutely; I was saying, if we can put in a "good enough" measure for
> this release, then we can get the rest of the patch series in with our
> "good enough" hvmloader fix, and then work on fixing it properly next
> release.
>
> But if you're not aiming for this release anymore, then our aims are
> something different.  (See my other e-mail.)
>

I'm still striving for the 4.6 release if possible :)

Thanks
Tiejun

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-13 13:12   ` Jan Beulich
  2015-07-14  6:39     ` Chen, Tiejun
@ 2015-07-15 13:40     ` George Dunlap
  2015-07-15 14:00       ` Jan Beulich
  1 sibling, 1 reply; 119+ messages in thread
From: George Dunlap @ 2015-07-15 13:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Tiejun Chen, Keir Fraser

On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich <JBeulich@suse.com> wrote:
> Therefore I'll not make any further comments on the rest of the
> patch, but instead outline an allocation model that I think would
> fit our needs: Subject to the constraints mentioned above, set up
> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
> bits], i.e. reasonably small a memory block). Each bit represents a
> page usable for MMIO: First of all you remove the range from
> PCI_MEM_END upwards. Then remove all RDM pages. Now do a
> first pass over all devices, allocating (in the bitmap) space for only
> the 32-bit MMIO BARs, starting with the biggest one(s), by finding
> a best fit (i.e. preferably a range not usable by any bigger BAR)
> from top down. For example, if you have available
>
> [f0000000,f8000000)
> [f9000000,f9001000)
> [fa000000,fa003000)
> [fa010000,fa012000)
>
> and you're looking for a single page slot, you should end up
> picking fa002000.
>
> After this pass you should be able to do RAM relocation in a
> single attempt just like we do today (you may still grow the MMIO
> window if you know you need to and can fit some of the 64-bit
> BARs in there, subject to said constraints; this is in an attempt
> to help OSes not comfortable with 64-bit resources).
>
> In a 2nd pass you'd then assign 64-bit resources: If you can fit
> them below 4G (you still have the bitmap left of what you've got
> available), put them there. Allocation strategy could be the same
> as above (biggest first), perhaps allowing for some factoring out
> of logic, but here smallest first probably could work equally well.
> The main thought to decide between the two is whether it is
> better to fit as many (small) or as big (in total) as possible a set
> under 4G. I'd generally expect the former (as many as possible,
> leaving only a few huge ones to go above 4G) to be the better
> approach, but that's more a gut feeling than based on hard data.

I agree that it would be more sensible for hvmloader to make a "plan"
first, and then do the memory reallocation (if it's possible) at one
time, then go through and actually update the device BARs according to
the "plan".

However, I don't really see how having a bitmap really helps in this
case.  I would think having a list of free ranges (perhaps aligned by
powers of two?), sorted small->large, makes the most sense.

So suppose we had the above example, but with the range
[fa000000,fa005000) instead, and we're looking for a 4-page region.
Then our "free list" initially would look like this:

[f9000000,f9001000)
[fa010000,fa012000)
[fa000000,fa005000)
[f0000000,f8000000)

After skipping the first two because they aren't big enough, we'd take
0x4000 from the third one, placing the BAR at [fa000000,fa004000), and
putting the remainder [fa004000,fa005000) back on the free list in
order, thus:

[f9000000,f9001000)
[fa004000,fa005000)
[fa010000,fa012000)
[f0000000,f8000000)

If we got to the end and hadn't found a region large enough, *and* we
could still expand the MMIO hole, we could lower pci_mem_start until
it could fit.
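
The free-list scheme above can be sketched roughly as follows. This is only
an illustration under stated assumptions: the list is a small fixed-size
array kept sorted by length (ties broken by address), BAR sizes are powers
of two and must be naturally aligned, and the names free_list, fl_insert()
and fl_alloc() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RANGES 32

struct range { uint64_t start, len; };

struct free_list {
    struct range r[MAX_RANGES];
    unsigned int nr;
};

/*
 * Insert a range, keeping the list sorted small->large (by length, then
 * by start address). Zero-length ranges are ignored.
 * Returns 0 on success, -1 if the array is full.
 */
static int fl_insert(struct free_list *fl, uint64_t start, uint64_t len)
{
    unsigned int i, j;

    if ( len == 0 )
        return 0;
    if ( fl->nr == MAX_RANGES )
        return -1;

    for ( i = 0; i < fl->nr; i++ )
        if ( fl->r[i].len > len ||
             (fl->r[i].len == len && fl->r[i].start > start) )
            break;

    for ( j = fl->nr; j > i; j-- )
        fl->r[j] = fl->r[j - 1];

    fl->r[i].start = start;
    fl->r[i].len = len;
    fl->nr++;
    return 0;
}

/*
 * Take 'size' bytes (a power of two) from the smallest range that can hold
 * a naturally aligned block of that size. Any space left before or after
 * the block goes back on the list. Returns the block base, or 0 if nothing
 * fits (0 is never a valid MMIO base here).
 * A real implementation would also have to handle fl_insert() failing.
 */
static uint64_t fl_alloc(struct free_list *fl, uint64_t size)
{
    unsigned int i, j;

    assert(size && !(size & (size - 1)));

    for ( i = 0; i < fl->nr; i++ )
    {
        struct range cur = fl->r[i];
        uint64_t end = cur.start + cur.len;
        uint64_t base = (cur.start + size - 1) & ~(size - 1);

        if ( base < cur.start || base + size > end )
            continue;

        /* Remove entry i, then return the unused head and tail. */
        for ( j = i; j + 1 < fl->nr; j++ )
            fl->r[j] = fl->r[j + 1];
        fl->nr--;

        fl_insert(fl, cur.start, base - cur.start);
        fl_insert(fl, base + size, end - (base + size));
        return base;
    }
    return 0;
}
```

Seeding the list with the four ranges from the example and asking for 0x4000
places the block at fa000000 and leaves [fa004000,fa005000) second in the
list, matching the result described above.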

What do you think?

 -George

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 11:34                           ` Chen, Tiejun
@ 2015-07-15 13:56                             ` George Dunlap
  2015-07-15 16:14                               ` George Dunlap
  0 siblings, 1 reply; 119+ messages in thread
From: George Dunlap @ 2015-07-15 13:56 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Keir Fraser

On Wed, Jul 15, 2015 at 12:34 PM, Chen, Tiejun <tiejun.chen@intel.com> wrote:
>>> Maybe I  can move this chunk of codes downward those actual allocation
>>> to check if RDM conflicts with the final allocation, and then just
>>> disable those associated devices by writing PCI_COMMAND without BUG()
>>> like this draft code,
>>
>>
>> And what would keep the guest from re-enabling the device?
>>
>
> We can't but IMO,
>
> #1. We're already posting that warning messages.
>
> #2. Actually this is like this sort of case like, as you known, even in the
> native platform, some broken devices are also disabled by BIOS, right? So I
> think this is OS's responsibility or risk to force enabling such a broken
> device.
>
> #3. Its really rare possibility in real world.

Well the real question is if the guest re-enabling the device would
cause a security problem; I think the answer to that must be "no" (or
at least, "it's not any worse than it was before").

The guest OS has the device disabled, and the RMRRs in the e820 map;
if it re-enables the device over the "BIOS" (which hvmloader sort of
acts as), then it should know what it's doing.

Would it be possible, on a collision, to have one last "stab" at
allocating the BAR somewhere else, without relocating memory (or
relocating any other BARs)?  At very least then an administrator could
work around this kind of thing by setting the mmio_hole larger in the
domain config.
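
A rough sketch of what such a last "stab" might look like, assuming the
caller can supply a list of ranges already in use inside the MMIO hole (RDM
regions plus BARs that have already been placed). The names busy_range and
retry_bar_placement() are hypothetical; nothing here moves RAM or touches
other BARs.

```c
#include <stdint.h>

struct busy_range { uint64_t start, size; };

/* 1 if [a, a + a_sz) and [b, b + b_sz) intersect, else 0. */
static int overlaps(uint64_t a, uint64_t a_sz, uint64_t b, uint64_t b_sz)
{
    return a < b + b_sz && b < a + a_sz;
}

/*
 * Walk the hole [hole_start, hole_end) in steps of bar_sz (a power of
 * two) and return the first naturally aligned base that avoids every busy
 * range. Returns 0 if no such slot exists, in which case the caller falls
 * back to disabling the device.
 */
static uint64_t retry_bar_placement(uint64_t hole_start, uint64_t hole_end,
                                    uint64_t bar_sz,
                                    const struct busy_range *busy,
                                    unsigned int nr_busy)
{
    uint64_t base = (hole_start + bar_sz - 1) & ~(bar_sz - 1);

    for ( ; base + bar_sz <= hole_end; base += bar_sz )
    {
        unsigned int i;

        for ( i = 0; i < nr_busy; i++ )
            if ( overlaps(base, bar_sz, busy[i].start, busy[i].size) )
                break;

        if ( i == nr_busy )
            return base;    /* no conflict with any busy range */
    }
    return 0;
}
```

Making the hole larger in the domain config simply widens
[hole_start, hole_end), which gives this scan more candidates to try.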

 -George

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 13:40     ` George Dunlap
@ 2015-07-15 14:00       ` Jan Beulich
  2015-07-15 15:19         ` George Dunlap
  0 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-15 14:00 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Tiejun Chen, Keir Fraser

>>> On 15.07.15 at 15:40, <dunlapg@umich.edu> wrote:
> On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> Therefore I'll not make any further comments on the rest of the
>> patch, but instead outline an allocation model that I think would
>> fit our needs: Subject to the constraints mentioned above, set up
>> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
>> bits], i.e. reasonably small a memory block). Each bit represents a
>> page usable for MMIO: First of all you remove the range from
>> PCI_MEM_END upwards. Then remove all RDM pages. Now do a
>> first pass over all devices, allocating (in the bitmap) space for only
>> the 32-bit MMIO BARs, starting with the biggest one(s), by finding
>> a best fit (i.e. preferably a range not usable by any bigger BAR)
>> from top down. For example, if you have available
>>
>> [f0000000,f8000000)
>> [f9000000,f9001000)
>> [fa000000,fa003000)
>> [fa010000,fa012000)
>>
>> and you're looking for a single page slot, you should end up
>> picking fa002000.
>>
>> After this pass you should be able to do RAM relocation in a
>> single attempt just like we do today (you may still grow the MMIO
>> window if you know you need to and can fit some of the 64-bit
>> BARs in there, subject to said constraints; this is in an attempt
>> to help OSes not comfortable with 64-bit resources).
>>
>> In a 2nd pass you'd then assign 64-bit resources: If you can fit
>> them below 4G (you still have the bitmap left of what you've got
>> available), put them there. Allocation strategy could be the same
>> as above (biggest first), perhaps allowing for some factoring out
>> of logic, but here smallest first probably could work equally well.
>> The main thought to decide between the two is whether it is
>> better to fit as many (small) or as big (in total) as possible a set
>> under 4G. I'd generally expect the former (as many as possible,
>> leaving only a few huge ones to go above 4G) to be the better
>> approach, but that's more a gut feeling than based on hard data.
> 
> I agree that it would be more sensible for hvmloader to make a "plan"
> first, and then do the memory reallocation (if it's possible) at one
> time, then go through and actually update the device BARs according to
> the "plan".
> 
> However, I don't really see how having a bitmap really helps in this
> case.  I would think having a list of free ranges (perhaps aligned by
> powers of two?), sorted small->large, makes the most sense.

I view bitmap vs list as just two different representations, and I
picked the bitmap approach as being more compact storage wise
in case there are many regions to deal with. I'd be fine with a list
approach too, provided lookup times don't become prohibitive.

Jan

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 14:00       ` Jan Beulich
@ 2015-07-15 15:19         ` George Dunlap
  0 siblings, 0 replies; 119+ messages in thread
From: George Dunlap @ 2015-07-15 15:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Tiejun Chen, Keir Fraser

On Wed, Jul 15, 2015 at 3:00 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 15.07.15 at 15:40, <dunlapg@umich.edu> wrote:
>> On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>> Therefore I'll not make any further comments on the rest of the
>>> patch, but instead outline an allocation model that I think would
>>> fit our needs: Subject to the constraints mentioned above, set up
>>> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
>>> bits], i.e. reasonably small a memory block). Each bit represents a
>>> page usable for MMIO: First of all you remove the range from
>>> PCI_MEM_END upwards. Then remove all RDM pages. Now do a
>>> first pass over all devices, allocating (in the bitmap) space for only
>>> the 32-bit MMIO BARs, starting with the biggest one(s), by finding
>>> a best fit (i.e. preferably a range not usable by any bigger BAR)
>>> from top down. For example, if you have available
>>>
>>> [f0000000,f8000000)
>>> [f9000000,f9001000)
>>> [fa000000,fa003000)
>>> [fa010000,fa012000)
>>>
>>> and you're looking for a single page slot, you should end up
>>> picking fa002000.
>>>
>>> After this pass you should be able to do RAM relocation in a
>>> single attempt just like we do today (you may still grow the MMIO
>>> window if you know you need to and can fit some of the 64-bit
>>> BARs in there, subject to said constraints; this is in an attempt
>>> to help OSes not comfortable with 64-bit resources).
>>>
>>> In a 2nd pass you'd then assign 64-bit resources: If you can fit
>>> them below 4G (you still have the bitmap left of what you've got
>>> available), put them there. Allocation strategy could be the same
>>> as above (biggest first), perhaps allowing for some factoring out
>>> of logic, but here smallest first probably could work equally well.
>>> The main thought to decide between the two is whether it is
>>> better to fit as many (small) or as big (in total) as possible a set
>>> under 4G. I'd generally expect the former (as many as possible,
>>> leaving only a few huge ones to go above 4G) to be the better
>>> approach, but that's more a gut feeling than based on hard data.
>>
>> I agree that it would be more sensible for hvmloader to make a "plan"
>> first, and then do the memory reallocation (if it's possible) at one
>> time, then go through and actually update the device BARs according to
>> the "plan".
>>
>> However, I don't really see how having a bitmap really helps in this
>> case.  I would think having a list of free ranges (perhaps aligned by
>> powers of two?), sorted small->large, makes the most sense.
>
> I view bitmap vs list as just two different representations, and I
> picked the bitmap approach as being more compact storage wise
> in case there are many regions to deal with. I'd be fine with a list
> approach too, provided lookup times don't become prohibitive.

Sure, you can obviously translate one into the other.  The main reason
I dislike the idea of a bitmap is having to write code to determine
where the next free region is, and how big that region is, rather than
just going down the next on the list and reading range.start and
range.len.
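
To illustrate the extra code a bitmap needs, here is a minimal sketch of
finding the next free region and its size. It assumes one bit per page with
a set bit meaning "usable for MMIO", stored in 32-bit words; the names
page_is_free() and next_free_run() are invented for this example.

```c
#include <stdint.h>

#define BITS_PER_WORD 32

/* A set bit means the page is still usable for MMIO. */
static int page_is_free(const uint32_t *bitmap, unsigned int page)
{
    return (bitmap[page / BITS_PER_WORD] >> (page % BITS_PER_WORD)) & 1;
}

/*
 * Starting at page 'from', find the next run of free pages.
 * Stores the first page of the run in *start and returns the run length
 * in pages, or 0 if no free page remains before nr_pages.
 */
static unsigned int next_free_run(const uint32_t *bitmap,
                                  unsigned int nr_pages,
                                  unsigned int from, unsigned int *start)
{
    unsigned int p = from, len = 0;

    /* Skip pages that are already taken. */
    while ( p < nr_pages && !page_is_free(bitmap, p) )
        p++;
    *start = p;

    /* Count consecutive free pages. */
    while ( p < nr_pages && page_is_free(bitmap, p) )
    {
        p++;
        len++;
    }
    return len;
}
```

With a range list the same information is just the start and length fields
of the next entry.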

Also, in your suggestion each bit is a page (4k); so assuming a 64-bit
pointer, a 64-bit starting point, and a 64-bit length (just to make
things simple), a single "range" takes up enough bitmap to reserve
(64+64+64)*4k = 768k.  So if we make the bitmap big enough for 2GiB,
then the break-even point for storage is 2,730 ranges.  It's even
higher if we have an array instead of a linked list.
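
The arithmetic can be checked with a few lines of C. This is only a worked
example of the numbers above, assuming 4k pages and 192 bits (three 64-bit
fields) per list entry; the helper names are made up.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4k pages */

/* Number of bitmap bits needed to cover 'bytes_covered' at one bit per page. */
static uint64_t bitmap_bits(uint64_t bytes_covered)
{
    return bytes_covered >> PAGE_SHIFT;
}

/*
 * Number of range descriptors that fit in the same storage as the bitmap,
 * i.e. the point at which a list stops being the smaller representation.
 */
static uint64_t breakeven_ranges(uint64_t bytes_covered,
                                 unsigned int bits_per_range)
{
    return bitmap_bits(bytes_covered) / bits_per_range;
}
```

For 2GiB this gives 2^19 = 524288 bits (64k bytes) of bitmap, and
524288 / 192 = 2730 ranges at break-even. Dropping the next pointer (an
array, 128 bits per entry) raises that to 4096.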

I'm pretty sure that having such a large number of ranges will be
vanishingly rare, so the "range" representation will not only be
easier to code and read, but will in the common case (I believe) be
far more compact.

 -George

* Re: [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-09  5:33 ` [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
  2015-07-13 13:35   ` Jan Beulich
@ 2015-07-15 16:00   ` George Dunlap
  2015-07-16  1:58     ` Chen, Tiejun
  1 sibling, 1 reply; 119+ messages in thread
From: George Dunlap @ 2015-07-15 16:00 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> Now we can use that memory map to build our final
> e820 table but it may need to reorder all e820
> entries.

I think I would say:

--
Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
hypervisor map
* Add in the hypervisor regions
* Adjust the lowmem and highmem regions if we've had to relocate
memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.
--
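
As an illustration of the last step, here is a minimal sketch of sorting the
entries into memory order. The struct is a simplified stand-in for
hvmloader's e820 entry type and sort_e820() is a hypothetical name; an
insertion sort is used because the table is small.

```c
#include <stdint.h>

/* Simplified stand-in for an e820 entry: base, length, type. */
struct e820_ent {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Sort entries in place by ascending base address (insertion sort). */
static void sort_e820(struct e820_ent *e820, unsigned int nr)
{
    unsigned int i, j;

    for ( i = 1; i < nr; i++ )
    {
        struct e820_ent tmp = e820[i];

        /* Shift larger addresses up to make room for tmp. */
        for ( j = i; j > 0 && e820[j - 1].addr > tmp.addr; j-- )
            e820[j] = e820[j - 1];

        e820[j] = tmp;
    }
}
```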

>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v5 ~ v7:
>
> * Nothing is changed.
>
> v4:
>
> * Rename local variable, low_mem_pgend, to low_mem_end.
>
> * Improve some code comments
>
> * Adjust highmem after lowmem is changed.
>
>  tools/firmware/hvmloader/e820.c | 80 +++++++++++++++++++++++++++++++++--------
>  1 file changed, 66 insertions(+), 14 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
> index 3e53c47..aa2569f 100644
> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
>                       unsigned int lowmem_reserved_base,
>                       unsigned int bios_image_base)
>  {
> -    unsigned int nr = 0;
> +    unsigned int nr = 0, i, j;
> +    uint64_t add_high_mem = 0;
> +    uint64_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
>
>      if ( !lowmem_reserved_base )
>              lowmem_reserved_base = 0xA0000;
> @@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820,
>      e820[nr].type = E820_RESERVED;
>      nr++;
>
> -    /* Low RAM goes here. Reserve space for special pages. */
> -    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
> -    e820[nr].addr = 0x100000;
> -    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
> -    e820[nr].type = E820_RAM;
> -    nr++;
> -
>      /*
>       * Explicitly reserve space for special pages.
>       * This space starts at RESERVED_MEMBASE an extends to cover various
> @@ -194,16 +189,73 @@ int build_e820_table(struct e820entry *e820,
>          nr++;
>      }
>
> -
> -    if ( hvm_info->high_mem_pgend )
> +    /*
> +     * Construct E820 table according to recorded memory map.
> +     *
> +     * The memory map created by toolstack may include,
> +     *
> +     * #1. Low memory region
> +     *
> +     * Low RAM starts at least from 1M to make sure all standard regions
> +     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
> +     * have enough space.
> +     *
> +     * #2. Reserved regions if they exist
> +     *
> +     * #3. High memory region if it exists
> +     */
> +    for ( i = 0; i < memory_map.nr_map; i++ )
>      {
> -        e820[nr].addr = ((uint64_t)1 << 32);
> -        e820[nr].size =
> -            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
> -        e820[nr].type = E820_RAM;
> +        e820[nr] = memory_map.map[i];
>          nr++;
>      }
>
> +    /* Low RAM goes here. Reserve space for special pages. */
> +    BUG_ON(low_mem_end < (2u << 20));

Won't this BUG if the guest was actually given less than 2GiB of RAM?

> +
> +    /*
> +     * We may need to adjust real lowmem end since we may
> +     * populate RAM to get enough MMIO previously.
> +     */
> +    for ( i = 0; i < memory_map.nr_map; i++ )
> +    {
> +        uint64_t end = e820[i].addr + e820[i].size;
> +        if ( e820[i].type == E820_RAM &&
> +             low_mem_end > e820[i].addr && low_mem_end < end )
> +        {
> +            add_high_mem = end - low_mem_end;
> +            e820[i].size = low_mem_end - e820[i].addr;
> +        }
> +    }
> +
> +    /*
> +     * And then we also need to adjust highmem.
> +     */
> +    if ( add_high_mem )
> +    {
> +        for ( i = 0; i < memory_map.nr_map; i++ )
> +        {
> +            if ( e820[i].type == E820_RAM &&
> +                 e820[i].addr > (1ull << 32))
> +                e820[i].size += add_high_mem;
> +        }
> +    }

What if there was originally no high memory, but resizing the pci hole
meant we had to create a high memory region?
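
One way to cover that case, sketched under assumptions: grow an existing
high RAM entry if there is one, otherwise append a new entry starting at
4GiB. The struct and grow_or_create_highmem() are hypothetical stand-ins,
not the patch's code. The sketch compares with >= on the assumption that
high RAM begins exactly at 4GiB, which the strict > in the hunk above would
not match.

```c
#include <assert.h>
#include <stdint.h>

#define E820_RAM      1
#define HIGHMEM_BASE  (1ull << 32)

/* Simplified stand-in for an e820 entry. */
struct e820_ent {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Account for 'extra' bytes of RAM that were moved above 4GiB.
 * If a high RAM entry already exists it is extended; otherwise a new one
 * is appended at HIGHMEM_BASE. 'max' is the capacity of the array.
 * Returns the new number of entries.
 */
static unsigned int grow_or_create_highmem(struct e820_ent *e820,
                                           unsigned int nr,
                                           unsigned int max,
                                           uint64_t extra)
{
    unsigned int i;

    if ( !extra )
        return nr;

    for ( i = 0; i < nr; i++ )
    {
        if ( e820[i].type == E820_RAM && e820[i].addr >= HIGHMEM_BASE )
        {
            e820[i].size += extra;
            return nr;
        }
    }

    /* No high memory yet: create the region. Caller must leave room. */
    assert(nr < max);
    e820[nr].addr = HIGHMEM_BASE;
    e820[nr].size = extra;
    e820[nr].type = E820_RAM;
    return nr + 1;
}
```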

Other than those things, looks good.

 -George

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 13:56                             ` George Dunlap
@ 2015-07-15 16:14                               ` George Dunlap
  2015-07-16  2:05                                 ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: George Dunlap @ 2015-07-15 16:14 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Keir Fraser

On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> Would it be possible, on a collision, to have one last "stab" at
> allocating the BAR somewhere else, without relocating memory (or
> relocating any other BARs)?  At very least then an administrator could
> work around this kind of thing by setting the mmio_hole larger in the
> domain config.

If it's not possible to have this last-ditch relocation effort, then
yes, I'd be OK with just disabling the device for the time being.

 -George

* Re: [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-15 16:00   ` George Dunlap
@ 2015-07-16  1:58     ` Chen, Tiejun
  2015-07-16  9:41       ` George Dunlap
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-16  1:58 UTC (permalink / raw)
  To: George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

> I think I would say:
>
> --
> Now use the hypervisor-supplied memory map to build our final e820 table:
> * Add regions for BIOS ranges and other special mappings not in the
> hypervisor map
> * Add in the hypervisor regions
> * Adjust the lowmem and highmem regions if we've had to relocate
> memory (adding a highmem region if necessary)
> * Sort all the ranges so that they appear in memory order.
> --

I'll update this and thanks a lot.

>
>>
>> CC: Keir Fraser <keir@xen.org>
>> CC: Jan Beulich <jbeulich@suse.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> CC: Ian Campbell <ian.campbell@citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---

[snip]

>> +    /* Low RAM goes here. Reserve space for special pages. */
>> +    BUG_ON(low_mem_end < (2u << 20));
>
> Won't this BUG if the guest was actually given less than 2GiB of RAM?

2u << 20 = 0x200000, so this is 2M, not 2G :)
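
The shift can be checked at compile time. A tiny sketch, with LOW_MEM_FLOOR
as a made-up name for the constant in the BUG_ON:

```c
#include <assert.h>

/* 2u << 20 is 2 * 2^20 bytes, i.e. 2MiB. */
#define LOW_MEM_FLOOR (2u << 20)

/* 2GiB would need a shift by 30 instead. */
#define TWO_GIB       (2u << 30)

_Static_assert(LOW_MEM_FLOOR == 0x200000u, "2u << 20 is 2MiB");
_Static_assert(TWO_GIB == 0x80000000u, "2u << 30 is 2GiB");
```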

>
>> +
>> +    /*
>> +     * We may need to adjust real lowmem end since we may
>> +     * populate RAM to get enough MMIO previously.
>> +     */

[snip]

>> +
>> +    /*
>> +     * And then we also need to adjust highmem.
>> +     */
>> +    if ( add_high_mem )
>> +    {
>> +        for ( i = 0; i < memory_map.nr_map; i++ )
>> +        {
>> +            if ( e820[i].type == E820_RAM &&
>> +                 e820[i].addr > (1ull << 32))
>> +                e820[i].size += add_high_mem;
>> +        }
>> +    }
>
> What if there was originally no high memory, but resizing the pci hole
> meant we had to create a high memory region?
>

You're right. We need to consider this case.

Thanks
Tiejun

* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-15 16:14                               ` George Dunlap
@ 2015-07-16  2:05                                 ` Chen, Tiejun
  2015-07-16  9:40                                   ` George Dunlap
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-16  2:05 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Keir Fraser

On 2015/7/16 0:14, George Dunlap wrote:
> On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap
> <George.Dunlap@eu.citrix.com> wrote:
>> Would it be possible, on a collision, to have one last "stab" at
>> allocating the BAR somewhere else, without relocating memory (or
>> relocating any other BARs)?  At very least then an administrator could
>> work around this kind of thing by setting the mmio_hole larger in the
>> domain config.
>
> If it's not possible to have this last-ditch relocation effort, then

Could you take a look at the original patch #06? Although Jan thought 
it was complicated, it is really the one version that I can refine in 
the current time slot.

> yes, I'd be OK with just disabling the device for the time being.
>

Just let me send out a new patch series based on this idea. We can 
continue to discuss this over there, but we also need to further review 
the other remaining comments based on a new revision.

Thanks
Tiejun

* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-10 14:50 ` [v7][PATCH 00/16] Fix RMRR George Dunlap
  2015-07-10 14:56   ` Jan Beulich
@ 2015-07-16  7:55   ` Jan Beulich
  2015-07-16  8:03     ` Chen, Tiejun
  1 sibling, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-16  7:55 UTC (permalink / raw)
  To: George Dunlap, Tiejun Chen; +Cc: xen-devel

>>> On 10.07.15 at 16:50, <George.Dunlap@eu.citrix.com> wrote:
> On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
>> v7:
> 
> It looks like most of the libxl/libxc patches have been acked.  It
> seems to me that most of the hypervisor patches (1-3, 14-15) are
> either ready to go in or pretty close.

Now that I looked over v8 I have to admit that if I was a tools
maintainer I wouldn't want to see some of the tools patches in
with just an ack, but without any review.

Jan

* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-16  7:55   ` Jan Beulich
@ 2015-07-16  8:03     ` Chen, Tiejun
  2015-07-16  8:08       ` Jan Beulich
  0 siblings, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-16  8:03 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap; +Cc: xen-devel

On 2015/7/16 15:55, Jan Beulich wrote:
>>>> On 10.07.15 at 16:50, <George.Dunlap@eu.citrix.com> wrote:
>> On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
>>> v7:
>>
>> It looks like most of the libxl/libxc patches have been acked.  It
>> seems to me that most of the hypervisor patches (1-3, 14-15) are
>> either ready to go in or pretty close.
>
> Now that I looked over v8 I have to admit that if I was a tools
> maintainer I wouldn't want to see some of the tools patches in
> with just an ack, but without any review.

I'm somewhat confused at this point.

Acked-by: is often used by the maintainer of the affected code when that
maintainer neither contributed to nor forwarded the patch. It is a 
record that the acker has at least reviewed the patch and has indicated 
acceptance.

Does this imply this is already reviewed?

Thanks
Tiejun

* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-16  8:03     ` Chen, Tiejun
@ 2015-07-16  8:08       ` Jan Beulich
  2015-07-16  8:13         ` Chen, Tiejun
  2015-07-16  8:30         ` Ian Campbell
  0 siblings, 2 replies; 119+ messages in thread
From: Jan Beulich @ 2015-07-16  8:08 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: George Dunlap, xen-devel

>>> On 16.07.15 at 10:03, <tiejun.chen@intel.com> wrote:
> On 2015/7/16 15:55, Jan Beulich wrote:
>>>>> On 10.07.15 at 16:50, <George.Dunlap@eu.citrix.com> wrote:
>>> On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
>>>> v7:
>>>
>>> It looks like most of the libxl/libxc patches have been acked.  It
>>> seems to me that most of the hypervisor patches (1-3, 14-15) are
>>> either ready to go in or pretty close.
>>
>> Now that I looked over v8 I have to admit that if I was a tools
>> maintainer I wouldn't want to see some of the tools patches in
>> with just an ack, but without any review.
> 
> I'm somewhat confused at this point.
> 
> Acked-by: is often used by the maintainer of the affected code when that
> maintainer neither contributed to nor forwarded the patch. It is a 
> record that the acker has at least reviewed the patch and has indicated 
> acceptance.
> 
> Does this imply this is already reviewed?

No, that would be expressed by Reviewed-by. Acked-by merely
means no objection by the maintainer for the change to go in.

Jan

* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-16  8:08       ` Jan Beulich
@ 2015-07-16  8:13         ` Chen, Tiejun
  2015-07-16  8:26           ` Jan Beulich
  2015-07-16  8:30         ` Ian Campbell
  1 sibling, 1 reply; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-16  8:13 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel

>>>> It looks like most of the libxl/libxc patches have been acked.  It
>>>> seems to me that most of the hypervisor patches (1-3, 14-15) are
>>>> either ready to go in or pretty close.
>>>
>>> Now that I looked over v8 I have to admit that if I was a tools
>>> maintainer I wouldn't want to see some of the tools patches in
>>> with just an ack, but without any review.
>>
>> I'm somewhat confused at this point.
>>
>> Acked-by: is often used by the maintainer of the affected code when that
>> maintainer neither contributed to nor forwarded the patch. It is a
>> record that the acker has at least reviewed the patch and has indicated
>> acceptance.
>>
>> Does this imply this is already reviewed?
>
> No, that would be expressed by Reviewed-by. Acked-by merely
> means no objection by the maintainer for the change to go in.
>

Sorry, I'm trying to dig into this.

If nobody else would like to take a look at this, isn't it ultimately 
the associated maintainer's responsibility to review? In this case 
isn't Acked-by enough?

Or do you still want us to add two lines explicitly,

Reviewed-by: A
Acked-by: A


Thanks
Tiejun

* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-16  8:13         ` Chen, Tiejun
@ 2015-07-16  8:26           ` Jan Beulich
  2015-07-16  9:27             ` George Dunlap
  0 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-16  8:26 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: George Dunlap, xen-devel

>>> On 16.07.15 at 10:13, <tiejun.chen@intel.com> wrote:
>>>>> It looks like most of the libxl/libxc patches have been acked.  It
>>>>> seems to me that most of the hypervisor patches (1-3, 14-15) are
>>>>> either ready to go in or pretty close.
>>>>
>>>> Now that I looked over v8 I have to admit that if I was a tools
>>>> maintainer I wouldn't want to see some of the tools patches in
>>>> with just an ack, but without any review.
>>>
>>> I'm somewhat confused at this point.
>>>
>>> Acked-by: is often used by the maintainer of the affected code when that
>>> maintainer neither contributed to nor forwarded the patch. It is a
>>> record that the acker has at least reviewed the patch and has indicated
>>> acceptance.
>>>
>>> Does this imply this is already reviewed?
>>
>> No, that would be expressed by Reviewed-by. Acked-by merely
>> means no objection by the maintainer for the change to go in.
>>
> 
> Sorry I'm trying to dig into this.
> 
> If nobody would like to take a look at this, so isn't this the 
> associated maintainer's responsibility to review finally? In this case 
> isn't Acked-by fine enough?

Acked-by is good enough for a patch to go in, yes. Note that I
didn't make this a requirement (as I'm not the maintainer), I just
said that if I was the maintainer, I would for at least some of the
tools patches. And no, it is not the maintainer's role to do a
review if no-one else did - that's why (s)he can ack it as an
alternative, implying (s)he trusts the author without further
review.

> Or you still want to us add two lines explicitly,
> 
> Reviewed-by: A
> Acked-by: A

We generally take Reviewed-by as a superset of Acked-by, so two
tags by the same person are not needed. And (as said elsewhere
recently) ack-s by non-maintainers generally don't count much.

Jan

* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-16  8:08       ` Jan Beulich
  2015-07-16  8:13         ` Chen, Tiejun
@ 2015-07-16  8:30         ` Ian Campbell
  2015-07-16  8:46           ` Wei Liu
  2015-07-16  9:45           ` Lars Kurth
  1 sibling, 2 replies; 119+ messages in thread
From: Ian Campbell @ 2015-07-16  8:30 UTC (permalink / raw)
  To: Jan Beulich, Ian Jackson, Wei Liu; +Cc: George Dunlap, Tiejun Chen, xen-devel

On Thu, 2015-07-16 at 09:08 +0100, Jan Beulich wrote:
> >>> On 16.07.15 at 10:03, <tiejun.chen@intel.com> wrote:
> > On 2015/7/16 15:55, Jan Beulich wrote:
> >>>>> On 10.07.15 at 16:50, <George.Dunlap@eu.citrix.com> wrote:
> >>> On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> >>>> v7:
> >>>
> >>> It looks like most of the libxl/libxc patches have been acked.  It
> >>> seems to me that most of the hypervisor patches (1-3, 14-15) are
> >>> either ready to go in or pretty close.
> >>
> >> Now that I looked over v8 I have to admit that if I was a tools
> >> maintainer I wouldn't want to see some of the tools patches in
> >> with just an ack, but without any review.
> > 
> > I'm somewhat confused at this point.
> > 
> > Acked-by: is often used by the maintainer of the affected code when that
> > maintainer neither contributed to nor forwarded the patch. It is a 
> > record that the acker has at least reviewed the patch and has indicated 
> > acceptance.
> > 
> > Does this imply this is already reviewed?
> 
> No, that would be expressed by Reviewed-by. Acked-by merely
> means no objection by the maintainer for the change to go in.

For my part I, perhaps wrongly, use Acked-by for both. If I haven't
actually carefully reviewed the change I will usually say so, e.g. "I
see XXX has reviewed this already, so that's fine by me" or something
similar (which I admit gets lost once it becomes just the tags).

I can't speak for Ian or Wei (now CCd) but Ian at least I think operates
similarly.

Ian.


* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-16  8:30         ` Ian Campbell
@ 2015-07-16  8:46           ` Wei Liu
  2015-07-16  9:45           ` Lars Kurth
  1 sibling, 0 replies; 119+ messages in thread
From: Wei Liu @ 2015-07-16  8:46 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Liu, George Dunlap, Ian Jackson, xen-devel, Jan Beulich, Tiejun Chen

On Thu, Jul 16, 2015 at 09:30:54AM +0100, Ian Campbell wrote:
> On Thu, 2015-07-16 at 09:08 +0100, Jan Beulich wrote:
> > >>> On 16.07.15 at 10:03, <tiejun.chen@intel.com> wrote:
> > > On 2015/7/16 15:55, Jan Beulich wrote:
> > >>>>> On 10.07.15 at 16:50, <George.Dunlap@eu.citrix.com> wrote:
> > >>> On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> > >>>> v7:
> > >>>
> > >>> It looks like most of the libxl/libxc patches have been acked.  It
> > >>> seems to me that most of the hypervisor patches (1-3, 14-15) are
> > >>> either ready to go in or pretty close.
> > >>
> > >> Now that I looked over v8 I have to admit that if I was a tools
> > >> maintainer I wouldn't want to see some of the tools patches in
> > >> with just an ack, but without any review.
> > > 
> > > I'm somewhat confused at this point.
> > > 
> > > Acked-by: is often used by the maintainer of the affected code when that
> > > maintainer neither contributed to nor forwarded the patch. It is a 
> > > record that the acker has at least reviewed the patch and has indicated 
> > > acceptance.
> > > 
> > > Does this imply this is already reviewed?
> > 
> > No, that would be expressed by Reviewed-by. Acked-by merely
> > means no objection by the maintainer for the change to go in.
> 
> For my part I, perhaps wrongly, use Acked-by for both. If I haven't
> actually carefully reviewed the change I will usually say so, e.g. "I
> see XXX has reviewed this already, so that's fine by me" or something
> similar (which I admit gets lost once it becomes just the tags).
> 
> I can't speak for Ian or Wei (now CCd) but Ian at least I think operates
> similarly.
> 

I do the same. I will explicitly say so if I give my ack without looking
at the code.

I use Reviewed-by when the component is not maintained by me.

Wei.

> Ian.


* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-16  8:26           ` Jan Beulich
@ 2015-07-16  9:27             ` George Dunlap
  2015-07-16  9:44               ` Jan Beulich
  0 siblings, 1 reply; 119+ messages in thread
From: George Dunlap @ 2015-07-16  9:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tiejun Chen, xen-devel

On Thu, Jul 16, 2015 at 9:26 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 16.07.15 at 10:13, <tiejun.chen@intel.com> wrote:
>>>>>> It looks like most of the libxl/libxc patches have been acked.  It
>>>>>> seems to me that most of the hypervisor patches (1-3, 14-15) are
>>>>>> either ready to go in or pretty close.
>>>>>
>>>>> Now that I looked over v8 I have to admit that if I was a tools
>>>>> maintainer I wouldn't want to see some of the tools patches in
>>>>> with just an ack, but without any review.
>>>>
>>>> I'm somewhat confused at this point.
>>>>
>>>> Acked-by: is often used by the maintainer of the affected code when that
>>>> maintainer neither contributed to nor forwarded the patch. It is a
>>>> record that the acker has at least reviewed the patch and has indicated
>>>> acceptance.
>>>>
>>>> Does this imply this is already reviewed?
>>>
>>> No, that would be expressed by Reviewed-by. Acked-by merely
>>> means no objection by the maintainer for the change to go in.
>>>
>>
>> Sorry I'm trying to dig into this.
>>
>> If nobody would like to take a look at this, so isn't this the
>> associated maintainer's responsibility to review finally? In this case
>> isn't Acked-by fine enough?
>
> Acked-by is good enough for a patch to go in, yes. Note that I
> didn't make this a requirement (as I'm not the maintainer), I just
> said that if I was the maintainer, I would for at least some of the
> tools patches.

There does seem to be a disconnect between how "Reviewed-by" and
"Acked-by" are used on the tools side vs the hypervisor side.  (We
just stumbled across this in an internal discussion about commit stats
actually.)

But in any case, it's the maintainers' responsibility to determine if
something has had sufficient review, and it's their responsibility not
to give an Ack unless they really mean "As far as I'm concerned, this
is ready to go in."  The fact that there were Acks on the toolstack
side ought to mean that this judgement had already been made.

>
>> Or you still want to us add two lines explicitly,
>>
>> Reviewed-by: A
>> Acked-by: A
>
> We generally take Reviewed-by as a superset of Acked-by, so two
> tags by the same person are not needed. And (as said elsewhere
> recently) ack-s by non-maintainers generally don't count much.

<pedantic> ...unless the person has previously had questions or
objections to the patch, in which case Acked-by means "I have no
further objections." </pedantic>

 -George


* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-16  2:05                                 ` Chen, Tiejun
@ 2015-07-16  9:40                                   ` George Dunlap
  2015-07-16 10:01                                     ` Chen, Tiejun
  0 siblings, 1 reply; 119+ messages in thread
From: George Dunlap @ 2015-07-16  9:40 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Keir Fraser

On Thu, Jul 16, 2015 at 3:05 AM, Chen, Tiejun <tiejun.chen@intel.com> wrote:
> Could you take a look at the original patch #06 ?  Although Jan thought that
> is complicated, that is really one version that I can refine in current time
> slot.

When you say "original", which version are you talking about?  You
mean the one at the base of this thread (v7)?

 -George


* Re: [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-16  1:58     ` Chen, Tiejun
@ 2015-07-16  9:41       ` George Dunlap
  0 siblings, 0 replies; 119+ messages in thread
From: George Dunlap @ 2015-07-16  9:41 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On Thu, Jul 16, 2015 at 2:58 AM, Chen, Tiejun <tiejun.chen@intel.com> wrote:
>> I think I would say:
>>
>> --
>> Now use the hypervisor-supplied memory map to build our final e820 table:
>> * Add regions for BIOS ranges and other special mappings not in the
>> hypervisor map
>> * Add in the hypervisor regions
>> * Adjust the lowmem and highmem regions if we've had to relocate
>> memory (adding a highmem region if necessary)
>> * Sort all the ranges so that they appear in memory order.
>> --
>
>
> I'll update this and thanks a lot.
>
>>
>>>
>>> CC: Keir Fraser <keir@xen.org>
>>> CC: Jan Beulich <jbeulich@suse.com>
>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>>> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>> CC: Ian Campbell <ian.campbell@citrix.com>
>>> CC: Wei Liu <wei.liu2@citrix.com>
>>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>>> ---
>
>
> [snip]
>
>>> +    /* Low RAM goes here. Reserve space for special pages. */
>>> +    BUG_ON(low_mem_end < (2u << 20));
>>
>>
>> Won't this BUG if the guest was actually given less than 2GiB of RAM?
>
>
> 2u << 20 = 0x200000, so this is 2M, not 2G :)

Oh, right. :-)
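
For the record, the arithmetic can be confirmed with a few assertions.
This is only a minimal sketch: the helper names below are hypothetical
and are not part of hvmloader; only the expression `2u << 20` comes from
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from hvmloader: value << shift, in bytes. */
static uint64_t shifted_bytes(uint64_t value, unsigned int shift)
{
    return value << shift;
}

/* Hypothetical self-check of the arithmetic discussed above. */
static void check_shift_arithmetic(void)
{
    /* The BUG_ON() threshold: 2 * 2^20 bytes = 0x200000 = 2 MiB. */
    assert((2u << 20) == 0x200000u);
    assert(shifted_bytes(2, 20) == 2ull * 1024 * 1024);

    /* 2 GiB would need a shift by 30 instead. */
    assert(shifted_bytes(2, 30) == 0x80000000ull);
}
```

So the check only trips if low memory ends below 2 MiB.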

 -George


* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-16  9:27             ` George Dunlap
@ 2015-07-16  9:44               ` Jan Beulich
  2015-07-16  9:59                 ` George Dunlap
  0 siblings, 1 reply; 119+ messages in thread
From: Jan Beulich @ 2015-07-16  9:44 UTC (permalink / raw)
  To: George Dunlap; +Cc: Tiejun Chen, xen-devel

>>> On 16.07.15 at 11:27, <George.Dunlap@eu.citrix.com> wrote:
> On Thu, Jul 16, 2015 at 9:26 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 16.07.15 at 10:13, <tiejun.chen@intel.com> wrote:
>>>>>>> It looks like most of the libxl/libxc patches have been acked.  It
>>>>>>> seems to me that most of the hypervisor patches (1-3, 14-15) are
>>>>>>> either ready to go in or pretty close.
>>>>>>
>>>>>> Now that I looked over v8 I have to admit that if I was a tools
>>>>>> maintainer I wouldn't want to see some of the tools patches in
>>>>>> with just an ack, but without any review.
>>>>>
>>>>> I'm somewhat confused at this point.
>>>>>
>>>>> Acked-by: is often used by the maintainer of the affected code when that
>>>>> maintainer neither contributed to nor forwarded the patch. It is a
>>>>> record that the acker has at least reviewed the patch and has indicated
>>>>> acceptance.
>>>>>
>>>>> Does this imply this is already reviewed?
>>>>
>>>> No, that would be expressed by Reviewed-by. Acked-by merely
>>>> means no objection by the maintainer for the change to go in.
>>>>
>>>
>>> Sorry I'm trying to dig into this.
>>>
>>> If nobody would like to take a look at this, so isn't this the
>>> associated maintainer's responsibility to review finally? In this case
>>> isn't Acked-by fine enough?
>>
>> Acked-by is good enough for a patch to go in, yes. Note that I
>> didn't make this a requirement (as I'm not the maintainer), I just
>> said that if I was the maintainer, I would for at least some of the
>> tools patches.
> 
> There does seem to be a disconnect between how "Reviewed-by" and
> "Acked-by" are used on the tools side vs the hypervisor side.  (We
> just stumbled across this in an internal discussion about commit stats
> actually.)
> 
> But in any case, it's the maintainers' responsibility to determine if
> something has had sufficient review, and it's their responsibility not
> to give an Ack unless they really mean "As far as I'm concerned, this
> is ready to go in."  The fact that there were Acks on the toolstack
> side ought to mean that this judgement had already been made.

Hmm, that's a different view than I take: To me Reviewed-by implies
Acked-by, but not the other way around. And I view it as the
committer's responsibility to ensure a patch has all necessary acks,
but not the maintainer to give an ack only when reviews were done.

In the end I'm afraid the current model of tags may not be suitable
to express everything we need, or the lack of a formal description
somewhere leaves too much room for mismatching interpretations.

Jan


* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-16  8:30         ` Ian Campbell
  2015-07-16  8:46           ` Wei Liu
@ 2015-07-16  9:45           ` Lars Kurth
  1 sibling, 0 replies; 119+ messages in thread
From: Lars Kurth @ 2015-07-16  9:45 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Liu, George Dunlap, Ian Jackson, xen-devel, Jan Beulich, Tiejun Chen


> On 16 Jul 2015, at 09:30, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> 
> On Thu, 2015-07-16 at 09:08 +0100, Jan Beulich wrote:
>> 
>>> 
>>> Does this imply this is already reviewed?
>> 
>> No, that would be expressed by Reviewed-by. Acked-by merely
>> means no objection by the maintainer for the change to go in.
> 
> For my part I, perhaps wrongly, use Acked-by for both. If I haven't
> actually carefully reviewed the change I will usually say so, e.g. "I
> see XXX has reviewed this already, so that's fine by me" or something
> similar (which I admit gets lost once it becomes just the tags).
> 
> I can't speak for Ian or Wei (now CCd) but Ian at least I think operates
> similarly.
> 
> Ian.

We shouldn't have an argument about this now. I wrote some scripts
yesterday to track Acked-by and Reviewed-by, and it is clear that these
two tags are not used consistently at the moment. Maybe something to
pick up in 2 weeks.

Regards
Lars


* Re: [v7][PATCH 00/16] Fix RMRR
  2015-07-16  9:44               ` Jan Beulich
@ 2015-07-16  9:59                 ` George Dunlap
  0 siblings, 0 replies; 119+ messages in thread
From: George Dunlap @ 2015-07-16  9:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tiejun Chen, xen-devel

On Thu, Jul 16, 2015 at 10:44 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 16.07.15 at 11:27, <George.Dunlap@eu.citrix.com> wrote:
>> On Thu, Jul 16, 2015 at 9:26 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 16.07.15 at 10:13, <tiejun.chen@intel.com> wrote:
>>>>>>>> It looks like most of the libxl/libxc patches have been acked.  It
>>>>>>>> seems to me that most of the hypervisor patches (1-3, 14-15) are
>>>>>>>> either ready to go in or pretty close.
>>>>>>>
>>>>>>> Now that I looked over v8 I have to admit that if I was a tools
>>>>>>> maintainer I wouldn't want to see some of the tools patches in
>>>>>>> with just an ack, but without any review.
>>>>>>
>>>>>> I'm somewhat confused at this point.
>>>>>>
>>>>>> Acked-by: is often used by the maintainer of the affected code when that
>>>>>> maintainer neither contributed to nor forwarded the patch. It is a
>>>>>> record that the acker has at least reviewed the patch and has indicated
>>>>>> acceptance.
>>>>>>
>>>>>> Does this imply this is already reviewed?
>>>>>
>>>>> No, that would be expressed by Reviewed-by. Acked-by merely
>>>>> means no objection by the maintainer for the change to go in.
>>>>>
>>>>
>>>> Sorry I'm trying to dig into this.
>>>>
>>>> If nobody would like to take a look at this, so isn't this the
>>>> associated maintainer's responsibility to review finally? In this case
>>>> isn't Acked-by fine enough?
>>>
>>> Acked-by is good enough for a patch to go in, yes. Note that I
>>> didn't make this a requirement (as I'm not the maintainer), I just
>>> said that if I was the maintainer, I would for at least some of the
>>> tools patches.
>>
>> There does seem to be a disconnect between how "Reviewed-by" and
>> "Acked-by" are used on the tools side vs the hypervisor side.  (We
>> just stumbled across this in an internal discussion about commit stats
>> actually.)
>>
>> But in any case, it's the maintainers' responsibility to determine if
>> something has had sufficient review, and it's their responsibility not
>> to give an Ack unless they really mean "As far as I'm concerned, this
>> is ready to go in."  The fact that there were Acks on the toolstack
>> side ought to mean that this judgement had already been made.
>
> Hmm, that's a different view than I take: To me Reviewed-by implies
> Acked-by, but not the other way around. And I view it as the
> committer's responsibility to ensure a patch has all necessary acks,
> but not the maintainer to give an ack only when reviews were done.

I'm not debating the meaning of Reviewed-by and Acked-by in general;
I'm responding to your questioning earlier whether the tools side was
really ready to go in.

What I'm saying is, when the maintainer gives an "Acked-by", they
should have already either 1) reviewed the code themselves and
determined it's ready to go in (although perhaps in this case a
Reviewed-by would be preferred), or 2) have looked through the patch
and determined based on their trust of the other reviewers, on their
own previous review of the code, or their confidence in the author of
the patch, that the patch is ready to go in.

So your question shouldn't be, "Should we check this in given that
it's only been Acked"?  It should be, "Why did you Ack it when there
has apparently been no review?"  And the answer turns out to be,
"Because we tend to use Acked-by even when we have reviewed it."

 -George


* Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
  2015-07-16  9:40                                   ` George Dunlap
@ 2015-07-16 10:01                                     ` Chen, Tiejun
  0 siblings, 0 replies; 119+ messages in thread
From: Chen, Tiejun @ 2015-07-16 10:01 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Keir Fraser

On 2015/7/16 17:40, George Dunlap wrote:
> On Thu, Jul 16, 2015 at 3:05 AM, Chen, Tiejun <tiejun.chen@intel.com> wrote:
>> Could you take a look at the original patch #06 ?  Although Jan thought that
>> is complicated, that is really one version that I can refine in current time
>> slot.
>
> When you say "original", which version are you talking about?  You
> mean the one at the base of this thread (v7)?
>

Yes, I mean patch #6 in v7. Sorry for the confusion.

Thanks
Tiejun


Thread overview: 119+ messages
2015-07-09  5:33 [v7][PATCH 00/16] Fix RMRR Tiejun Chen
2015-07-09  5:33 ` [v7][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
2015-07-09  5:33 ` [v7][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
2015-07-09  5:33 ` [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
2015-07-10 13:26   ` George Dunlap
2015-07-10 15:01     ` Jan Beulich
2015-07-10 15:07       ` George Dunlap
2015-07-13  6:37         ` Chen, Tiejun
2015-07-13  5:57       ` Chen, Tiejun
2015-07-13  6:47     ` Chen, Tiejun
2015-07-13  8:57       ` Jan Beulich
2015-07-14 10:46       ` George Dunlap
2015-07-14 10:53         ` Chen, Tiejun
2015-07-14 11:30           ` George Dunlap
2015-07-14 11:45             ` Jan Beulich
2015-07-14 13:25               ` George Dunlap
2015-07-09  5:33 ` [v7][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
2015-07-09  5:33 ` [v7][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
2015-07-10 13:49   ` George Dunlap
2015-07-13  7:03     ` Chen, Tiejun
2015-07-09  5:33 ` [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges Tiejun Chen
2015-07-13 13:12   ` Jan Beulich
2015-07-14  6:39     ` Chen, Tiejun
2015-07-14  9:27       ` Jan Beulich
2015-07-14 10:54         ` Chen, Tiejun
2015-07-14 11:50           ` Jan Beulich
2015-07-15  0:55             ` Chen, Tiejun
2015-07-15  4:27               ` Chen, Tiejun
2015-07-15  8:34                 ` Jan Beulich
2015-07-15  8:59                   ` Chen, Tiejun
2015-07-15  9:10                     ` Chen, Tiejun
2015-07-15  9:27                     ` Jan Beulich
2015-07-15 10:34                       ` Chen, Tiejun
2015-07-15 11:25                         ` Jan Beulich
2015-07-15 11:34                           ` Chen, Tiejun
2015-07-15 13:56                             ` George Dunlap
2015-07-15 16:14                               ` George Dunlap
2015-07-16  2:05                                 ` Chen, Tiejun
2015-07-16  9:40                                   ` George Dunlap
2015-07-16 10:01                                     ` Chen, Tiejun
2015-07-15 11:05                       ` George Dunlap
2015-07-15 11:20                         ` Chen, Tiejun
2015-07-15 12:43                           ` George Dunlap
2015-07-15 13:23                             ` Chen, Tiejun
2015-07-15 11:24                         ` Jan Beulich
2015-07-15 11:38                           ` George Dunlap
2015-07-15 11:27                         ` Jan Beulich
2015-07-15 11:40                           ` Chen, Tiejun
2015-07-15  8:32               ` Jan Beulich
2015-07-15  9:04                 ` Chen, Tiejun
2015-07-15 12:57                 ` Wei Liu
2015-07-15 13:40     ` George Dunlap
2015-07-15 14:00       ` Jan Beulich
2015-07-15 15:19         ` George Dunlap
2015-07-09  5:33 ` [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
2015-07-13 13:35   ` Jan Beulich
2015-07-14  5:22     ` Chen, Tiejun
2015-07-14  9:32       ` Jan Beulich
2015-07-14 10:22         ` Chen, Tiejun
2015-07-14 10:48           ` Jan Beulich
2015-07-15 16:00   ` George Dunlap
2015-07-16  1:58     ` Chen, Tiejun
2015-07-16  9:41       ` George Dunlap
2015-07-09  5:33 ` [v7][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
2015-07-09  5:34 ` [v7][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
2015-07-09  5:34 ` [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
2015-07-09  9:20   ` Wei Liu
2015-07-09  9:44     ` Chen, Tiejun
2015-07-09 10:37       ` Ian Jackson
2015-07-09 10:53         ` Chen, Tiejun
2015-07-09 18:02   ` Ian Jackson
2015-07-10  0:46     ` Chen, Tiejun
2015-07-09  5:34 ` [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
2015-07-09  9:11   ` Wei Liu
2015-07-09  9:41     ` Chen, Tiejun
2015-07-09 18:14   ` Ian Jackson
2015-07-10  3:19     ` Chen, Tiejun
2015-07-10 10:14       ` Ian Jackson
2015-07-13  9:19         ` Chen, Tiejun
2015-07-09  5:34 ` [v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
2015-07-09 18:14   ` Ian Jackson
2015-07-09  5:34 ` [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
2015-07-09 18:17   ` Ian Jackson
2015-07-10  5:40     ` Chen, Tiejun
2015-07-10  9:18       ` Ian Campbell
2015-07-13  9:47         ` Chen, Tiejun
2015-07-13 10:15           ` Ian Campbell
2015-07-14  5:44             ` Chen, Tiejun
2015-07-14  7:42               ` Ian Campbell
2015-07-14  8:03                 ` Chen, Tiejun
2015-07-10 10:15       ` Ian Jackson
2015-07-09  5:34 ` [v7][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
2015-07-09  5:34 ` [v7][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
2015-07-13 13:41   ` Jan Beulich
2015-07-14  1:42     ` Chen, Tiejun
2015-07-14  9:19       ` Jan Beulich
2015-07-09  5:34 ` [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
2015-07-09 18:23   ` Ian Jackson
2015-07-10  6:05     ` Chen, Tiejun
2015-07-10 10:23       ` Ian Jackson
2015-07-13  9:31         ` Chen, Tiejun
2015-07-13  9:40           ` Ian Campbell
2015-07-13  9:55             ` Chen, Tiejun
2015-07-13 10:17               ` Ian Campbell
2015-07-13 17:08                 ` Ian Jackson
2015-07-14  1:29                   ` Chen, Tiejun
2015-07-10 14:50 ` [v7][PATCH 00/16] Fix RMRR George Dunlap
2015-07-10 14:56   ` Jan Beulich
2015-07-16  7:55   ` Jan Beulich
2015-07-16  8:03     ` Chen, Tiejun
2015-07-16  8:08       ` Jan Beulich
2015-07-16  8:13         ` Chen, Tiejun
2015-07-16  8:26           ` Jan Beulich
2015-07-16  9:27             ` George Dunlap
2015-07-16  9:44               ` Jan Beulich
2015-07-16  9:59                 ` George Dunlap
2015-07-16  8:30         ` Ian Campbell
2015-07-16  8:46           ` Wei Liu
2015-07-16  9:45           ` Lars Kurth
