* [v4][PATCH 00/19] Fix RMRR
@ 2015-06-23  9:57 Tiejun Chen
  2015-06-23  9:57 ` [v4][PATCH 01/19] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
                   ` (18 more replies)
  0 siblings, 19 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC
  To: xen-devel

v4:

* Change one condition inside patch #2, "xen/x86/p2m: introduce
  set_identity_p2m_entry",

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )

  so that we catch only the cases we need to handle.

* Inside patch #3, "xen/vtd: create RMRR mapping", use
  guest_physmap_remove_page() instead of intel_iommu_unmap_page() to unmap
  the RMRR mapping correctly, and drop iommu_map_page() since
  ept_set_entry() already does this internally.

* Inside patch #4, "xen/passthrough: extend hypercall to support rdm
  reservation policy", add code comments to describe why we set a fixed
  policy flag in some cases, such as adding a device to hwdomain or removing
  a device from a user domain. Also fix one condition:

  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

  Additionally, range-check the flag passed in, to make future
  extensions possible (and to avoid ambiguity about what out-of-range
  values would mean).

* Inside patch #6, "hvmloader: get guest memory map into memory_map[]",
  move the e820-related code into its own file, e820.c, consolidate
  "printf()+BUG()" and "BUG_ON()", and avoid another fixed-width type for
  the parameter of get_mem_mapping_layout().

* Inside patch #7, "hvmloader/pci: skip reserved ranges"
  We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 The MMIO region must still fit all PCI devices as before

  #2.2 Accommodate reserved memory regions that are not aligned

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either populate more RAM or expand highmem. But only 64-bit
  BARs can work with highmem, and as you mentioned we should also avoid
  expanding highmem where possible. So my implementation allocates 32-bit
  BARs first and 64-bit BARs afterwards.

  1>. The first allocation round covers only 32-bit BARs

  If we can finish allocating all 32-bit BARs, we go on to allocate 64-bit
  BARs with all remaining resources, including low PCI memory.

  If not, we calculate how much RAM must be populated to allocate the
  remaining 32-bit BARs, then populate sufficient RAM as exp_mem_resource
  and go to the second allocation round 2>.

  2>. The second allocation round covers the remaining 32-bit BARs

  In theory we should be able to finish allocating all 32-bit BARs here,
  then go to the third allocation round 3>.

  3>. The third allocation round covers 64-bit BARs

  We first try to allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate the 64-bit BARs. This
  process should be the same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate reserved memory regions that are not aligned:

  We should skip all reserved device memory, but we also need to check
  whether other, smaller BARs can be allocated if an MMIO hole exists between
  resource->base and the reserved device memory. If such a hole exists, we
  simply move on and try to allocate the next BAR, since all BARs are in
  descending order of size. If not, we move resource->base to reserved_end
  and reallocate this BAR.

* Inside of patch #8, "hvmloader/e820: construct guest e820 table", we need to
  adjust highmem if lowmem has changed, for example when hvmloader has to
  populate more RAM to allocate BARs.

* Inside of patch #11, "tools: introduce some new parameters to set rdm policy",
  we don't define init_val for libxl_rdm_reserve_type since it's just zero,
  and move the xl/libxlu changes into a final patch.

* Inside of patch #12, "passes rdm reservation policy", fix one typo,
  s/unkwon/unknown. And in the command description, use "[]" to mark what
  is optional in the extended xl command, pci-attach.

* Patch #13 is split out of the current patch #14 since it is specific to xc.

* Inside of patch #14, "tools/libxl: detect and avoid conflicts with RDM",
  just unconditionally set *nr_entries to 0. Additionally, move everything
  related to the parameter that sets our predefined boundary dynamically
  into a separate, later patch.

* Inside of patch #16, "tools/libxl: extend XENMEM_set_memory_map", we use
  goto-style error handling, and instead of NOGC we should use
  libxl__malloc(gc,XXX) to allocate the local e820.

Overall, we refined several patch header descriptions and code comments.
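
To illustrate the patch #7 note above about skipping reserved device memory,
here is a minimal, self-contained sketch. It is not the hvmloader code:
struct range, align_up() and place_bar() are hypothetical names, and the
sketch only finds the lowest naturally aligned address for one BAR that
avoids every reserved range. It does not model the three allocation rounds
or the reuse of a hole in front of a reserved range by smaller BARs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration; not an hvmloader structure. */
struct range { uint64_t start, end; };   /* half-open: [start, end) */

/* Round addr up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Find the lowest naturally aligned address >= base at which a BAR of
 * bar_size bytes fits below limit without touching any reserved range.
 * Returns 0 and stores the address in *out on success, -1 otherwise.
 */
static int place_bar(uint64_t base, uint64_t limit, uint64_t bar_size,
                     const struct range *rdm, size_t nr_rdm, uint64_t *out)
{
    uint64_t addr = align_up(base, bar_size);
    size_t i;

    for ( ; ; )
    {
        int moved = 0;

        /* Out of space in the MMIO window. */
        if ( addr >= limit || bar_size > limit - addr )
            return -1;

        for ( i = 0; i < nr_rdm; i++ )
        {
            /* Overlap test on half-open intervals. */
            if ( addr < rdm[i].end && rdm[i].start < addr + bar_size )
            {
                /* Skip past the reserved range, then re-align. */
                addr = align_up(rdm[i].end, bar_size);
                moved = 1;
                break;
            }
        }

        if ( !moved )
        {
            *out = addr;
            return 0;
        }
    }
}
```

Since BARs are naturally aligned to their size, a large BAR that collides
with a small reserved range may be pushed far beyond it, which is why the
hole in front of the reserved range is worth offering to the smaller BARs.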

v3:

* Rearrange all patches in order as Wei suggested
* Rebase on the latest tree
* Address some of Wei's comments on the tools side
* Two changes for the runtime cycle in patch #2, "xen/x86/p2m: introduce
  set_identity_p2m_entry", on the hypervisor side

  a>. Introduce paging_mode_translate()
  Otherwise, we'll see this error when booting Xen/Dom0

(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
....
(XEN) Xen call trace:
(XEN)    [<ffff82d0801f53db>] p2m_pt_get_entry+0x29/0x558
(XEN)    [<ffff82d0801f0b5c>] set_identity_p2m_entry+0xfc/0x1f0
(XEN)    [<ffff82d08014ebc8>] rmrr_identity_mapping+0x154/0x1ce
(XEN)    [<ffff82d0802abb46>] intel_iommu_hwdom_init+0x76/0x158
(XEN)    [<ffff82d0802ab169>] iommu_hwdom_init+0x179/0x188
(XEN)    [<ffff82d0802cc608>] construct_dom0+0x2fed/0x35d8
(XEN)    [<ffff82d0802bdaa0>] __start_xen+0x22d8/0x2381
(XEN)    [<ffff82d080100067>] __high_start+0x53/0x55
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702

Note that I haven't copied the full log since I think the above is enough.

  b>. Actually we still need to use "mfn_x(mfn) == INVALID_MFN" to confirm
  we're getting an invalid mfn.

* Add patch #16 to handle those devices which share the same RMRR.

v2:

* Instead of that fixed predefined rdm memory boundary, we'd like to
  introduce a parameter, "rdm_mem_boundary", to set this threshold value.

* Remove that existing USB hack.

* Make sure the MMIO regions all fit in the available resource window

* Rename our policy, "force/try" -> "strict/relaxed"

* Wei and Jan gave me many more comments to refine the code:
  * Code style
  * Better and more reasonable code implementation
  * Correct or improve code comments.

* A few small changes to work well with ARM.

Open:

* We should fail assigning a device which shares an RMRR with
another device. We can only do group assignment when an RMRR is shared
among devices.

We need more time to figure out a good policy/approach because something
is not clear to me.

As you know, all devices are owned by Dom0 before we create any
DomU, right? Do we allow Dom0 to still own one device of a group while
another device in the same group is assigned?

Really appreciate any comments on the policy.


v1:

RMRR is an acronym for Reserved Memory Region Reporting, expected to
be used for legacy usages (such as USB, UMA Graphics, etc.) requiring
reserved memory. Special treatment is required in system software to
set up those reserved regions in IOMMU translation structures, otherwise
passing through a device with RMRR reported may not work correctly.

This patch set tries to enhance the existing Xen RMRR implementation to fix
various reported and theoretical problems. The most noteworthy changes are
to set up identity mapping in the p2m layer and to handle possible conflicts
between reported regions and gfn space. The initial proposal can be found at:
    http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00524.html
and after a long discussion a summarized agreement is here:
    http://lists.xen.org/archives/html/xen-devel/2015-01/msg01580.html

Below is a key summary of this patch set according to the agreed proposal:

1. Use the RDM (Reserved Device Memory) name in user space as a general
description instead of using the ACPI RMRR name directly.

2. Introduce configuration parameters to allow the user to control both
per-device and global RDM resources, along with the desired policy when a
conflict is detected.

3. Introduce a new hypercall to query global and per-device RDM resources.

4. Extend libxl to be a central place to manage RDM resources and handle
potential conflicts between reserved regions and gfn space. One
simplification goal is to keep the existing lowmem / mmio / highmem layout
which is passed around various function blocks. So we make the reasonable
assumption that conflicts falling into the areas below are not re-arranged,
since doing so would result in a more scattered layout:
    a) in highmem region (>4G)
    b) in lowmem region, and below a predefined boundary (default 2G)
  a) is a new assumption not discussed before. According to the VT-d spec
this is possible, but it has not been observed in the real world. So we can
make this reasonable assumption until there is real usage of it.

5. Extend XENMEM_set_memory_map to be usable for HVM guests, and then have
libxl use that hypercall to carry RDM information to hvmloader. There
is one difference from the original discussion. Previously we discussed
introducing a new E820 type specifically for RDM entries. After more thought
we think it's OK to just tag them as E820_reserved. hvmloader doesn't
actually need to know whether the reserved entries come from RDM or
from other purposes.

6. The hvmloader change is then generic to the XENMEM_memory_map
change. Given a predefined memory layout, hvmloader should avoid
allocating any reserved entries for other usages (opregion, mmio, etc.)

7. Extend the existing device passthrough hypercall to carry the conflict
handling policy.

8. Set up an identity map in the p2m layer for RMRRs reported for the given
device. Conflicts are handled according to the policy specified in the
hypercall.
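
As a rough illustration of points 2 and 4 above, the sketch below classifies
a conflict by where the reserved region starts. Everything here is
hypothetical (the enum names, rdm_resolve(), and in particular the mapping
of "strict" to failing and "relaxed" to warning, which is an assumption
about what the policies mean); it is not the libxl code, and it ignores
regions that straddle the boundary.

```c
#include <stdbool.h>
#include <stdint.h>

#define GB(x) ((uint64_t)(x) << 30)

/* Hypothetical names; the cover letter only mentions "strict/relaxed". */
enum rdm_policy { RDM_STRICT, RDM_RELAXED };
enum rdm_action { RDM_ADJUST_LAYOUT, RDM_FAIL, RDM_WARN };

/*
 * Decide what to do about a conflict with a reserved region starting at
 * rdm_start, given the predefined lowmem boundary (default 2G).
 */
static enum rdm_action rdm_resolve(uint64_t rdm_start, uint64_t boundary,
                                   enum rdm_policy policy)
{
    /* Only conflicts in [boundary, 4G) lead to a re-arranged layout. */
    bool rearrangeable = rdm_start >= boundary && rdm_start < GB(4);

    if ( rearrangeable )
        return RDM_ADJUST_LAYOUT;

    /* Highmem, or lowmem below the boundary: the layout is left alone. */
    /* Assumption: strict refuses the setup, relaxed only warns. */
    return policy == RDM_STRICT ? RDM_FAIL : RDM_WARN;
}
```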

The current patch set contains core enhancements calling for comments.
There are still several tasks not implemented yet. We'll include them
in the final version after the RFC is agreed:

- remove existing USB hack
- detect and fail assigning device which has a shared RMRR with another device
- add a config parameter to configure that memory boundary flexibly
- In the case of hotplug we also need to figure out a way to resolve the policy
  conflict between the per-pci policy and the global policy, but first we think
  we'd better collect some good ideas for the next step in this RFC.

So I'm sending this as an RFC to collect any comments you have.

----------------------------------------------------------------
Jan Beulich (1):
      xen: introduce XENMEM_reserved_device_memory_map

Tiejun Chen (18):
      xen/x86/p2m: introduce set_identity_p2m_entry
      xen/vtd: create RMRR mapping
      xen/passthrough: extend hypercall to support rdm reservation policy
      xen: enable XENMEM_memory_map in hvm
      hvmloader: get guest memory map into memory_map[]
      hvmloader/pci: skip reserved ranges
      hvmloader/e820: construct guest e820 table
      tools/libxc: Expose new hypercall xc_reserved_device_memory_map
      tools: extend xc_assign_device() to support rdm reservation policy
      tools: introduce some new parameters to set rdm policy
      tools/libxl: passes rdm reservation policy
      tools/libxc: check to set args.mmio_size before call xc_hvm_build
      tools/libxl: detect and avoid conflicts with RDM
      tools: introduce a new parameter to set a predefined rdm  boundary
      tools/libxl: extend XENMEM_set_memory_map
      xen/vtd: enable USB device assignment
      xen/vtd: prevent from assign the device with shared rmrr
      tools: parse to enable new rdm policy parameters

 docs/man/xl.cfg.pod.5                       |  71 ++++++
 docs/man/xl.pod.1                           |   7 +-
 docs/misc/vtd.txt                           |  24 ++
 tools/firmware/hvmloader/e820.c             | 115 +++++++--
 tools/firmware/hvmloader/e820.h             |   7 +
 tools/firmware/hvmloader/hvmloader.c        |   2 +
 tools/firmware/hvmloader/pci.c              | 180 ++++++++++++--
 tools/firmware/hvmloader/util.c             |  26 ++
 tools/firmware/hvmloader/util.h             |  12 +
 tools/libxc/include/xenctrl.h               |  11 +-
 tools/libxc/xc_domain.c                     |  42 +++-
 tools/libxc/xc_hvm_build_x86.c              |   2 +
 tools/libxl/libxl.h                         |   6 +
 tools/libxl/libxl_create.c                  |  19 +-
 tools/libxl/libxl_dm.c                      | 259 ++++++++++++++++++++
 tools/libxl/libxl_dom.c                     |  16 +-
 tools/libxl/libxl_internal.h                |  37 ++-
 tools/libxl/libxl_pci.c                     |  14 +-
 tools/libxl/libxl_types.idl                 |  26 ++
 tools/libxl/libxl_x86.c                     |  83 +++++++
 tools/libxl/libxlu_pci.c                    |  92 +++++++
 tools/libxl/libxlutil.h                     |   4 +
 tools/libxl/xl_cmdimpl.c                    |  36 ++-
 tools/libxl/xl_cmdtable.c                   |   2 +-
 tools/ocaml/libs/xc/xenctrl_stubs.c         |  18 +-
 tools/python/xen/lowlevel/xc/xc.c           |  29 ++-
 xen/arch/x86/hvm/hvm.c                      |   2 -
 xen/arch/x86/mm.c                           |   6 -
 xen/arch/x86/mm/p2m.c                       |  43 +++-
 xen/common/compat/memory.c                  |  66 +++++
 xen/common/memory.c                         |  64 +++++
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   3 +-
 xen/drivers/passthrough/arm/smmu.c          |   2 +-
 xen/drivers/passthrough/device_tree.c       |  11 +-
 xen/drivers/passthrough/iommu.c             |  10 +
 xen/drivers/passthrough/pci.c               |  15 +-
 xen/drivers/passthrough/vtd/dmar.c          |  32 +++
 xen/drivers/passthrough/vtd/dmar.h          |   1 -
 xen/drivers/passthrough/vtd/extern.h        |   1 +
 xen/drivers/passthrough/vtd/iommu.c         |  81 ++++--
 xen/drivers/passthrough/vtd/utils.c         |   7 -
 xen/include/asm-x86/p2m.h                   |  10 +-
 xen/include/public/domctl.h                 |   5 +
 xen/include/public/memory.h                 |  32 ++-
 xen/include/xen/iommu.h                     |  12 +-
 xen/include/xen/pci.h                       |   2 +
 xen/include/xlat.lst                        |   3 +-
 47 files changed, 1429 insertions(+), 119 deletions(-)

Thanks
Tiejun


* [v4][PATCH 01/19] xen: introduce XENMEM_reserved_device_memory_map
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-23  9:57 ` [v4][PATCH 02/19] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Jan Beulich

From: Jan Beulich <jbeulich@suse.com>

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich <jbeulich@suse.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v4:

* Nothing is changed.

 xen/common/compat/memory.c           | 66 ++++++++++++++++++++++++++++++++++++
 xen/common/memory.c                  | 64 ++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/iommu.c      | 10 ++++++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +++++++++++++++++
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h          | 32 ++++++++++++++++-
 xen/include/xen/iommu.h              | 10 ++++++
 xen/include/xen/pci.h                |  2 ++
 xen/include/xlat.lst                 |  3 +-
 10 files changed, 219 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct compat_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+    struct compat_reserved_device_memory rdm = {
+        .start_pfn = start, .nr_pages = nr
+    };
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+                return -ERANGE;
+
+            if ( __copy_to_compat_offset(grdm->map.buffer,
+                                         grdm->used_entries,
+                                         &rdm,
+                                         1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
     int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
             break;
         }
 
+#ifdef HAS_PASSTHROUGH
+        case XENMEM_reserved_device_memory_map:
+        {
+            struct get_reserved_device_memory grdm;
+
+            if ( copy_from_guest(&grdm.map, compat, 1) ||
+                 !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+                return -EFAULT;
+
+            grdm.used_entries = 0;
+            rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                                  &grdm);
+
+            if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+                rc = -ENOBUFS;
+
+            grdm.map.nr_entries = grdm.used_entries;
+            if ( grdm.map.nr_entries )
+            {
+                if ( __copy_to_guest(compat, &grdm.map, 1) )
+                    rc = -EFAULT;
+            }
+
+            return rc;
+        }
+#endif
+
         default:
             return compat_arch_memory_op(cmd, compat);
         }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
     return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct xen_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            struct xen_reserved_device_memory rdm = {
+                .start_pfn = start, .nr_pages = nr
+            };
+
+            if ( __copy_to_guest_offset(grdm->map.buffer,
+                                        grdm->used_entries,
+                                        &rdm,
+                                        1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -1162,6 +1199,33 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+#ifdef HAS_PASSTHROUGH
+    case XENMEM_reserved_device_memory_map:
+    {
+        struct get_reserved_device_memory grdm;
+
+        if ( copy_from_guest(&grdm.map, arg, 1) ||
+             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+            return -EFAULT;
+
+        grdm.used_entries = 0;
+        rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                              &grdm);
+
+        if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+            rc = -ENOBUFS;
+
+        grdm.map.nr_entries = grdm.used_entries;
+        if ( grdm.map.nr_entries )
+        {
+            if ( __copy_to_guest(arg, &grdm.map, 1) )
+                rc = -EFAULT;
+        }
+
+        break;
+    }
+#endif
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 06cb38f..0b2ef52 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -375,6 +375,16 @@ void iommu_crash_shutdown(void)
     iommu_enabled = iommu_intremap = 0;
 }
 
+int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+
+    if ( !iommu_enabled || !ops->get_reserved_device_memory )
+        return 0;
+
+    return ops->get_reserved_device_memory(func, ctxt);
+}
+
 bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
 {
     const struct hvm_iommu *hd = domain_hvm_iommu(d);
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..a730de5 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -893,3 +893,35 @@ int platform_supports_x2apic(void)
     unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
     return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
 }
+
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
+    int rc = 0;
+    unsigned int i;
+    u16 bdf;
+
+    for_each_rmrr_device ( rmrr, bdf, i )
+    {
+        if ( rmrr != rmrr_cur )
+        {
+            rc = func(PFN_DOWN(rmrr->base_address),
+                      PFN_UP(rmrr->end_address) -
+                        PFN_DOWN(rmrr->base_address),
+                      PCI_SBDF(rmrr->segment, bdf),
+                      ctxt);
+
+            if ( unlikely(rc < 0) )
+                return rc;
+
+            if ( !rc )
+                continue;
+
+            /* Just go next. */
+            if ( rc == 1 )
+                rmrr_cur = rmrr;
+        }
+    }
+
+    return 0;
+}
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index 5524dba..f9ee9b0 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -75,6 +75,7 @@ int domain_context_mapping_one(struct domain *domain, struct iommu *iommu,
                                u8 bus, u8 devfn, const struct pci_dev *);
 int domain_context_unmap_one(struct domain *domain, struct iommu *iommu,
                              u8 bus, u8 devfn);
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
 
 unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg);
 void io_apic_write_remap_rte(unsigned int apic,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 48820ea..44ed23d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2491,6 +2491,7 @@ const struct iommu_ops intel_iommu_ops = {
     .crash_shutdown = vtd_crash_shutdown,
     .iotlb_flush = intel_iommu_iotlb_flush,
     .iotlb_flush_all = intel_iommu_iotlb_flush_all,
+    .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
     .dump_p2m_table = vtd_dump_p2m_table,
 };
 
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 832559a..7b25275 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -573,7 +573,37 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 27 */
+/*
+ * With some legacy devices, certain guest-physical addresses cannot safely
+ * be used for other purposes, e.g. to map guest RAM.  This hypercall
+ * enumerates those regions so the toolstack can avoid using them.
+ */
+#define XENMEM_reserved_device_memory_map   27
+struct xen_reserved_device_memory {
+    xen_pfn_t start_pfn;
+    xen_ulong_t nr_pages;
+};
+typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
+
+struct xen_reserved_device_memory_map {
+    /* IN */
+    /* Currently just one bit to indicate checking all Reserved Device Memory. */
+#define PCI_DEV_RDM_ALL   0x1
+    uint32_t        flag;
+    /* IN */
+    uint16_t        seg;
+    uint8_t         bus;
+    uint8_t         devfn;
+    /* IN/OUT */
+    unsigned int    nr_entries;
+    /* OUT */
+    XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
+};
+typedef struct xen_reserved_device_memory_map xen_reserved_device_memory_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_map_t);
+
+/* Next available subop number is 28 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b30bf41..e2f584d 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -126,6 +126,14 @@ int iommu_do_dt_domctl(struct xen_domctl *, struct domain *,
 
 struct page_info;
 
+/*
+ * Any non-zero value returned from callbacks of this type will cause the
+ * function the callback was handed to terminate its iteration. Assigning
+ * meaning of these non-zero values is left to the top level caller /
+ * callback pair.
+ */
+typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
+
 struct iommu_ops {
     int (*init)(struct domain *d);
     void (*hwdom_init)(struct domain *d);
@@ -157,12 +165,14 @@ struct iommu_ops {
     void (*crash_shutdown)(void);
     void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
     void (*iotlb_flush_all)(struct domain *d);
+    int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
     void (*dump_p2m_table)(struct domain *d);
 };
 
 void iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
+int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
 void iommu_share_p2m_table(struct domain *d);
 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..d176e8b 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
+#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))
 
 struct pci_dev_info {
     bool_t is_extfn;
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9c9fd9a..dd23559 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -61,9 +61,10 @@
 !	memory_exchange			memory.h
 !	memory_map			memory.h
 !	memory_reservation		memory.h
-?	mem_access_op		memory.h
+?	mem_access_op			memory.h
 !	pod_target			memory.h
 !	remove_from_physmap		memory.h
+!	reserved_device_memory_map	memory.h
 ?	vmemrange			memory.h
 !	vnuma_topology_info		memory.h
 ?	physdev_eoi			physdev.h
-- 
1.9.1
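
The handler in this patch reports the number of entries it needs through
nr_entries and returns -ENOBUFS when the caller's buffer is too small, which
suggests a probe-then-fetch calling pattern. The sketch below models that
pattern in plain C under stated assumptions: mock_rdm_map() is a stand-in
for the hypercall, not a real libxc or Xen function, and struct rdm_entry,
host_rdm[] and fetch_all_rdm() are hypothetical names with made-up sample
data.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of struct xen_reserved_device_memory. */
struct rdm_entry { uint64_t start_pfn, nr_pages; };

/* Made-up sample data standing in for what the IOMMU driver reports. */
static const struct rdm_entry host_rdm[] = {
    { 0xab000, 0x10 },
    { 0xad800, 0x2800 },
};
#define HOST_NR (sizeof(host_rdm) / sizeof(host_rdm[0]))

/*
 * Stand-in for the hypercall: copy as many entries as fit, always count
 * all of them, and return -ENOBUFS if the buffer was too small.
 */
static int mock_rdm_map(struct rdm_entry *buf, unsigned int *nr_entries)
{
    unsigned int used = 0, i;
    int rc = 0;

    for ( i = 0; i < HOST_NR; i++ )
    {
        if ( used < *nr_entries )
            buf[used] = host_rdm[i];
        ++used;
    }

    if ( *nr_entries < used )
        rc = -ENOBUFS;
    *nr_entries = used;

    return rc;
}

/* Probe with an empty buffer, then fetch with one of the right size. */
static struct rdm_entry *fetch_all_rdm(unsigned int *nr)
{
    struct rdm_entry *buf;

    *nr = 0;
    if ( mock_rdm_map(NULL, nr) != -ENOBUFS )
        return NULL;            /* nothing reported */

    buf = malloc(*nr * sizeof(*buf));
    if ( !buf )
        return NULL;

    if ( mock_rdm_map(buf, nr) )
    {
        free(buf);
        return NULL;
    }

    return buf;
}
```

The caller owns the returned buffer and should free() it when done.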


* [v4][PATCH 02/19] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
  2015-06-23  9:57 ` [v4][PATCH 01/19] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-25  9:59   ` Tim Deegan
  2015-07-01 15:43   ` George Dunlap
  2015-06-23  9:57 ` [v4][PATCH 03/19] xen/vtd: create RMRR mapping Tiejun Chen
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC
  To: xen-devel; +Cc: Keir Fraser, Tim Deegan, Jan Beulich, Andrew Cooper

We will create this sort of identity mapping as follows:

If the gfn space is unoccupied, we just set the mapping. If the space
is already occupied by the desired identity mapping, do nothing.
Otherwise, failure is returned.

We also add a return value to guest_physmap_remove_page() so that
it can serve as a proper helper to clear such a p2m entry.

CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
v4:

* Change the original condition to

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )

  to make sure we catch the invalid mfn mappings as expected.

* Put

  if ( !paging_mode_translate(p2m->domain) )
    return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* Extend guest_physmap_remove_page() to return a value so it can act as a
  proper unmapping helper

 xen/arch/x86/mm/p2m.c     | 40 ++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-x86/p2m.h | 10 +++++++---
 2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 1fd1194..7e50db6 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
                          p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                           unsigned long mfn, unsigned int page_order)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
     gfn_lock(p2m, gfn, page_order);
-    p2m_remove_page(p2m, gfn, mfn, page_order);
+    rc = p2m_remove_page(p2m, gfn, mfn, page_order);
     gfn_unlock(p2m, gfn, page_order);
+    return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
     return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma)
+{
+    p2m_type_t p2mt;
+    p2m_access_t a;
+    mfn_t mfn;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int ret;
+
+    if ( !paging_mode_translate(p2m->domain) )
+        return 0;
+
+    gfn_lock(p2m, gfn, 0);
+
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+    if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+                            p2m_mmio_direct, p2ma);
+    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+        ret = 0;
+    else
+    {
+        ret = -EBUSY;
+        printk(XENLOG_G_WARNING
+               "Cannot setup identity map d%d:%lx,"
+               " gfn already mapped to %lx.\n",
+               d->domain_id, gfn, mfn_x(mfn));
+    }
+
+    gfn_unlock(p2m, gfn, 0);
+    return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index b49c09b..538a1cf 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -503,9 +503,9 @@ static inline int guest_physmap_add_page(struct domain *d,
 }
 
 /* Remove a page from a domain's p2m table */
-void guest_physmap_remove_page(struct domain *d,
-                               unsigned long gfn,
-                               unsigned long mfn, unsigned int page_order);
+int guest_physmap_remove_page(struct domain *d,
+                              unsigned long gfn,
+                              unsigned long mfn, unsigned int page_order);
 
 /* Set a p2m range as populate-on-demand */
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
@@ -543,6 +543,10 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
                        p2m_access_t access);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
+/* Set identity addresses in the p2m table (for pass-through) */
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma);
+
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
                     unsigned long gpfn, domid_t foreign_domid);
-- 
1.9.1

* [v4][PATCH 03/19] xen/vtd: create RMRR mapping
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
  2015-06-23  9:57 ` [v4][PATCH 01/19] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
  2015-06-23  9:57 ` [v4][PATCH 02/19] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-23 10:12   ` Jan Beulich
  2015-06-23  9:57 ` [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian

RMRR reserved regions must be set up in the pfn space with an identity
mapping to the reported mfn. However, the existing code fails to set up
the correct mapping when VT-d shares the EPT page table, which causes
problems when assigning devices (e.g. a GPU) that have an RMRR reported.
So this patch instead sets up the identity mapping in the p2m layer,
regardless of whether EPT is shared or not. We still keep creating the
VT-d table.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4:

* Instead of intel_iommu_unmap_page(), use guest_physmap_remove_page()
  to unmap the RMRR mapping correctly.

* Drop iommu_map_page(), since ept_set_entry() already does this
  internally.
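
For illustration only, the mapping loop in rmrr_identity_mapping() can be
modeled outside Xen as below. This is a minimal sketch under stated
assumptions: toy_p2m[], toy_p2m_init(), toy_set_identity() and toy_map_rmrr()
are hypothetical stand-ins for the real p2m and for set_identity_p2m_entry(),
and the p2m type and access checks of the real function are left out. Only
the pfn arithmetic follows the code in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SHIFT_4K    12
#define PAGE_SIZE_4K     (1ULL << PAGE_SHIFT_4K)
#define PAGE_ALIGN_4K(a) (((a) + PAGE_SIZE_4K - 1) & ~(PAGE_SIZE_4K - 1))

#define TOY_NR_GFNS 64
#define TOY_INVALID UINT64_MAX

/* Hypothetical toy p2m: toy_p2m[gfn] holds the mfn, or TOY_INVALID. */
static uint64_t toy_p2m[TOY_NR_GFNS];

static void toy_p2m_init(void)
{
    unsigned int i;

    for ( i = 0; i < TOY_NR_GFNS; i++ )
        toy_p2m[i] = TOY_INVALID;
}

/* Map gfn 1:1 if it is free, accept an existing 1:1 entry, else refuse. */
static int toy_set_identity(uint64_t gfn)
{
    assert(gfn < TOY_NR_GFNS);

    if ( toy_p2m[gfn] == TOY_INVALID )
    {
        toy_p2m[gfn] = gfn;
        return 0;
    }

    return toy_p2m[gfn] == gfn ? 0 : -EBUSY;
}

/* Walk every 4K page covered by [base_address, end_address]. */
static int toy_map_rmrr(uint64_t base_address, uint64_t end_address)
{
    uint64_t base_pfn = base_address >> PAGE_SHIFT_4K;
    uint64_t end_pfn = PAGE_ALIGN_4K(end_address) >> PAGE_SHIFT_4K;

    while ( base_pfn < end_pfn )
    {
        int err = toy_set_identity(base_pfn);

        if ( err )
            return err;
        base_pfn++;
    }

    return 0;
}
```

With an RMRR of 0x3000-0x4fff this walks pfns 3 and 4 and stops before pfn 5;
a gfn that is already mapped to a different mfn makes the walk fail with
-EBUSY, which is the case the real function warns about.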

 xen/drivers/passthrough/vtd/iommu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..202b2d0 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
             while ( base_pfn < end_pfn )
             {
-                if ( intel_iommu_unmap_page(d, base_pfn) )
+                if ( guest_physmap_remove_page(d, base_pfn, base_pfn, 0) )
                     ret = -ENXIO;
                 base_pfn++;
             }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-                                       IOMMUF_readable|IOMMUF_writable);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
 
         if ( err )
             return err;
-- 
1.9.1

* [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (2 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 03/19] xen/vtd: create RMRR mapping Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-30 11:08   ` George Dunlap
  2015-07-01 16:30   ` George Dunlap
  2015-06-23  9:57 ` [v4][PATCH 05/19] xen: enable XENMEM_memory_map in hvm Tiejun Chen
                   ` (14 subsequent siblings)
  18 siblings, 2 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Andrew Cooper, Tim Deegan,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Yang Zhang,
	Stefano Stabellini, Ian Campbell

This patch extends the existing hypercall to support an RDM reservation policy.
When reserving RDM regions in the pfn space, we either return an error or just
print a warning message, depending on whether the policy is "strict" or
"relaxed". Note that in some special cases, e.g. adding a device to the
hwdomain or removing a device from a user domain, 'relaxed' is sufficient,
since this is always safe for the hwdomain.

CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4:

* Add code comments to describe why we hard-code a policy flag in some
  cases, such as adding a device to the hwdomain and removing a device from
  a user domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one condition
  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add a range check on the flag passed in, to make future extensions possible
  (and to avoid ambiguity about what out-of-range values would mean).
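
As a rough illustration of the policy described above, the following
standalone sketch shows how the flag is range-checked and how a conflicting
identity mapping is reported under each policy. The three flag values are
copied from the domctl.h hunk of this patch; toy_check_flag() and
toy_conflict_result() are hypothetical helper names that do not exist in Xen,
and the sketch assumes the behaviour shown in the p2m.c and pci.c hunks.

```c
#include <assert.h>
#include <errno.h>

/* Values as defined in xen/include/public/domctl.h by this patch. */
#define XEN_DOMCTL_DEV_NO_RDM           0
#define XEN_DOMCTL_DEV_RDM_RELAXED      1
#define XEN_DOMCTL_DEV_RDM_STRICT       2

/* Hypothetical helper: mirrors the range check in iommu_do_pci_domctl(). */
static int toy_check_flag(unsigned int flag)
{
    return flag > XEN_DOMCTL_DEV_RDM_STRICT ? -EINVAL : 0;
}

/*
 * Hypothetical helper: the return value chosen when a gfn is already
 * mapped to something else. Only "strict" turns the conflict into an
 * error; the other values just get the warning message.
 */
static int toy_conflict_result(unsigned int flag)
{
    assert(toy_check_flag(flag) == 0);

    return flag == XEN_DOMCTL_DEV_RDM_STRICT ? -EBUSY : 0;
}
```

An out-of-range value such as 3 is rejected with -EINVAL before any mapping
is attempted, so a later patch can assign a meaning to new values without
changing how existing callers behave.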

 xen/arch/x86/mm/p2m.c                       |  7 ++++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c          |  2 +-
 xen/drivers/passthrough/device_tree.c       | 11 +++++++++-
 xen/drivers/passthrough/pci.c               | 15 +++++++++----
 xen/drivers/passthrough/vtd/iommu.c         | 34 ++++++++++++++++++++++-------
 xen/include/asm-x86/p2m.h                   |  2 +-
 xen/include/public/domctl.h                 |  5 +++++
 xen/include/xen/iommu.h                     |  2 +-
 9 files changed, 62 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 7e50db6..a3e07d3 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma)
+                           p2m_access_t p2ma, unsigned int flag)
 {
     p2m_type_t p2mt;
     p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
         ret = 0;
     else
     {
-        ret = -EBUSY;
+        if ( flag == XEN_DOMCTL_DEV_RDM_STRICT )
+            ret = -EBUSY;
+        else
+            ret = 0;
         printk(XENLOG_G_WARNING
                "Cannot setup identity map d%d:%lx,"
                " gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-                                   struct pci_dev *pdev)
+                                   struct pci_dev *pdev,
+                                   u32 flag)
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
     int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-			       struct device *dev)
+			       struct device *dev, u32 flag)
 {
 	struct iommu_domain *domain;
 	struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 5d3842a..e286f1e 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
             goto fail;
     }
 
-    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
+                                         XEN_DOMCTL_DEV_NO_RDM);
 
     if ( rc )
         goto fail;
@@ -148,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
         if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_DT )
             break;
 
+        if ( domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM )
+        {
+            printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: assign \"%s\""
+                   " to dom%u failed (%d) since we don't support RDM.\n",
+                   dt_node_full_name(dev), d->domain_id, ret);
+            break;
+        }
+
         if ( unlikely(d->is_dying) )
         {
             ret = -EINVAL;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e30be43..0845bd2 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1335,7 +1335,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
     return pdev ? 0 : -EBUSY;
 }
 
-static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
+static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
     struct pci_dev *pdev;
@@ -1371,7 +1371,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 
     pdev->fault.count = 0;
 
-    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev))) )
+    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag)) )
         goto done;
 
     for ( ; pdev->phantom_stride; rc = 0 )
@@ -1379,7 +1379,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
         devfn += pdev->phantom_stride;
         if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
             break;
-        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev));
+        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
         if ( rc )
             printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed (%d)\n",
                    d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
@@ -1496,6 +1496,7 @@ int iommu_do_pci_domctl(
 {
     u16 seg;
     u8 bus, devfn;
+    u32 flag;
     int ret = 0;
     uint32_t machine_sbdf;
 
@@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
         seg = machine_sbdf >> 16;
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
+        flag = domctl->u.assign_device.flag;
+        if ( flag > XEN_DOMCTL_DEV_RDM_STRICT )
+        {
+            ret = -EINVAL;
+            break;
+        }
 
         ret = device_assigned(seg, bus, devfn) ?:
-              assign_device(d, seg, bus, devfn);
+              assign_device(d, seg, bus, devfn, flag);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(__HYPERVISOR_domctl,
                                                 "h", u_domctl);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 202b2d0..59d5fd7 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1807,7 +1807,8 @@ static void iommu_set_pgd(struct domain *d)
 }
 
 static int rmrr_identity_mapping(struct domain *d, bool_t map,
-                                 const struct acpi_rmrr_unit *rmrr)
+                                 const struct acpi_rmrr_unit *rmrr,
+                                 u32 flag)
 {
     unsigned long base_pfn = rmrr->base_address >> PAGE_SHIFT_4K;
     unsigned long end_pfn = PAGE_ALIGN_4K(rmrr->end_address) >> PAGE_SHIFT_4K;
@@ -1855,7 +1856,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
 
         if ( err )
             return err;
@@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
              PCI_BUS(bdf) == pdev->bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
+            /*
+             * RMRR is always reserved on e820 so either of flag
+             * is fine for hardware domain and here we'd like to
+             * pass XEN_DOMCTL_DEV_RDM_RELAXED.
+             */
+            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
+                                        XEN_DOMCTL_DEV_RDM_RELAXED);
             if ( ret )
                 dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
                         pdev->domain->domain_id);
@@ -1939,7 +1946,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
              PCI_DEVFN2(bdf) != devfn )
             continue;
 
-        rmrr_identity_mapping(pdev->domain, 0, rmrr);
+        rmrr_identity_mapping(pdev->domain, 0, rmrr,
+                              XEN_DOMCTL_DEV_RDM_RELAXED);
     }
 
     return domain_context_unmap(pdev->domain, devfn, pdev);
@@ -2098,7 +2106,12 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
     spin_lock(&pcidevs_lock);
     for_each_rmrr_device ( rmrr, bdf, i )
     {
-        ret = rmrr_identity_mapping(d, 1, rmrr);
+        /*
+         * RMRR is always reserved on e820 so either of flag
+         * is fine for hardware domain and here we'd like to
+         * pass XEN_DOMCTL_DEV_RDM_RELAXED.
+         */
+        ret = rmrr_identity_mapping(d, 1, rmrr, XEN_DOMCTL_DEV_RDM_RELAXED);
         if ( ret )
             dprintk(XENLOG_ERR VTDPREFIX,
                      "IOMMU: mapping reserved region failed\n");
@@ -2241,7 +2254,12 @@ static int reassign_device_ownership(
                  PCI_BUS(bdf) == pdev->bus &&
                  PCI_DEVFN2(bdf) == devfn )
             {
-                ret = rmrr_identity_mapping(source, 0, rmrr);
+                /*
+                 * Any RMRR flag is always ignored when remove a device,
+                 * so just pass XEN_DOMCTL_DEV_RDM_RELAXED.
+                 */
+                ret = rmrr_identity_mapping(source, 0, rmrr,
+                                            XEN_DOMCTL_DEV_RDM_RELAXED);
                 if ( ret != -ENOENT )
                     return ret;
             }
@@ -2265,7 +2283,7 @@ static int reassign_device_ownership(
 }
 
 static int intel_iommu_assign_device(
-    struct domain *d, u8 devfn, struct pci_dev *pdev)
+    struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag)
 {
     struct acpi_rmrr_unit *rmrr;
     int ret = 0, i;
@@ -2294,7 +2312,7 @@ static int intel_iommu_assign_device(
              PCI_BUS(bdf) == bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(d, 1, rmrr);
+            ret = rmrr_identity_mapping(d, 1, rmrr, flag);
             if ( ret )
             {
                 reassign_device_ownership(d, hardware_domain, devfn, pdev);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 538a1cf..408109f 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -545,7 +545,7 @@ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
 /* Set identity addresses in the p2m table (for pass-through) */
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma);
+                           p2m_access_t p2ma, unsigned int flag);
 
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index bc45ea5..2f9e40e 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -478,6 +478,11 @@ struct xen_domctl_assign_device {
             XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
         } dt;
     } u;
+    /* IN */
+#define XEN_DOMCTL_DEV_NO_RDM           0
+#define XEN_DOMCTL_DEV_RDM_RELAXED      1
+#define XEN_DOMCTL_DEV_RDM_STRICT       2
+    uint32_t  flag;   /* flag of assigned device */
 };
 typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index e2f584d..02b2b02 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -140,7 +140,7 @@ struct iommu_ops {
     int (*add_device)(u8 devfn, device_t *dev);
     int (*enable_device)(device_t *dev);
     int (*remove_device)(u8 devfn, device_t *dev);
-    int (*assign_device)(struct domain *, u8 devfn, device_t *dev);
+    int (*assign_device)(struct domain *, u8 devfn, device_t *dev, u32 flag);
     int (*reassign_device)(struct domain *s, struct domain *t,
                            u8 devfn, device_t *dev);
 #ifdef HAS_PCI
-- 
1.9.1

* [v4][PATCH 05/19] xen: enable XENMEM_memory_map in hvm
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (3 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-07-01 16:32   ` George Dunlap
  2015-06-23  9:57 ` [v4][PATCH 06/19] hvmloader: get guest memory map into memory_map[] Tiejun Chen
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

This patch enables XENMEM_memory_map in hvm, so that hvmloader can
use it to set up the e820 mappings.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c      | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d5e5242..f63b01a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4745,7 +4745,6 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
@@ -4821,7 +4820,6 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9e08c9b..fcb8682 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }
 
-        if ( is_hvm_domain(d) )
-        {
-            rcu_unlock_domain(d);
-            return -EPERM;
-        }
-
         e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
         if ( e820 == NULL )
         {
-- 
1.9.1

* [v4][PATCH 06/19] hvmloader: get guest memory map into memory_map[]
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (4 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 05/19] xen: enable XENMEM_memory_map in hvm Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-23  9:57 ` [v4][PATCH 07/19] hvmloader/pci: skip reserved ranges Tiejun Chen
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

Now we get the memory map layout by calling XENMEM_memory_map and
save it in a global variable, memory_map[]. It should
include the lowmem range, the rdm range and the highmem range. Note that
the rdm range and the highmem range may not exist in some cases.

We also need to check whether any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory at the hvmloader level, and
we make hvmloader fail in case of a conflict, since such a conflict
should be rare in the real world.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
v4:

* Move some code related to e820 to its own file, e820.c.

* Consolidate "printf()+BUG()" and "BUG_ON()"

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()
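
To make the conflict check concrete, here is the overlap test from this patch
as a standalone sketch. The comparison is the one used by check_overlap() in
util.c; the added assert() calls and the sample addresses in the note below
are assumptions made for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Treat both ranges as half-open intervals, [start, start + size) and
 * [reserved_start, reserved_start + reserved_size). They overlap when
 * each one begins before the other one ends.
 */
static bool check_overlap(uint64_t start, uint64_t size,
                          uint64_t reserved_start, uint64_t reserved_size)
{
    /* Assumption of this sketch: neither range wraps past 2^64. */
    assert(start + size >= start);
    assert(reserved_start + reserved_size >= reserved_start);

    return (start + size > reserved_start) &&
           (start < reserved_start + reserved_size);
}
```

Two ranges that merely touch, such as 0x1000-0x1fff followed by
0x2000-0x2fff, are not reported as overlapping, while extending the first
range by a single byte is enough to report a conflict.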

 tools/firmware/hvmloader/e820.c      | 35 +++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/e820.h      |  7 +++++++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c      | 26 ++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h      | 12 ++++++++++++
 5 files changed, 82 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..3e53c47 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,41 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+    unsigned int nr_entries = E820MAX, i;
+    int rc;
+    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
+    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+    if ( rc || !nr_entries )
+    {
+        printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+        BUG();
+    }
+
+    memory_map.nr_map = nr_entries;
+
+    for ( i = 0; i < nr_entries; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED )
+        {
+            if ( check_overlap(alloc_addr, alloc_size,
+                               memory_map.map[i].addr,
+                               memory_map.map[i].size) )
+            {
+                printf("Fail to setup memory map due to conflict");
+                printf(" on dynamic reserved memory range.\n");
+                BUG();
+            }
+        }
+    }
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
     uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
     uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX	128
+
+struct e820map {
+    unsigned int nr_map;
+    struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
     init_hypercalls();
 
+    memory_map_setup();
+
     xenbus_setup();
 
     bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size)
+{
+    return (start + size > reserved_start) &&
+            (start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
     asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
     *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+    int rc;
+    struct xen_memory_map memmap = {
+        .nr_entries = *max_entries
+    };
+
+    set_xen_guest_handle(memmap.buffer, entries);
+
+    rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+    *max_entries = memmap.nr_entries;
+
+    return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
     static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index f99c0f19..1100a3b 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10 @@
 #include <stdarg.h>
 #include <stdint.h>
 #include <stddef.h>
+#include <stdbool.h>
 #include <xen/xen.h>
 #include <xen/hvm/hvm_info_table.h>
+#include "e820.h"
 
 #define __STR(...) #__VA_ARGS__
 #define STR(...) __STR(__VA_ARGS__)
@@ -222,6 +224,9 @@ int hvm_param_set(uint32_t index, uint64_t value);
 /* Setup PCI bus */
 void pci_setup(void);
 
+/* Setup memory map  */
+void memory_map_setup(void);
+
 /* Prepare the 32bit BIOS */
 uint32_t rombios_highbios_setup(void);
 
@@ -249,6 +254,13 @@ void perform_tests(void);
 
 extern char _start[], _end[];
 
+int get_mem_mapping_layout(struct e820entry entries[],
+                           unsigned int *max_entries);
+
+extern struct e820map memory_map;
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size);
+
 #endif /* __HVMLOADER_UTIL_H__ */
 
 /*
-- 
1.9.1

* [v4][PATCH 07/19] hvmloader/pci: skip reserved ranges
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (5 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 06/19] hvmloader: get guest memory map into memory_map[] Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-23  9:57 ` [v4][PATCH 08/19] hvmloader/e820: construct guest e820 table Tiejun Chen
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

When allocating MMIO addresses for PCI BARs, we need to make
sure they don't overlap with reserved regions.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure the MMIO region fits all PCI devices as before

  #2.2 Accommodate reserved memory regions that are not aligned

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either populate more RAM or expand highmem. But only 64bit-bars
  can work with highmem, and as you mentioned we should also avoid expanding
  highmem where possible. So my implementation allocates 32bit-bars and
  64bit-bars in order.

  1>. The first allocation round handles 32bit-bars only

  If we can finish allocating all 32bit-bars, we go on to allocate 64bit-bars
  with all remaining resources, including low PCI memory.

  If not, we calculate how much RAM needs to be populated to allocate the
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource and
  go to the second allocation round 2>.

  2>. The second allocation round handles the remaining 32bit-bars

  In theory we should be able to finish allocating all 32bit-bars, then go to
  the third allocation round 3>.

  3>. The third allocation round handles 64bit-bars

  We first try to allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate the 64bit-bars. This
  process should be the same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate reserved memory regions that are not aligned:

  We should skip all reserved device memory, but we also need to check whether
  other, smaller bars can be allocated if an MMIO hole exists between
  resource->base and the reserved device memory. If such a hole exists, we
  simply move on and try to allocate the next bar, since all bars are in
  descending order of size. If not, we move resource->base to reserved_end
  and reallocate this bar.
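
The "move resource->base to reserved_end" step of #3.2 can be sketched in
isolation as follows. This is a simplified model and not the code in pci.c:
struct toy_range, toy_overlaps(), toy_place_bar() and TOY_NO_SPACE are
hypothetical names, BAR sizes are assumed to be powers of two, addresses are
assumed to stay well below 2^64, and the case where a hole is left for
smaller bars is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NO_SPACE UINT64_MAX

/* Hypothetical description of one reserved device memory region. */
struct toy_range {
    uint64_t start, size;
};

static bool toy_overlaps(uint64_t start, uint64_t size,
                         const struct toy_range *r)
{
    return (start + size > r->start) && (start < r->start + r->size);
}

/*
 * Find the lowest naturally aligned base in [base, max) at which a bar of
 * bar_sz bytes avoids every reserved region, or TOY_NO_SPACE if none fits.
 */
static uint64_t toy_place_bar(uint64_t base, uint64_t max, uint64_t bar_sz,
                              const struct toy_range *rdm, unsigned int nr_rdm)
{
    assert(bar_sz != 0 && (bar_sz & (bar_sz - 1)) == 0);

    for ( ; ; )
    {
        unsigned int i;
        bool moved = false;

        /* Align up to the bar size, then check that the bar still fits. */
        base = (base + bar_sz - 1) & ~(bar_sz - 1);
        if ( base + bar_sz < base || base + bar_sz > max )
            return TOY_NO_SPACE;

        for ( i = 0; i < nr_rdm; i++ )
        {
            if ( toy_overlaps(base, bar_sz, &rdm[i]) )
            {
                /* Skip past the reserved region and retry from there. */
                base = rdm[i].start + rdm[i].size;
                moved = true;
                break;
            }
        }

        if ( !moved )
            return base;
    }
}
```

With a reserved region at 0x1800-0x27ff, a 4K bar requested at base 0x1000
first collides, is moved to 0x2800, and is then aligned up to 0x3000. The
realignment after each skip is what lets an unaligned reserved end be
handled.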

 tools/firmware/hvmloader/pci.c | 180 +++++++++++++++++++++++++++++++++++------
 1 file changed, 154 insertions(+), 26 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..5470958 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,31 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+static void relocate_ram_for_pci_memory(unsigned long cur_pci_mem_start)
+{
+    struct xen_add_to_physmap xatp;
+    unsigned int nr_pages = min_t(
+        unsigned int,
+        hvm_info->low_mem_pgend - (cur_pci_mem_start >> PAGE_SHIFT),
+        (1u << 16) - 1);
+    if ( hvm_info->high_mem_pgend == 0 )
+        hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
+    hvm_info->low_mem_pgend -= nr_pages;
+    printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
+           " for lowmem MMIO hole\n",
+           nr_pages,
+           PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
+           PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
+    xatp.domid = DOMID_SELF;
+    xatp.space = XENMAPSPACE_gmfn_range;
+    xatp.idx   = hvm_info->low_mem_pgend;
+    xatp.gpfn  = hvm_info->high_mem_pgend;
+    xatp.size  = nr_pages;
+    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+        BUG();
+    hvm_info->high_mem_pgend += nr_pages;
+}
+
 void pci_setup(void)
 {
     uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -50,7 +75,7 @@ void pci_setup(void)
     /* Resources assignable to PCI devices via BARs. */
     struct resource {
         uint64_t base, max;
-    } *resource, mem_resource, high_mem_resource, io_resource;
+    } *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource;
 
     /* Create a list of device BARs in descending order of size. */
     struct bars {
@@ -59,8 +84,11 @@ void pci_setup(void)
         uint32_t bar_reg;
         uint64_t bar_sz;
     } *bars = (struct bars *)scratch_start;
-    unsigned int i, nr_bars = 0;
-    uint64_t mmio_hole_size = 0;
+    unsigned int i, j, n, nr_bars = 0;
+    uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size;
+    bool bar32_allocating = 0;
+    uint64_t mmio32_unallocated_total = 0;
+    unsigned long cur_pci_mem_start = 0;
 
     const char *s;
     /*
@@ -309,29 +337,31 @@ void pci_setup(void)
     }
 
     /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
+    cur_pci_mem_start = pci_mem_start;
     while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
+        relocate_ram_for_pci_memory(cur_pci_mem_start);
+
+    /*
+     * Check if reserved device memory conflicts with current PCI memory.
+     * If so, we need to allocate 32-bit BARs first, since reserved device
+     * memory always occupies low memory, and also enable relocating some
+     * BARs to 64-bit where possible.
+     */
+    for ( i = 0; i < memory_map.nr_map ; i++ )
     {
-        struct xen_add_to_physmap xatp;
-        unsigned int nr_pages = min_t(
-            unsigned int,
-            hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
-            (1u << 16) - 1);
-        if ( hvm_info->high_mem_pgend == 0 )
-            hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
-        hvm_info->low_mem_pgend -= nr_pages;
-        printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
-               " for lowmem MMIO hole\n",
-               nr_pages,
-               PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
-               PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
-        xatp.domid = DOMID_SELF;
-        xatp.space = XENMAPSPACE_gmfn_range;
-        xatp.idx   = hvm_info->low_mem_pgend;
-        xatp.gpfn  = hvm_info->high_mem_pgend;
-        xatp.size  = nr_pages;
-        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
-            BUG();
-        hvm_info->high_mem_pgend += nr_pages;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        reserved_end = reserved_start + reserved_size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts with current PCI memory,"
+                   " so allocate 32-bit BARs first and try to"
+                   " relocate some BARs to 64-bit\n");
+            bar32_allocating = 1;
+            if ( !bar64_relocate )
+                bar64_relocate = 1;
+        }
     }
 
     high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
@@ -352,6 +382,7 @@ void pci_setup(void)
     io_resource.base = 0xc000;
     io_resource.max = 0x10000;
 
+ further_allocate:
     /* Assign iomem and ioport resources in descending order of size. */
     for ( i = 0; i < nr_bars; i++ )
     {
@@ -360,6 +391,13 @@ void pci_setup(void)
         bar_sz  = bars[i].bar_sz;
 
         /*
+         * This means we'd like to first allocate 32bit bar to make sure
+         * all 32bit bars can be allocated as possible.
+         */
+        if ( bars[i].is_64bar && bar32_allocating )
+            continue;
+
+        /*
          * Relocate to high memory if the total amount of MMIO needed
          * is more than the low MMIO available.  Because devices are
          * processed in order of bar_sz, this will preferentially
@@ -395,7 +433,14 @@ void pci_setup(void)
                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
             } 
             else {
-                resource = &mem_resource;
+                /*
+                 * This means we're trying to use that expanded
+                 * memory to reallocate 32-bit BARs.
+                 */
+                if ( mmio32_unallocated_total )
+                    resource = &exp_mem_resource;
+                else
+                    resource = &mem_resource;
                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
             }
             mmio_total -= bar_sz;
@@ -406,9 +451,44 @@ void pci_setup(void)
             bar_data &= ~PCI_BASE_ADDRESS_IO_MASK;
         }
 
-        base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+ reallocate_bar:
+        base = (resource->base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
         bar_data |= (uint32_t)base;
         bar_data_upper = (uint32_t)(base >> 32);
+        /*
+         * We should skip all reserved device memory, but we also need
+         * to check if other smaller bars can be allocated if a mmio hole
+         * exists between resource->base and reserved device memory.
+         */
+        for ( j = 0; j < memory_map.nr_map ; j++ )
+        {
+            if ( memory_map.map[j].type != E820_RAM )
+            {
+                reserved_start = memory_map.map[j].addr;
+                reserved_size = memory_map.map[j].size;
+                reserved_end = reserved_start + reserved_size;
+                if ( check_overlap(base, bar_sz,
+                                   reserved_start, reserved_size) )
+                {
+                    /*
+                     * If a hole exists between base and reserved device
+                     * memory, lets go out simply to try allocate for next
+                     * bar since all bars are in descending order of size.
+                     */
+                    if ( resource->base < reserved_start )
+                        continue;
+                    /*
+                     * If not, we need to move resource->base to
+                     * reserved_end just to reallocate this bar.
+                     */
+                    else
+                    {
+                        resource->base = reserved_end;
+                        goto reallocate_bar;
+                    }
+                }
+            }
+        }
         base += bar_sz;
 
         if ( (base < resource->base) || (base > resource->max) )
@@ -439,6 +519,54 @@ void pci_setup(void)
         else
             cmd |= PCI_COMMAND_IO;
         pci_writew(devfn, PCI_COMMAND, cmd);
+
+        /* If we finish allocating bar32 at the first time. */
+        if ( i == nr_bars && bar32_allocating )
+        {
+            /*
+             * We won't repeat to populate more RAM to finalize
+             * allocate all 32bars, so just go to allocate 64bit-bars.
+             */
+            if ( mmio32_unallocated_total )
+            {
+                bar32_allocating = 0;
+                mmio32_unallocated_total = 0;
+                high_mem_resource.base =
+                        ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
+                goto further_allocate;
+            }
+
+            /* Calculate the remaining 32bars. */
+            for ( n = 0; n < nr_bars ; n++ )
+            {
+                if ( !bars[n].is_64bar )
+                {
+                    uint32_t devfn32, bar_reg32, bar_data32;
+                    uint64_t bar_sz32;
+                    devfn32   = bars[n].devfn;
+                    bar_reg32 = bars[n].bar_reg;
+                    bar_sz32  = bars[n].bar_sz;
+                    bar_data32 = pci_readl(devfn32, bar_reg32);
+                    if ( !bar_data32 )
+                        mmio32_unallocated_total  += bar_sz32;
+                }
+            }
+
+            /*
+             * We have to populate more RAM to further allocate
+             * the remaining 32bars.
+             */
+            if ( mmio32_unallocated_total )
+            {
+                cur_pci_mem_start = pci_mem_start - mmio32_unallocated_total;
+                relocate_ram_for_pci_memory(cur_pci_mem_start);
+                exp_mem_resource.base = cur_pci_mem_start;
+                exp_mem_resource.max = pci_mem_start;
+            }
+            else
+                bar32_allocating = 0;
+            goto further_allocate;
+        }
     }
 
     if ( pci_hi_mem_start )
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v4][PATCH 08/19] hvmloader/e820: construct guest e820 table
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (6 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 07/19] hvmloader/pci: skip reserved ranges Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-23  9:57 ` [v4][PATCH 09/19] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

Now we can use that memory map to build our final
e820 table but it may need to reorder all e820
entries.
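
The adjustment and reordering described above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `struct map_entry` is an unpacked, hypothetical stand-in for hvmloader's `struct e820entry`, `fixup_and_sort()` is a name made up for this example, and the sketch assumes a RAM entry at or above 4GiB already exists to absorb the relocated bytes.

```c
#include <stdint.h>
#include <stddef.h>

#define MAP_RAM      1u   /* same value as E820_RAM */
#define MAP_RESERVED 2u   /* same value as E820_RESERVED */
#define FOUR_GIB     (1ull << 32)

/* Hypothetical, unpacked stand-in for struct e820entry. */
struct map_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Trim the RAM entry that straddles low_mem_end, credit the trimmed bytes
 * to the first RAM entry at or above 4GiB, then sort all entries by
 * address. Returns the number of bytes moved from low to high memory.
 */
static uint64_t fixup_and_sort(struct map_entry *map, size_t nr,
                               uint64_t low_mem_end)
{
    uint64_t moved = 0;
    size_t i, j;

    for ( i = 0; i < nr; i++ )
    {
        uint64_t end = map[i].addr + map[i].size;

        if ( map[i].type == MAP_RAM &&
             low_mem_end > map[i].addr && low_mem_end < end )
        {
            moved = end - low_mem_end;
            map[i].size = low_mem_end - map[i].addr;
        }
    }

    for ( i = 0; moved && i < nr; i++ )
    {
        if ( map[i].type == MAP_RAM && map[i].addr >= FOUR_GIB )
        {
            map[i].size += moved;
            break;
        }
    }

    /* Insertion sort: the table is tiny, so O(n^2) is acceptable. */
    for ( i = 1; i < nr; i++ )
    {
        struct map_entry tmp = map[i];

        for ( j = i; j > 0 && map[j - 1].addr > tmp.addr; j-- )
            map[j] = map[j - 1];
        map[j] = tmp;
    }

    return moved;
}
```

Note that the sketch compares against 4GiB with `>=`, since a high memory region normally starts exactly at that boundary.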

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.

 tools/firmware/hvmloader/e820.c | 80 +++++++++++++++++++++++++++++++++--------
 1 file changed, 66 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 3e53c47..aa2569f 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
                      unsigned int lowmem_reserved_base,
                      unsigned int bios_image_base)
 {
-    unsigned int nr = 0;
+    unsigned int nr = 0, i, j;
+    uint64_t add_high_mem = 0;
+    uint64_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
@@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820,
     e820[nr].type = E820_RESERVED;
     nr++;
 
-    /* Low RAM goes here. Reserve space for special pages. */
-    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-    e820[nr].addr = 0x100000;
-    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-    e820[nr].type = E820_RAM;
-    nr++;
-
     /*
      * Explicitly reserve space for special pages.
      * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -194,16 +189,73 @@ int build_e820_table(struct e820entry *e820,
         nr++;
     }
 
-
-    if ( hvm_info->high_mem_pgend )
+    /*
+     * Construct E820 table according to recorded memory map.
+     *
+     * The memory map created by toolstack may include,
+     *
+     * #1. Low memory region
+     *
+     * Low RAM starts at least from 1M to make sure all standard regions
+     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+     * have enough space.
+     *
+     * #2. Reserved regions if they exist
+     *
+     * #3. High memory region if it exists
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
     {
-        e820[nr].addr = ((uint64_t)1 << 32);
-        e820[nr].size =
-            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-        e820[nr].type = E820_RAM;
+        e820[nr] = memory_map.map[i];
         nr++;
     }
 
+    /* Low RAM goes here. Reserve space for special pages. */
+    BUG_ON(low_mem_end < (2u << 20));
+
+    /*
+     * We may need to adjust real lowmem end since we may
+     * populate RAM to get enough MMIO previously.
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
+    {
+        uint64_t end = e820[i].addr + e820[i].size;
+        if ( e820[i].type == E820_RAM &&
+             low_mem_end > e820[i].addr && low_mem_end < end )
+        {
+            add_high_mem = end - low_mem_end;
+            e820[i].size = low_mem_end - e820[i].addr;
+        }
+    }
+
+    /*
+     * And then we also need to adjust highmem.
+     */
+    if ( add_high_mem )
+    {
+        for ( i = 0; i < memory_map.nr_map; i++ )
+        {
+            if ( e820[i].type == E820_RAM &&
+                 e820[i].addr > (1ull << 32))
+                e820[i].size += add_high_mem;
+        }
+    }
+
+    /* Finally we need to reorder all e820 entries. */
+    for ( j = 0; j < nr-1; j++ )
+    {
+        for ( i = j+1; i < nr; i++ )
+        {
+            if ( e820[j].addr > e820[i].addr )
+            {
+                struct e820entry tmp;
+                tmp = e820[j];
+                e820[j] = e820[i];
+                e820[i] = tmp;
+            }
+        }
+    }
+
     return nr;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v4][PATCH 09/19] tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (7 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 08/19] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-25 10:44   ` Wei Liu
  2015-06-23  9:57 ` [v4][PATCH 10/19] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch exposes the XENMEM_reserved_device_memory_map hypercall
through libxc as xc_reserved_device_memory_map(). It returns RDM
entry info according to the parameters passed in. If
flag == PCI_DEV_RDM_ALL, all entries are exposed. Otherwise only
the RDM entries specific to the given SBDF are exposed.
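
The selection semantics described above can be modelled with a small standalone sketch. Everything here is an assumption made for illustration: `RDM_ALL` stands in for PCI_DEV_RDM_ALL (its value is assumed), `struct rdm_entry` with an `owner_sbdf` field is a hypothetical model rather than the hypercall's real entry layout, and `select_rdm()` is not a libxc function.

```c
#include <stdint.h>
#include <stddef.h>

#define RDM_ALL 0x1u  /* assumed stand-in for PCI_DEV_RDM_ALL */

/* Hypothetical model of one reserved device memory region. */
struct rdm_entry {
    uint64_t start_pfn;
    uint64_t nr_pages;
    uint32_t owner_sbdf;  /* device the region belongs to (model only) */
};

static uint32_t make_sbdf(uint16_t seg, uint8_t bus, uint8_t devfn)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) | devfn;
}

/*
 * Copy matching entries from 'all' into 'out' (capacity 'max_out').
 * With RDM_ALL set every entry matches; otherwise only entries owned by
 * seg:bus:devfn match. Returns the number of matches, which may exceed
 * max_out, so a caller can size a retry buffer from it.
 */
static size_t select_rdm(const struct rdm_entry *all, size_t nr_all,
                         uint32_t flag, uint16_t seg, uint8_t bus,
                         uint8_t devfn,
                         struct rdm_entry *out, size_t max_out)
{
    uint32_t sbdf = make_sbdf(seg, bus, devfn);
    size_t i, matched = 0;

    for ( i = 0; i < nr_all; i++ )
    {
        if ( !(flag & RDM_ALL) && all[i].owner_sbdf != sbdf )
            continue;
        if ( matched < max_out )
            out[matched] = all[i];
        matched++;
    }

    return matched;
}
```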

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
v4:

* Nothing is changed.

 tools/libxc/include/xenctrl.h |  8 ++++++++
 tools/libxc/xc_domain.c       | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..9160623 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..0951291 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
     return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries)
+{
+    int rc;
+    struct xen_reserved_device_memory_map xrdmmap = {
+        .flag = flag,
+        .seg = seg,
+        .bus = bus,
+        .devfn = devfn,
+        .nr_entries = *max_entries
+    };
+    DECLARE_HYPERCALL_BOUNCE(entries,
+                             sizeof(struct xen_reserved_device_memory) *
+                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, entries) )
+        return -1;
+
+    set_xen_guest_handle(xrdmmap.buffer, entries);
+
+    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+                      &xrdmmap, sizeof(xrdmmap));
+
+    xc_hypercall_bounce_post(xch, entries);
+
+    *max_entries = xrdmmap.nr_entries;
+
+    return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v4][PATCH 10/19] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (8 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 09/19] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-25 10:54   ` Wei Liu
  2015-06-23  9:57 ` [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy Tiejun Chen
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, David Scott, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also brings some fallout to the Python usage of xc_assign_device().
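
For readers unfamiliar with the machine_sbdf argument, the packing used by the Python binding touched in this patch can be sketched as below. `pack_sbdf()` is a hypothetical helper name, and masking the segment to 16 bits is an assumption of this sketch (the binding shifts the segment without masking it).

```c
#include <stdint.h>

/*
 * Pack segment/bus/device/function into the 32-bit SBDF layout:
 *   bits 31-16: segment, bits 15-8: bus, bits 7-3: device, bits 2-0: function
 */
static uint32_t pack_sbdf(unsigned int seg, unsigned int bus,
                          unsigned int dev, unsigned int func)
{
    return ((uint32_t)(seg & 0xffff) << 16) |
           ((uint32_t)(bus & 0xff) << 8) |
           ((uint32_t)(dev & 0x1f) << 3) |
           (uint32_t)(func & 0x7);
}
```

With this patch applied, a caller would pass such a value together with a policy flag, for example XEN_DOMCTL_DEV_RDM_RELAXED, as the new fourth argument of xc_assign_device().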

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: David Scott <dave.scott@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4:

* In the patch description, explain why we need to sync the xc.c file

 tools/libxc/include/xenctrl.h       |  3 ++-
 tools/libxc/xc_domain.c             |  6 +++++-
 tools/libxl/libxl_pci.c             |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 18 ++++++++++++++----
 tools/python/xen/lowlevel/xc/xc.c   | 29 +++++++++++++++++++----------
 5 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
-                     uint32_t machine_sbdf);
+                     uint32_t machine_sbdf,
+                     uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..40ff0f4 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
     xc_interface *xch,
     uint32_t domid,
-    uint32_t machine_sbdf)
+    uint32_t machine_sbdf,
+    uint32_t flag)
 {
     DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
     domctl.domain = domid;
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
     domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+    domctl.u.assign_device.flag = flag;
 
     return do_domctl(xch, &domctl);
 }
@@ -1792,6 +1794,8 @@ int xc_assign_dt_device(
 
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
     domctl.u.assign_device.u.dt.size = size;
+    /* DT doesn't own any RDM. */
+    domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;
     set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
     rc = do_domctl(xch, &domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
     FILE *f;
     unsigned long long start, end, flags, size;
     int irq, i, rc, hvm = 0;
+    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
     if (type == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
-        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
             return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..317bf75 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,19 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
 	CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+    XEN_DOMCTL_DEV_NO_RDM,
+    XEN_DOMCTL_DEV_RDM_RELAXED,
+    XEN_DOMCTL_DEV_RDM_STRICT,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+                                            value rflag)
 {
-	CAMLparam3(xch, domid, desc);
+	CAMLparam4(xch, domid, desc, rflag);
 	int ret;
 	int domain, bus, dev, func;
-	uint32_t sbdf;
+	uint32_t sbdf, flag;
 
 	domain = Int_val(Field(desc, 0));
 	bus = Int_val(Field(desc, 1));
@@ -1185,7 +1192,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
 	func = Int_val(Field(desc, 3));
 	sbdf = encode_sbdf(domain, bus, dev, func);
 
-	ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+	ret = Int_val(Field(rflag, 0));
+	flag = domain_assign_device_rdm_flag_table[ret];
+
+	ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
 	if (ret < 0)
 		failwith_xc(_H(xch));
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index c77e15b..172bdf0 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -592,7 +592,8 @@ static int token_value(char *token)
     return strtol(token, NULL, 16);
 }
 
-static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
+static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
+                    int *flag)
 {
     char *token;
 
@@ -607,8 +608,16 @@ static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
     *dev  = token_value(token);
     token = strchr(token, ',') + 1;
     *func  = token_value(token);
-    token = strchr(token, ',');
-    *str = token ? token + 1 : NULL;
+    token = strchr(token, ',') + 1;
+    if ( token ) {
+        *flag = token_value(token);
+        *str = token + 1;
+    }
+    else
+    {
+        *flag = XEN_DOMCTL_DEV_RDM_STRICT;
+        *str = NULL;
+    }
 
     return 1;
 }
@@ -620,14 +629,14 @@ static PyObject *pyxc_test_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
@@ -653,21 +662,21 @@ static PyObject *pyxc_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
         sbdf |= (dev & 0x1f) << 3;
         sbdf |= (func & 0x7);
 
-        if ( xc_assign_device(self->xc_handle, dom, sbdf) != 0 )
+        if ( xc_assign_device(self->xc_handle, dom, sbdf, flag) != 0 )
         {
             if (errno == ENOSYS)
                 sbdf = -1;
@@ -686,14 +695,14 @@ static PyObject *pyxc_deassign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (9 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 10/19] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-25 11:38   ` Wei Liu
                     ` (3 more replies)
  2015-06-23  9:57 ` [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy Tiejun Chen
                   ` (7 subsequent siblings)
  18 siblings, 4 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch introduces user-configurable parameters to specify the RDM
resource and the corresponding policies:

Global RDM parameter:
    rdm = "type=none/host,reserve=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

The global RDM parameter "type" allows the user to specify reserved regions
explicitly, e.g. using 'host' to include all reserved regions reported
on this platform, which is useful for handling hotplug scenarios. In the
future this parameter may be further extended to allow specifying arbitrary
regions, e.g. even those belonging to another platform as a preparation for
live migration with passthrough devices. By contrast, 'none' means we do
nothing with reserved regions and ignore all policies, so the guest works
as before.

The 'strict/relaxed' policy decides how to handle a conflict when reserving
RDM regions in pfn space. If a conflict exists, 'strict' means an immediate
error so the VM will be killed, while 'relaxed' allows moving forward with a
warning message.

The default per-device RDM policy is 'strict', while the default global RDM
policy is 'relaxed'. When both policies are specified on a given region,
'strict' is always preferred.
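
The precedence rule above can be written down as a small decision function. This is a minimal sketch under assumptions: the enum and `effective_policy()` are hypothetical names for illustration and are not the libxl types added by this patch.

```c
/* Hypothetical policy values; UNSET means the user gave no setting. */
enum rdm_policy {
    RDM_POLICY_UNSET = 0,
    RDM_POLICY_STRICT,
    RDM_POLICY_RELAXED
};

/*
 * Resolve the policy applied to a device's reserved regions:
 *  - an unset global policy defaults to relaxed,
 *  - an unset per-device policy defaults to strict,
 *  - if either side ends up strict, strict wins.
 */
static enum rdm_policy effective_policy(enum rdm_policy global,
                                        enum rdm_policy per_device)
{
    if ( global == RDM_POLICY_UNSET )
        global = RDM_POLICY_RELAXED;
    if ( per_device == RDM_POLICY_UNSET )
        per_device = RDM_POLICY_STRICT;

    if ( global == RDM_POLICY_STRICT || per_device == RDM_POLICY_STRICT )
        return RDM_POLICY_STRICT;

    return RDM_POLICY_RELAXED;
}
```

One consequence of these defaults is that a device only ends up relaxed when its own rdm_reserve is explicitly set to relaxed and the global policy is not strict.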

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4:

* No need to define init_val for libxl_rdm_reserve_type since it's just zero
* Move those changes to xl/libxlu into a final patch

 docs/man/xl.cfg.pod.5        | 50 ++++++++++++++++++++++++++++++++++++++++++++
 docs/misc/vtd.txt            | 24 +++++++++++++++++++++
 tools/libxl/libxl_create.c   | 13 ++++++++++++
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c      |  3 +++
 tools/libxl/libxl_types.idl  | 18 ++++++++++++++++
 6 files changed, 110 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..638b350 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,49 @@ assigned slave device.
 
 =back
 
+=item B<rdm="RDM_RESERVE_STRING">
+
+(HVM/x86 only) Specifies the information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B<RDM_RESERVE_STRING> has the form C<KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<type="STRING">
+
+Currently we just have two types:
+
+"host" means all reserved device memory on this platform should be reserved
+in this VM's guest address space. This global RDM parameter allows the
+user to specify reserved regions explicitly, and "host" includes all
+reserved regions reported on this platform, which is useful for handling
+hotplug scenarios. In the future this parameter may be further extended to
+allow specifying arbitrary regions, e.g. even those belonging to another
+platform as a preparation for live migration with passthrough devices.
+
+"none" means we do nothing with reserved regions and ignore all policies,
+so the guest works as before.
+
+=over 4
+
+=item B<reserve="STRING">
+
+A conflict may be detected when reserving reserved device memory in the guest
+address space. "strict" means an unresolved conflict leads to an immediate VM
+crash, while "relaxed" allows the VM to move forward with a warning message.
+"relaxed" is the default.
+
+Note this may be overridden by rdm_reserve option in PCI device configuration.
+
 =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
 
 Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
@@ -717,6 +760,13 @@ dom0 without confirmation.  Please use with care.
 D0-D3hot power management states for the PCI device. False (0) by
 default.
 
+=item B<rdm_reserve="STRING">
+
+(HVM/x86 only) This is the same as the reserve option above but specific
+to a given device, and "strict" is the default here.
+
+Note this would override the global B<rdm> option.
+
 =back
 
 =back
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..7d63c47 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,30 @@ in the config file:
 To override for a specific device:
 	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
 
+RDM, 'reserved device memory', for PCI Device Passthrough
+---------------------------------------------------------
+
+There are some devices the BIOS controls, e.g. USB devices used to perform
+PS2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence the BIOS uses RMRR to specify these regions along
+with the devices that need to access them. The OS is expected to set up
+identity mappings for these regions so that these devices can access them.
+
+While creating a VM we should reserve them in advance, and avoid any conflicts.
+So we introduce user-configurable parameters to specify the RDM resource and
+the corresponding policies:
+
+To enable this globally, add "rdm" in the config file:
+
+    rdm = "type=host, reserve=relaxed"   (default policy is "relaxed")
+
+Or just for a specific device:
+
+    pci = [ '01:00.0,rdm_reserve=relaxed', '03:00.0,rdm_reserve=strict' ]
+
+For all the options available to RDM, see xl.cfg(5).
+
 
 Caveat on Conventional PCI Device Passthrough
 ---------------------------------------------
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 86384d2..6c8ec63 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -105,6 +105,12 @@ static int sched_params_valid(libxl__gc *gc,
     return 1;
 }
 
+void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
+{
+    if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
+        b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+}
+
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info)
 {
@@ -419,6 +425,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
                    libxl_domain_type_to_string(b_info->type));
         return ERROR_INVAL;
     }
+
+    libxl__rdm_setdefault(gc, b_info);
     return 0;
 }
 
@@ -1450,6 +1458,11 @@ static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *multidev,
     }
 
     for (i = 0; i < d_config->num_pcidevs; i++) {
+        /*
+         * If the rdm global policy is 'strict' we should override each device.
+         */
+        if (d_config->b_info.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_STRICT)
+            d_config->pcidevs[i].rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
         ret = libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1);
         if (ret < 0) {
             LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index e96d6b5..eae1dc6 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1108,6 +1108,8 @@ _hidden int libxl__device_vtpm_setdefault(libxl__gc *gc, libxl_device_vtpm *vtpm
 _hidden int libxl__device_vfb_setdefault(libxl__gc *gc, libxl_device_vfb *vfb);
 _hidden int libxl__device_vkb_setdefault(libxl__gc *gc, libxl_device_vkb *vkb);
 _hidden int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci);
+_hidden void libxl__rdm_setdefault(libxl__gc *gc,
+                                   libxl_domain_build_info *b_info);
 
 _hidden const char *libxl__device_nic_devname(libxl__gc *gc,
                                               uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 632c15e..a00d799 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1040,6 +1040,9 @@ static int libxl__device_pci_reset(libxl__gc *gc, unsigned int domain, unsigned
 
 int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
 {
+    /* By default, strictly reserve the RDM specific to a device. */
+    if (pci->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
+        pci->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
     return 0;
 }
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 23f27d4..dd91b38 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -73,6 +73,17 @@ libxl_domain_type = Enumeration("domain_type", [
     (2, "PV"),
     ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
 
+libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
+    (0, "none"),
+    (1, "host"),
+    ])
+
+libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [
+    (-1, "invalid"),
+    (0, "strict"),
+    (1, "relaxed"),
+    ], init_val = "LIBXL_RDM_RESERVE_FLAG_INVALID")
+
 libxl_channel_connection = Enumeration("channel_connection", [
     (0, "UNKNOWN"),
     (1, "PTY"),
@@ -366,6 +377,11 @@ libxl_vnode_info = Struct("vnode_info", [
     ("vcpus", libxl_bitmap), # vcpus in this node
     ])
 
+libxl_rdm_reserve = Struct("rdm_reserve", [
+    ("type",    libxl_rdm_reserve_type),
+    ("reserve",   libxl_rdm_reserve_flag),
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -413,6 +429,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("kernel",           string),
     ("cmdline",          string),
     ("ramdisk",          string),
+    ("rdm",     libxl_rdm_reserve),
     # Given the complexity of verifying the validity of a device tree,
     # libxl doesn't do any security check on it. It's the responsibility
     # of the caller to provide only trusted device tree.
@@ -539,6 +556,7 @@ libxl_device_pci = Struct("device_pci", [
     ("power_mgmt", bool),
     ("permissive", bool),
     ("seize", bool),
+    ("rdm_reserve",   libxl_rdm_reserve_flag),
     ])
 
 libxl_device_dtdev = Struct("device_dtdev", [
-- 
1.9.1


* [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (10 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-25 11:37   ` Wei Liu
                     ` (4 more replies)
  2015-06-23  9:57 ` [v4][PATCH 13/19] tools/libxc: check to set args.mmio_size before call xc_hvm_build Tiejun Chen
                   ` (6 subsequent siblings)
  18 siblings, 5 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch passes the rdm reservation policy through libxl
when a device is assigned or attached.
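
As a rough illustration of how the optional policy argument of pci-attach
could be mapped to a flag, here is a minimal sketch. The enum and the
helper parse_rdm_policy() are hypothetical stand-ins for this example only
(they mirror the libxl_rdm_reserve_flag values but are not libxl API), and
treating a missing argument as "strict" is an assumption based on the
documented default.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins mirroring the libxl_rdm_reserve_flag values. */
enum rdm_flag { RDM_INVALID = -1, RDM_STRICT = 0, RDM_RELAXED = 1 };

/*
 * Map an optional policy string to a flag.
 * A NULL string means the argument was omitted, so the documented
 * default ("strict") is used. Unknown strings yield RDM_INVALID.
 */
static enum rdm_flag parse_rdm_policy(const char *s)
{
    if (!s || !strcmp(s, "strict"))
        return RDM_STRICT;
    if (!strcmp(s, "relaxed"))
        return RDM_RELAXED;
    return RDM_INVALID;
}
```

A caller would reject RDM_INVALID with an error message before attaching
the device.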

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4:

* Fix one typo, s/unkwon/unknown
* In the command description, use "[]" to indicate that the new argument
  of the extended xl command, pci-attach, is optional.

 docs/man/xl.pod.1         |  7 ++++++-
 tools/libxl/libxl_pci.c   | 10 +++++++++-
 tools/libxl/xl_cmdimpl.c  | 23 +++++++++++++++++++----
 tools/libxl/xl_cmdtable.c |  2 +-
 4 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 4eb929d..c5c4809 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its original driver, making it
 usable by Domain 0 again.  If the device is not bound to pciback, it will
 return success.
 
-=item B<pci-attach> I<domain-id> I<BDF>
+=item B<pci-attach> I<domain-id> I<BDF> [I<rdm>]
 
 Hot-plug a new pass-through pci device to the specified domain.
 B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
+B<rdm> selects the policy for handling a conflict between reserved device
+memory and the guest address space. "strict" means an unsolved conflict leads
+to an immediate VM crash, while "relaxed" allows the VM to move forward with
+a warning message. "strict" is the default.
+
 
 =item B<pci-detach> [I<-f>] I<domain-id> I<BDF>
 
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index a00d799..a6a2a8c 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,7 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
     FILE *f;
     unsigned long long start, end, flags, size;
     int irq, i, rc, hvm = 0;
-    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
+    uint32_t flag;
 
     if (type == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;
@@ -988,6 +988,14 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
+        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
+            flag = XEN_DOMCTL_DEV_RDM_RELAXED;
+        } else if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
+            flag = XEN_DOMCTL_DEV_RDM_STRICT;
+        } else {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm check flag.");
+            return ERROR_FAIL;
+        }
         rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c858068..5637c30 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3349,7 +3349,8 @@ int main_pcidetach(int argc, char **argv)
     pcidetach(domid, bdf, force);
     return 0;
 }
-static void pciattach(uint32_t domid, const char *bdf, const char *vs)
+static void pciattach(uint32_t domid, const char *bdf, const char *vs,
+                      uint32_t flag)
 {
     libxl_device_pci pcidev;
     XLU_Config *config;
@@ -3359,6 +3360,7 @@ static void pciattach(uint32_t domid, const char *bdf, const char *vs)
     config = xlu_cfg_init(stderr, "command line");
     if (!config) { perror("xlu_cfg_inig"); exit(-1); }
 
+    pcidev.rdm_reserve = flag;
     if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
         fprintf(stderr, "pci-attach: malformed BDF specification \"%s\"\n", bdf);
         exit(2);
@@ -3371,9 +3373,9 @@ static void pciattach(uint32_t domid, const char *bdf, const char *vs)
 
 int main_pciattach(int argc, char **argv)
 {
-    uint32_t domid;
+    uint32_t domid, flag;
     int opt;
-    const char *bdf = NULL, *vs = NULL;
+    const char *bdf = NULL, *vs = NULL, *rdm_policy = NULL;
 
     SWITCH_FOREACH_OPT(opt, "", NULL, "pci-attach", 2) {
         /* No options */
@@ -3385,7 +3387,20 @@ int main_pciattach(int argc, char **argv)
     if (optind + 1 < argc)
         vs = argv[optind + 2];
 
-    pciattach(domid, bdf, vs);
+    if (optind + 2 < argc)
+        rdm_policy = argv[optind + 3];
+    /* The policy is optional; "strict" is the documented default. */
+    if (!rdm_policy || !strcmp(rdm_policy, "strict")) {
+        flag = LIBXL_RDM_RESERVE_FLAG_STRICT;
+    } else if (!strcmp(rdm_policy, "relaxed")) {
+        flag = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+    } else {
+        fprintf(stderr, "%s is an invalid rdm policy: 'strict'|'relaxed'\n",
+                rdm_policy);
+        exit(2);
+    }
+
+    pciattach(domid, bdf, vs, flag);
     return 0;
 }
 
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 7f4759b..36a2aaa 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -88,7 +88,7 @@ struct cmd_spec cmd_table[] = {
     { "pci-attach",
       &main_pciattach, 0, 1,
       "Insert a new pass-through pci device",
-      "<Domain> <BDF> [Virtual Slot]",
+      "<Domain> <BDF> [Virtual Slot] [policy to reserve rdm<'strict'|'relaxed'>]",
     },
     { "pci-detach",
       &main_pcidetach, 0, 1,
-- 
1.9.1


* [v4][PATCH 13/19] tools/libxc: check to set args.mmio_size before call xc_hvm_build
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (11 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-25 11:08   ` Wei Liu
  2015-06-23  9:57 ` [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Since commit 5dff8e9eedc7, "libxc/libxl: fill xc_hvm_build_args in
libxl", xc_hvm_build no longer checks and sets args.mmio_size as it
did before. So instead, we need to do this before calling
xc_hvm_build.
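
The check itself is small; a minimal sketch follows. The struct and macro
names are hypothetical for this example, and the length value assumes the
MMIO hole spans from 0xF0000000 up to 4G, which is an assumption about
HVM_BELOW_4G_MMIO_LENGTH rather than something stated in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: the MMIO hole runs from 0xF0000000 up to 4G. */
#define SKETCH_BELOW_4G_MMIO_LENGTH ((1ULL << 32) - 0xF0000000ULL)

/* Hypothetical reduced form of the build arguments. */
struct build_args_sketch {
    uint64_t mmio_size;
};

/* Fill in the default only if the caller left mmio_size unset (zero). */
static void set_default_mmio_size(struct build_args_sketch *args)
{
    if (args->mmio_size == 0)
        args->mmio_size = SKETCH_BELOW_4G_MMIO_LENGTH;
}
```

A value already chosen by the caller is left untouched.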

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4:

* Separate this from current patch #14 since this is specific to xc.

 tools/libxc/xc_hvm_build_x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 003ea06..7343e87 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -754,6 +754,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
     args.mem_size = (uint64_t)memsize << 20;
     args.mem_target = (uint64_t)target << 20;
     args.image_file_name = image_name;
+    if ( args.mmio_size == 0 )
+        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
 
     return xc_hvm_build(xch, domid, &args);
 }
-- 
1.9.1


* [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (12 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 13/19] tools/libxc: check to set args.mmio_size before call xc_hvm_build Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-25 11:23   ` Wei Liu
  2015-06-23  9:57 ` [v4][PATCH 15/19] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

While building a VM, the HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields that hvmloader uses to
construct the guest e820 table, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RAM.

RMRR can theoretically reside in address space beyond 4G, but we never
see this in the real world. So in order to avoid breaking the highmem layout
we don't solve highmem conflicts. Note this means highmem RMRR can still
be supported if there is no conflict.

But in the case of lowmem, RMRR regions may be scattered across the whole RAM
space. Multiple RMRR entries make this worse and lead to a complicated
memory layout, and then it's hard to extend hvm_info_table{} so that
hvmloader can handle it. So here we're trying to figure out a simple solution
that avoids breaking the existing layout. When a conflict occurs,

    #1. Above a predefined boundary (2G)
        - move lowmem_end below the reserved region to solve the conflict;

    #2. Below a predefined boundary (2G)
        - Check the strict/relaxed policy.
        The "strict" policy makes libxl fail. Note when both policies
        are specified on a given region, 'strict' is always preferred.
        The "relaxed" policy issues a warning message and also marks this
        entry INVALID to indicate we shouldn't expose it to hvmloader.

Note that later we need to provide a parameter to set that predefined boundary
dynamically.
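
The two cases above can be sketched as a small standalone routine. This is
only an illustration of the described policy, not the code in the patch:
the type rdm_entry, the enum, and the function resolve_rdm() are
hypothetical names for this example, and the return convention (0 on
success, -1 on a strict conflict) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the libxl_rdm_reserve_flag values. */
enum rdm_flag { RDM_INVALID = -1, RDM_STRICT = 0, RDM_RELAXED = 1 };

struct rdm_entry {
    uint64_t start, size;
    enum rdm_flag flag;
};

/* Overlap test on the half-open ranges [start, start + size). */
static bool overlaps(uint64_t start, uint64_t size,
                     uint64_t rdm_start, uint64_t rdm_size)
{
    return start + size > rdm_start && start < rdm_start + rdm_size;
}

/*
 * Returns 0 on success, -1 if a 'strict' entry still conflicts.
 * *lowmem_end and *highmem_end are updated in place.
 */
static int resolve_rdm(struct rdm_entry *rdms, int nr, uint64_t boundary,
                       uint64_t *lowmem_end, uint64_t *highmem_end)
{
    const uint64_t four_g = 1ULL << 32;
    uint64_t high = *highmem_end ? *highmem_end : four_g;
    int i;

    /* Case #1: a conflict above the boundary moves lowmem_end down and
     * grows highmem by the amount of RAM that was displaced. */
    for (i = 0; i < nr; i++) {
        if (!overlaps(0, *lowmem_end, rdms[i].start, rdms[i].size))
            continue;
        if (rdms[i].start > boundary) {
            high += *lowmem_end - rdms[i].start;
            *lowmem_end = rdms[i].start;
        }
    }
    *highmem_end = high;

    /* Case #2: any remaining conflict is decided by the entry's policy. */
    for (i = 0; i < nr; i++) {
        bool conflict =
            overlaps(0, *lowmem_end, rdms[i].start, rdms[i].size) ||
            overlaps(four_g, *highmem_end - four_g,
                     rdms[i].start, rdms[i].size);

        if (!conflict)
            continue;
        if (rdms[i].flag == RDM_STRICT)
            return -1;               /* strict: fail */
        rdms[i].flag = RDM_INVALID;  /* relaxed: hide the entry */
    }
    return 0;
}
```

For example, with lowmem_end at 3G and a strict region at 0xAB000000, the
region lies above the 2G boundary, so lowmem_end drops to 0xAB000000 and
the displaced RAM is added to highmem instead of failing.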

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevint.tian@intel.com>
---
v4:

* Use the term "RDM" consistently.
* Unconditionally set *nr_entries to 0
* Move everything needed to provide a parameter that sets our predefined
  boundary dynamically into a separate patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c       | 259 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom.c      |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 293 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 6c8ec63..30e6593 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -460,7 +460,7 @@ int libxl__domain_build(libxl__gc *gc,
 
     switch (info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        ret = libxl__build_hvm(gc, domid, info, state);
+        ret = libxl__build_hvm(gc, domid, d_config, state);
         if (ret)
             goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 33f9ce6..5436bcf 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,265 @@ const char *libxl__domain_device_model(libxl__gc *gc,
     return dm;
 }
 
+static struct xen_reserved_device_memory
+*xc_device_get_rdm(libxl__gc *gc,
+                   uint32_t flag,
+                   uint16_t seg,
+                   uint8_t bus,
+                   uint8_t devfn,
+                   unsigned int *nr_entries)
+{
+    struct xen_reserved_device_memory *xrdm = NULL;
+    int rc;
+
+    /*
+     * We really can't presume how many entries we can get in advance.
+     */
+    *nr_entries = 0;
+    rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                       NULL, nr_entries);
+    assert(rc <= 0);
+    /* "0" means there are no rdm entries. */
+    if (!rc)
+        goto out;
+
+    if (errno == ENOBUFS) {
+        xrdm = malloc(*nr_entries * sizeof(xen_reserved_device_memory_t));
+        if (!xrdm) {
+            LOG(ERROR, "Could not allocate RDM buffer!\n");
+            goto out;
+        }
+        rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                           xrdm, nr_entries);
+        if (rc) {
+            LOG(ERROR, "Could not get reserved device memory maps.\n");
+            *nr_entries = 0;
+            free(xrdm);
+            xrdm = NULL;
+        }
+    } else
+        LOG(ERROR, "Could not get reserved device memory maps.\n");
+
+ out:
+    return xrdm;
+}
+
+/*
+ * Check whether an RDM region overlaps the specified memory range.
+ * Returns true if it does, false otherwise.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+                         uint64_t rdm_start, uint64_t rdm_size)
+{
+    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to user preferred policy.
+ *
+ * RDM can theoretically reside in address space beyond 4G, but we never
+ * see this in the real world. So in order to avoid breaking the highmem
+ * layout we don't solve highmem conflicts. Note this means highmem RDM can
+ * still be supported if there is no conflict.
+ *
+ * But in the case of lowmem, RDM may be scattered across the whole RAM space.
+ * Multiple RDM entries make this worse and lead to a complicated memory
+ * layout, and then it's hard to extend hvm_info_table{} so that hvmloader
+ * can handle it. So here we're trying to figure out a simple solution that
+ * avoids breaking the existing layout. When a conflict occurs,
+ *
+ * #1. Above a predefined boundary (default 2G)
+ * - Move lowmem_end below the reserved region to solve the conflict;
+ *
+ * #2. Below a predefined boundary (default 2G)
+ * - Check the strict/relaxed policy.
+ * The "strict" policy makes libxl fail. Note when both policies
+ * are specified on a given region, 'strict' is always preferred.
+ * The "relaxed" policy issues a warning message and also marks this entry
+ * INVALID to indicate we shouldn't expose this entry to hvmloader.
+ */
+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                       libxl_domain_config *d_config,
+                                       uint64_t rdm_mem_boundary,
+                                       struct xc_hvm_build_args *args)
+{
+    int i, j, conflict;
+    struct xen_reserved_device_memory *xrdm = NULL;
+    uint32_t type = d_config->b_info.rdm.type;
+    uint16_t seg;
+    uint8_t bus, devfn;
+    uint64_t rdm_start, rdm_size;
+    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);
+
+    /* Might not expose rdm. */
+    if (type == LIBXL_RDM_RESERVE_TYPE_NONE && !d_config->num_pcidevs)
+        return 0;
+
+    /* Query all RDM entries in this platform */
+    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
+        unsigned int nr_entries;
+
+        /* Collect all rdm info if exist. */
+        xrdm = xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+                                 0, 0, 0, &nr_entries);
+        if (!nr_entries)
+            return 0;
+
+        assert(xrdm);
+
+        d_config->num_rdms = nr_entries;
+        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                                d_config->num_rdms * sizeof(libxl_device_rdm));
+
+        for (i = 0; i < d_config->num_rdms; i++) {
+            d_config->rdms[i].start =
+                                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
+            d_config->rdms[i].size =
+                                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
+            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
+        }
+
+        free(xrdm);
+    } else
+        d_config->num_rdms = 0;
+
+    /* Query RDM entries per-device */
+    for (i = 0; i < d_config->num_pcidevs; i++) {
+        unsigned int nr_entries;
+        bool new = true;
+
+        seg = d_config->pcidevs[i].domain;
+        bus = d_config->pcidevs[i].bus;
+        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
+        nr_entries = 0;
+        xrdm = xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
+                                 seg, bus, devfn, &nr_entries);
+        /* No RDM associated with this device. */
+        if (!nr_entries)
+            continue;
+
+        assert(xrdm);
+
+        /*
+         * Need to check whether this entry is already saved in the array.
+         * This could come from two cases:
+         *
+         *   - user may configure to get all RDMs in this platform, which
+         *   is already queried before this point
+         *   - or two assigned devices may share one RDM entry
+         *
+         * different policies may be configured on the same RDM due to above
+         * two cases. We choose a simple policy to always favor stricter policy
+         */
+        for (j = 0; j < d_config->num_rdms; j++) {
+            if (d_config->rdms[j].start ==
+                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)
+             {
+                if (d_config->rdms[j].flag != LIBXL_RDM_RESERVE_FLAG_STRICT)
+                    d_config->rdms[j].flag = d_config->pcidevs[i].rdm_reserve;
+                new = false;
+                break;
+            }
+        }
+
+        if (new) {
+            d_config->num_rdms++;
+            d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                                d_config->num_rdms * sizeof(libxl_device_rdm));
+
+            d_config->rdms[d_config->num_rdms - 1].start =
+                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT;
+            d_config->rdms[d_config->num_rdms - 1].size =
+                                (uint64_t)xrdm[0].nr_pages << XC_PAGE_SHIFT;
+            d_config->rdms[d_config->num_rdms - 1].flag =
+                                d_config->pcidevs[i].rdm_reserve;
+        }
+        free(xrdm);
+    }
+
+    /*
+     * Next step is to check and avoid potential conflict between RDM entries
+     * and guest RAM. To avoid intrusive impact to existing memory layout
+     * {lowmem, mmio, highmem} which is passed around various function blocks,
+     * below conflicts are not handled which are rare and handling them would
+     * lead to a more scattered layout:
+     *  - RDM  in highmem area (>4G)
+     *  - RDM lower than a defined memory boundary (e.g. 2G)
+     * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem
+     * end below reserved region to solve conflict.
+     *
+     * If a conflict is detected on a given RDM entry, an error will be
+     * returned if 'strict' policy is specified. Instead, if 'relaxed' policy
+     * specified, this conflict is treated just as a warning, but we mark this
+     * RDM entry as INVALID to indicate that this entry shouldn't be exposed
+     * to hvmloader.
+     *
+     * Firstly we should check the case of rdm < 4G because we may need to
+     * expand highmem_end.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        conflict = overlaps_rdm(0, args->lowmem_end, rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        /* Just check if RDM > our memory boundary. */
+        if (rdm_start > rdm_mem_boundary) {
+            /*
+             * We will move downwards lowmem_end so we have to expand
+             * highmem_end.
+             */
+            highmem_end += (args->lowmem_end - rdm_start);
+            /* Now move downwards lowmem_end. */
+            args->lowmem_end = rdm_start;
+        }
+    }
+
+    /* Sync highmem_end. */
+    args->highmem_end = highmem_end;
+
+    /*
+     * Finally we can take same policy to check lowmem(< 2G) and
+     * highmem adjusted above.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        /* Does this entry conflict with lowmem? */
+        conflict = overlaps_rdm(0, args->lowmem_end,
+                                rdm_start, rdm_size);
+        /* Does this entry conflict with highmem? */
+        conflict |= overlaps_rdm((1ULL<<32),
+                                 args->highmem_end - (1ULL<<32),
+                                 rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        if (d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_STRICT) {
+            LOG(ERROR, "RDM conflict at 0x%"PRIx64".\n", rdm_start);
+            goto out;
+        } else {
+            LOG(WARN, "Ignoring RDM conflict at 0x%"PRIx64".\n",
+                      rdm_start);
+
+            /*
+             * Then mask this INVALID to indicate we shouldn't expose this
+             * to hvmloader.
+             */
+            d_config->rdms[i].flag = LIBXL_RDM_RESERVE_FLAG_INVALID;
+        }
+    }
+
+    return 0;
+
+ out:
+    return ERROR_FAIL;
+}
+
 const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
 {
     const libxl_vnc_info *vnc = NULL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 600393d..34bd466 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -914,13 +914,20 @@ out:
 }
 
 int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     struct xc_hvm_build_args args = {};
     int ret, rc = ERROR_FAIL;
     uint64_t mmio_start, lowmem_end, highmem_end;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    /*
+     * Currently we fix this as 2G to guarantte how to handle
+     * our rdm policy. But we'll provide a parameter to set
+     * this dynamically.
+     */
+    uint64_t rdm_mem_boundary = 0x80000000;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -958,6 +965,14 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.highmem_end = highmem_end;
     args.mmio_start = mmio_start;
 
+    ret = libxl__domain_device_construct_rdm(gc, d_config,
+                                             rdm_mem_boundary,
+                                             &args);
+    if (ret) {
+        LOG(ERROR, "checking reserved device memory failed");
+        goto out;
+    }
+
     if (info->num_vnuma_nodes != 0) {
         int i;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index eae1dc6..c0acf11 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1011,7 +1011,7 @@ _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid,
 _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info, libxl__domain_build_state *state);
 _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state);
 
 _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
@@ -1519,6 +1519,15 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
         int nr_channels, libxl_device_channel *channels);
 
 /*
+ * This function will fix reserved device memory conflict
+ * according to user's configuration.
+ */
+_hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                   libxl_domain_config *d_config,
+                                   uint64_t rdm_mem_boundary,
+                                   struct xc_hvm_build_args *args);
+
+/*
  * This function will cause the whole libxl process to hang
  * if the device model does not respond.  It is deprecated.
  *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index dd91b38..5ba075d 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -559,6 +559,12 @@ libxl_device_pci = Struct("device_pci", [
     ("rdm_reserve",   libxl_rdm_reserve_flag),
     ])
 
+libxl_device_rdm = Struct("device_rdm", [
+    ("start", uint64),
+    ("size", uint64),
+    ("flag", libxl_rdm_reserve_flag),
+    ])
+
 libxl_device_dtdev = Struct("device_dtdev", [
     ("path", string),
     ])
@@ -589,6 +595,7 @@ libxl_domain_config = Struct("domain_config", [
     ("disks", Array(libxl_device_disk, "num_disks")),
     ("nics", Array(libxl_device_nic, "num_nics")),
     ("pcidevs", Array(libxl_device_pci, "num_pcidevs")),
+    ("rdms", Array(libxl_device_rdm, "num_rdms")),
     ("dtdevs", Array(libxl_device_dtdev, "num_dtdevs")),
     ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
     ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
-- 
1.9.1


* [v4][PATCH 15/19] tools: introduce a new parameter to set a predefined rdm boundary
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (13 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-25 11:27   ` Wei Liu
  2015-06-23  9:57 ` [v4][PATCH 16/19] tools/libxl: extend XENMEM_set_memory_map Tiejun Chen
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Previously we always fixed that predefined boundary at 2G to handle
conflicts between memory and rdm, but now this predefined boundary
can be changed with the parameter "rdm_mem_boundary" in the .cfg file.
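
To make the units concrete, here is a minimal sketch of how a value given
in megabytes could end up as a byte boundary. The helper names and SKETCH_
macros are hypothetical for this example; the sketch assumes the .cfg
value is stored in KB (as the _memkb suffix suggests) and that an unset
value is represented the same way as LIBXL_MEMKB_DEFAULT.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors LIBXL_MEMKB_DEFAULT: marks "not set by the user". */
#define SKETCH_MEMKB_DEFAULT (~0ULL)
/* Mirrors LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT: 2G expressed in KB. */
#define SKETCH_RDM_BOUNDARY_MEMKB_DEFAULT (2048ULL * 1024)

/* Convert the optional .cfg value (megabytes) to KB. */
static uint64_t boundary_memkb_from_cfg(int has_value, uint64_t mbytes)
{
    return has_value ? mbytes * 1024 : SKETCH_MEMKB_DEFAULT;
}

/* Apply the default if unset, then convert KB to bytes. */
static uint64_t boundary_bytes(uint64_t memkb)
{
    if (memkb == SKETCH_MEMKB_DEFAULT)
        memkb = SKETCH_RDM_BOUNDARY_MEMKB_DEFAULT;
    return memkb * 1024;
}
```

With no value configured this yields 0x80000000 (2G), matching the
previously hard-coded boundary; a setting of 3072 would yield 0xC0000000.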

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5       | 21 +++++++++++++++++++++
 tools/libxl/libxl.h         |  6 ++++++
 tools/libxl/libxl_create.c  |  4 ++++
 tools/libxl/libxl_dom.c     |  8 +-------
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  3 +++
 6 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 638b350..079465f 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -767,6 +767,27 @@ to a given device, and "strict" is default here.
 
 Note this would override global B<rdm> option.
 
+=item B<rdm_mem_boundary=MBYTES>
+
+Number of megabytes to set as the boundary for checking RDM conflicts.
+
+When RDM conflicts with RAM, RDM regions may be scattered across the whole
+RAM space. Multiple RDM entries make this worse and lead to a complicated
+memory layout. So here we're trying to figure out a simple solution that
+avoids breaking the existing layout. When a conflict occurs,
+
+    #1. Above a predefined boundary
+        - move lowmem_end below the reserved region to solve the conflict;
+
+    #2. Below a predefined boundary
+        - Check the strict/relaxed policy.
+        The "strict" policy makes libxl fail. Note when both policies
+        are specified on a given region, 'strict' is always preferred.
+        The "relaxed" policy issues a warning message and also marks this
+        entry INVALID to indicate we shouldn't expose it to hvmloader.
+
+Here the default is 2G.
+
 =back
 
 =back
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a7913b..a6212fb 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -858,6 +858,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
     uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 30e6593..0438731 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
 {
     if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
         b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+
+    if (b_info->rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+        b_info->rdm_mem_boundary_memkb =
+                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 34bd466..0987991 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     int ret, rc = ERROR_FAIL;
     uint64_t mmio_start, lowmem_end, highmem_end;
     libxl_domain_build_info *const info = &d_config->b_info;
-    /*
-     * Currently we fix this as 2G to guarantte how to handle
-     * our rdm policy. But we'll provide a parameter to set
-     * this dynamically.
-     */
-    uint64_t rdm_mem_boundary = 0x80000000;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.mmio_start = mmio_start;
 
     ret = libxl__domain_device_construct_rdm(gc, d_config,
-                                             rdm_mem_boundary,
+                                             info->rdm_mem_boundary_memkb*1024,
                                              &args);
     if (ret) {
         LOG(ERROR, "checking reserved device memory failed");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 5ba075d..d130d48 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -395,6 +395,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("target_memkb",    MemKB),
     ("video_memkb",     MemKB),
     ("shadow_memkb",    MemKB),
+    ("rdm_mem_boundary_memkb",    MemKB),
     ("rtc_timeoffset",  uint32),
     ("exec_ssidref",    uint32),
     ("exec_ssid_label", string),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 5637c30..c7a12b1 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1374,6 +1374,9 @@ static void parse_config_data(const char *config_source,
     if (!xlu_cfg_get_long (config, "videoram", &l, 0))
         b_info->video_memkb = l * 1024;
 
+    if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
+        b_info->rdm_mem_boundary_memkb = l * 1024;
+
     if (!xlu_cfg_get_long(config, "max_event_channels", &l, 0))
         b_info->event_channels = l;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v4][PATCH 16/19] tools/libxl: extend XENMEM_set_memory_map
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (14 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 15/19] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-25 11:33   ` Wei Liu
  2015-06-23  9:57 ` [v4][PATCH 17/19] xen/vtd: enable USB device assignment Tiejun Chen
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist. And hvmloader would need this info
later.
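
The layout described above can be sketched roughly as follows. This is
a simplified, self-contained illustration and not the code in this
patch: the sketch_* types and names are hypothetical stand-ins for
struct e820entry and the libxl RDM entries, and the highmem size is
derived with an explicit "above 4G" check as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_E820_RAM      1
#define SKETCH_E820_RESERVED 2
#define SKETCH_LOW_MEM_START 0x100000ULL   /* low RAM starts at 1M */
#define SKETCH_4G            (1ULL << 32)

/* Hypothetical stand-in for struct e820entry. */
struct sketch_e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Hypothetical stand-in for one RDM entry; valid == 0 means "skip it". */
struct sketch_rdm {
    uint64_t start;
    uint64_t size;
    int valid;
};

/*
 * Fill map[] with lowmem, every valid RDM, and highmem (if any).
 * Returns the number of entries written, or 0 if max is too small.
 * Assumes lowmem_end is at least 1M.
 */
static size_t sketch_build_e820(struct sketch_e820entry *map, size_t max,
                                uint64_t lowmem_end, uint64_t highmem_end,
                                const struct sketch_rdm *rdms, size_t num_rdms)
{
    size_t nr = 0, i, needed = 1;   /* always one lowmem entry */
    uint64_t highmem_size =
        highmem_end > SKETCH_4G ? highmem_end - SKETCH_4G : 0;

    /* Count the entries first so we never write past the buffer. */
    for (i = 0; i < num_rdms; i++)
        if (rdms[i].valid)
            needed++;
    if (highmem_size)
        needed++;
    if (needed > max)
        return 0;

    /* Low memory */
    map[nr].addr = SKETCH_LOW_MEM_START;
    map[nr].size = lowmem_end - SKETCH_LOW_MEM_START;
    map[nr].type = SKETCH_E820_RAM;
    nr++;

    /* RDM regions */
    for (i = 0; i < num_rdms; i++) {
        if (!rdms[i].valid)
            continue;
        map[nr].addr = rdms[i].start;
        map[nr].size = rdms[i].size;
        map[nr].type = SKETCH_E820_RESERVED;
        nr++;
    }

    /* High memory */
    if (highmem_size) {
        map[nr].addr = SKETCH_4G;
        map[nr].size = highmem_size;
        map[nr].type = SKETCH_E820_RAM;
        nr++;
    }

    return nr;
}
```

Counting before filling keeps the allocation size and the number of
entries handed to the hypercall in agreement.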

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4:

* Use goto style error handling.
* Instead of NOGC, we should use libxl__malloc(gc,XXX) to allocate local e820.

 tools/libxl/libxl_dom.c      |  5 +++
 tools/libxl/libxl_internal.h | 24 +++++++++++++
 tools/libxl/libxl_x86.c      | 83 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 112 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 0987991..bc8fd5b 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }
 
+    if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
+        LOG(ERROR, "setting domain memory map failed");
+        goto out;
+    }
+
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c0acf11..ae2f5e0 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3714,6 +3714,30 @@ static inline void libxl__update_config_vtpm(libxl__gc *gc,
  */
 void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
                                     const libxl_bitmap *sptr);
+
+/*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: The regions below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+int libxl__domain_construct_e820(libxl__gc *gc,
+                                 libxl_domain_config *d_config,
+                                 uint32_t domid,
+                                 struct xc_hvm_build_args *args);
+
 #endif
 
 /*
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..be297b2 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: The regions below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__domain_construct_e820(libxl__gc *gc,
+                                 libxl_domain_config *d_config,
+                                 uint32_t domid,
+                                 struct xc_hvm_build_args *args)
+{
+    int rc = 0;
+    unsigned int nr = 0, i;
+    /* We always own at least one lowmem entry. */
+    unsigned int e820_entries = 1;
+    struct e820entry *e820 = NULL;
+    uint64_t highmem_size =
+                    args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+    /* Add all rdm entries. */
+    for (i = 0; i < d_config->num_rdms; i++)
+        if (d_config->rdms[i].flag != LIBXL_RDM_RESERVE_FLAG_INVALID)
+            e820_entries++;
+
+
+    /* If we should have a highmem range. */
+    if (highmem_size)
+        e820_entries++;
+
+    if (e820_entries >= E820MAX) {
+        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+        rc = ERROR_INVAL;
+        goto out;
+    }
+
+    e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+    /* Low memory */
+    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].type = E820_RAM;
+    nr++;
+
+    /* RDM mapping */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        if (d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_INVALID)
+            continue;
+
+        e820[nr].addr = d_config->rdms[i].start;
+        e820[nr].size = d_config->rdms[i].size;
+        e820[nr].type = E820_RESERVED;
+        nr++;
+    }
+
+    /* High memory */
+    if (highmem_size) {
+        e820[nr].addr = ((uint64_t)1 << 32);
+        e820[nr].size = highmem_size;
+        e820[nr].type = E820_RAM;
+    }
+
+    if (xc_domain_set_memory_map(CTX->xch, domid, e820, e820_entries) != 0) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+out:
+    return rc;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-basic-offset: 4
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v4][PATCH 17/19] xen/vtd: enable USB device assignment
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (15 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 16/19] tools/libxl: extend XENMEM_set_memory_map Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-23  9:57 ` [v4][PATCH 18/19] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
  2015-06-23  9:57 ` [v4][PATCH 19/19] tools: parse to enable new rdm policy parameters Tiejun Chen
  18 siblings, 0 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian

A USB RMRR may conflict with the guest BIOS region. In such a case, identity
mapping setup was simply skipped in the previous implementation. Now we
can handle this scenario cleanly with the new policy mechanism, so the
previous hack can be removed.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
 xen/drivers/passthrough/vtd/utils.c |  7 -------
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {                                                \
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 59d5fd7..07f5c7c 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2239,11 +2239,9 @@ static int reassign_device_ownership(
     /*
      * If the device belongs to the hardware domain, and it has RMRR, don't
      * remove it from the hardware domain, because BIOS may use RMRR at
-     * booting time. Also account for the special casing of USB below (in
-     * intel_iommu_assign_device()).
+     * booting time.
      */
-    if ( !is_hardware_domain(source) &&
-         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+    if ( !is_hardware_domain(source) )
     {
         const struct acpi_rmrr_unit *rmrr;
         u16 bdf;
@@ -2297,13 +2295,8 @@ static int intel_iommu_assign_device(
     if ( ret )
         return ret;
 
-    /* FIXME: Because USB RMRR conflicts with guest bios region,
-     * ignore USB RMRR temporarily.
-     */
     seg = pdev->seg;
     bus = pdev->bus;
-    if ( is_usb_device(seg, bus, pdev->devfn) )
-        return 0;
 
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include "extern.h"
 #include <asm/io_apic.h>
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-    u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                                PCI_CLASS_DEVICE);
-    return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v4][PATCH 18/19] xen/vtd: prevent from assign the device with shared rmrr
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (16 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 17/19] xen/vtd: enable USB device assignment Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-23  9:57 ` [v4][PATCH 19/19] tools: parse to enable new rdm policy parameters Tiejun Chen
  18 siblings, 0 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian

For now we simply refuse to assign any device that shares an RMRR,
since shared RMRRs are rare in our experience. Later we can group
the devices that share an RMRR and then allow all devices within a
group to be assigned to the same domain.
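
The check can be pictured with the following self-contained sketch. It
is only an illustration: struct sketch_rmrr is a hypothetical,
simplified stand-in for Xen's RMRR bookkeeping, and the function name
is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for an RMRR unit and its device scope. */
struct sketch_rmrr {
    uint16_t segment;       /* PCI segment the RMRR belongs to */
    const uint16_t *bdfs;   /* bus/device/function of each device in scope */
    size_t devices_cnt;     /* number of entries in bdfs */
};

/*
 * Return 1 if the device (seg, bdf) is covered by an RMRR whose scope
 * contains more than one device, 0 otherwise.
 */
static int sketch_device_shares_rmrr(const struct sketch_rmrr *rmrrs,
                                     size_t nr, uint16_t seg, uint16_t bdf)
{
    size_t i, j;

    for (i = 0; i < nr; i++) {
        if (rmrrs[i].segment != seg)
            continue;
        for (j = 0; j < rmrrs[i].devices_cnt; j++)
            if (rmrrs[i].bdfs[j] == bdf && rmrrs[i].devices_cnt > 1)
                return 1;
    }

    return 0;
}
```

A caller would refuse the assignment (the patch returns -EPERM) when
this kind of check reports a shared RMRR.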

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 07f5c7c..43ba131 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2291,13 +2291,39 @@ static int intel_iommu_assign_device(
     if ( list_empty(&acpi_drhd_units) )
         return -ENODEV;
 
+    seg = pdev->seg;
+    bus = pdev->bus;
+    /*
+     * In rare cases one given rmrr is shared by multiple devices but
+     * obviously this would put the security of a system at risk. So
+     * we should prevent this sort of device assignment.
+     *
+     * TODO: in the future we can introduce group device assignment
+     * interface to make sure devices sharing RMRR are assigned to the
+     * same domain together.
+     */
+    for_each_rmrr_device( rmrr, bdf, i )
+    {
+        if ( rmrr->segment == seg &&
+             PCI_BUS(bdf) == bus &&
+             PCI_DEVFN2(bdf) == devfn )
+        {
+            if ( rmrr->scope.devices_cnt > 1 )
+            {
+                printk(XENLOG_G_ERR VTDPREFIX
+                       " cannot assign %04x:%02x:%02x.%u"
+                       " with shared RMRR for Dom%d.\n",
+                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                       d->domain_id);
+                return -EPERM;
+            }
+        }
+    }
+
     ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
     if ( ret )
         return ret;
 
-    seg = pdev->seg;
-    bus = pdev->bus;
-
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
     {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v4][PATCH 19/19] tools: parse to enable new rdm policy parameters
  2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
                   ` (17 preceding siblings ...)
  2015-06-23  9:57 ` [v4][PATCH 18/19] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
@ 2015-06-23  9:57 ` Tiejun Chen
  2015-06-25 11:35   ` Wei Liu
  2015-06-30 16:30   ` George Dunlap
  18 siblings, 2 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-23  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch parses the user-configurable parameters that specify the
RDM resource and the corresponding policies:

Global RDM parameter:
    rdm = "type=none/host,reserve=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

Default per-device RDM policy is 'strict', while default global RDM policy
is 'relaxed'. When both policies are specified on a given region, 'strict' is
always preferred.
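
The "strict is always preferred" rule can be sketched as below. This is
a minimal illustration under the assumption that the global and
per-device policies are combined per region; the enum and function
names are hypothetical and are not the libxl identifiers.

```c
/* Hypothetical policy values for this sketch only. */
enum sketch_rdm_policy {
    SKETCH_RDM_RELAXED,
    SKETCH_RDM_STRICT
};

/*
 * Combine the global policy with a per-device policy for one region:
 * if either side asks for strict, the result is strict.
 */
static enum sketch_rdm_policy
sketch_effective_policy(enum sketch_rdm_policy global_policy,
                        enum sketch_rdm_policy device_policy)
{
    if (global_policy == SKETCH_RDM_STRICT ||
        device_policy == SKETCH_RDM_STRICT)
        return SKETCH_RDM_STRICT;

    return SKETCH_RDM_RELAXED;
}
```

With the defaults above (global 'relaxed', per-device 'strict'), a
region that belongs to an assigned device ends up 'strict' unless the
device line says rdm_reserve=relaxed.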

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4:

* Separated from current patch #11 to parse/enable our rdm policy parameters,
  since that makes a lot of sense and this code is specific to xl/libxlu.

 tools/libxl/libxlu_pci.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 10 ++++++
 3 files changed, 106 insertions(+)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..9255878 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE      9
+#define STATE_RDM_TYPE      10
+#define STATE_RESERVE_FLAG      11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str)
 {
     unsigned state = STATE_DOMAIN;
@@ -143,6 +146,17 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str
                     pcidev->permissive = atoi(tok);
                 }else if ( !strcmp(optkey, "seize") ) {
                     pcidev->seize = atoi(tok);
+                }else if ( !strcmp(optkey, "rdm_reserve") ) {
+                    if ( !strcmp(tok, "strict") ) {
+                        pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
+                    } else if ( !strcmp(tok, "relaxed") ) {
+                        pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+                    } else {
+                        XLU__PCI_ERR(cfg, "%s is not a valid PCI RDM property"
+                                          " flag: 'strict' or 'relaxed'.",
+                                     tok);
+                        goto parse_error;
+                    }
                 }else{
                     XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
                 }
@@ -167,6 +181,84 @@ parse_error:
     return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+    unsigned state = STATE_TYPE;
+    char *buf2, *tok, *ptr, *end;
+
+    if (NULL == (buf2 = ptr = strdup(str)))
+        return ERROR_NOMEM;
+
+    for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+        switch(state) {
+        case STATE_TYPE:
+            if (*ptr == '=') {
+                state = STATE_RDM_TYPE;
+                *ptr = '\0';
+                if (strcmp(tok, "type")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RDM_TYPE:
+            if (*ptr == '\0' || *ptr == ',') {
+                state = STATE_RESERVE_FLAG;
+                *ptr = '\0';
+                if (!strcmp(tok, "host")) {
+                    rdm->type = LIBXL_RDM_RESERVE_TYPE_HOST;
+                } else if (!strcmp(tok, "none")) {
+                    rdm->type = LIBXL_RDM_RESERVE_TYPE_NONE;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM type option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RESERVE_FLAG:
+            if (*ptr == '=') {
+                state = STATE_OPTIONS_V;
+                *ptr = '\0';
+                if (strcmp(tok, "reserve")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_OPTIONS_V:
+            if (*ptr == ',' || *ptr == '\0') {
+                state = STATE_TERMINAL;
+                *ptr = '\0';
+                if (!strcmp(tok, "strict")) {
+                    rdm->reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
+                } else if (!strcmp(tok, "relaxed")) {
+                    rdm->reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property flag value: %s",
+                                 tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+        default:
+            break;
+        }
+    }
+
+    if (tok != ptr || state != STATE_TERMINAL)
+        goto parse_error;
+
+    free(buf2);
+    return 0;
+
+parse_error:
+    free(buf2);
+    return ERROR_INVAL;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 989605a..e81b644 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -106,6 +106,10 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs, const char *const *specs,
  */
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str);
 
+/*
+ * RDM parsing
+ */
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str);
 
 /*
  * Vif rate parsing.
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c7a12b1..85d74fd 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1923,6 +1923,14 @@ skip_vfb:
         xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
     }
 
+    if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
+        libxl_rdm_reserve rdm;
+        if (!xlu_rdm_parse(config, &rdm, buf)) {
+            b_info->rdm.type = rdm.type;
+            b_info->rdm.reserve = rdm.reserve;
+        }
+    }
+
     if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
         d_config->num_pcidevs = 0;
         d_config->pcidevs = NULL;
@@ -1937,6 +1945,8 @@ skip_vfb:
             pcidev->power_mgmt = pci_power_mgmt;
             pcidev->permissive = pci_permissive;
             pcidev->seize = pci_seize;
+            /* We'd like to force reserve rdm specific to a device by default.*/
+            pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
             if (!xlu_pci_parse_bdf(config, pcidev, buf))
                 d_config->num_pcidevs++;
         }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 03/19] xen/vtd: create RMRR mapping
  2015-06-23  9:57 ` [v4][PATCH 03/19] xen/vtd: create RMRR mapping Tiejun Chen
@ 2015-06-23 10:12   ` Jan Beulich
  2015-06-24  1:11     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-23 10:12 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Yang Zhang, Kevin Tian, xen-devel

>>> On 23.06.15 at 11:57, <tiejun.chen@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
>  
>              while ( base_pfn < end_pfn )
>              {
> -                if ( intel_iommu_unmap_page(d, base_pfn) )
> +                if ( guest_physmap_remove_page(d, base_pfn, base_pfn, 0) )
>                      ret = -ENXIO;
>                  base_pfn++;
>              }
> @@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
>  
>      while ( base_pfn < end_pfn )
>      {
> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
> -                                       IOMMUF_readable|IOMMUF_writable);
> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);

Shouldn't the two continue to be the inverse of one another?
Maybe guest_physmap_remove_page() does what you want,
but it looks like at least an abuse.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 03/19] xen/vtd: create RMRR mapping
  2015-06-23 10:12   ` Jan Beulich
@ 2015-06-24  1:11     ` Chen, Tiejun
  2015-06-24  6:48       ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-24  1:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Yang Zhang, Kevin Tian, Tim Deegan, xen-devel

On 2015/6/23 18:12, Jan Beulich wrote:
>>>> On 23.06.15 at 11:57, <tiejun.chen@intel.com> wrote:
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
>>
>>               while ( base_pfn < end_pfn )
>>               {
>> -                if ( intel_iommu_unmap_page(d, base_pfn) )
>> +                if ( guest_physmap_remove_page(d, base_pfn, base_pfn, 0) )

Yeah, I also thought this might cause some confusion in this context.

>>                       ret = -ENXIO;
>>                   base_pfn++;
>>               }
>> @@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
>>
>>       while ( base_pfn < end_pfn )
>>       {
>> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
>> -                                       IOMMUF_readable|IOMMUF_writable);
>> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
>
> Shouldn't the two continue to be the inverse of one another?

Initially, instead of using guest_physmap_remove_page(), I was trying to
introduce a new function, clear_identity_p2m_entry(), which would wrap
p2m_remove_page().

But Tim thought we'd better avoid duplicating code,

http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg02970.html

> Maybe guest_physmap_remove_page() does what you want,

Note that we actually just need p2m_remove_page() to unmap these mappings on
both the EPT and VT-d sides, and guest_physmap_remove_page() is really a
wrapper around p2m_remove_page().

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 03/19] xen/vtd: create RMRR mapping
  2015-06-24  1:11     ` Chen, Tiejun
@ 2015-06-24  6:48       ` Jan Beulich
  2015-06-24  7:26         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-24  6:48 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Yang Zhang, Kevin Tian, Tim Deegan, xen-devel

>>> On 24.06.15 at 03:11, <tiejun.chen@intel.com> wrote:
> On 2015/6/23 18:12, Jan Beulich wrote:
>>>>> On 23.06.15 at 11:57, <tiejun.chen@intel.com> wrote:
>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>> @@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
>>>
>>>               while ( base_pfn < end_pfn )
>>>               {
>>> -                if ( intel_iommu_unmap_page(d, base_pfn) )
>>> +                if ( guest_physmap_remove_page(d, base_pfn, base_pfn, 0) )
> 
> Yeah, I also thought this might cause some confusion in this context.
> 
>>>                       ret = -ENXIO;
>>>                   base_pfn++;
>>>               }
>>> @@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
>>>
>>>       while ( base_pfn < end_pfn )
>>>       {
>>> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
>>> -                                       IOMMUF_readable|IOMMUF_writable);
>>> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
>>
>> Shouldn't the two continue to be the inverse of one another?
> 
> Initially, instead of using guest_physmap_remove_page(), I was trying to
> introduce a new function, clear_identity_p2m_entry(), which would wrap
> p2m_remove_page().
> 
> But Tim thought we'd better avoid duplicating code,
> 
> http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg02970.html 
> 
>> Maybe guest_physmap_remove_page() does what you want,
> 
> Note that we actually just need p2m_remove_page() to unmap these mappings on
> both the EPT and VT-d sides, and guest_physmap_remove_page() is really a
> wrapper around p2m_remove_page().

And I agree with Tim regarding the desire to avoid code duplication.
Yet that's no reason to do it asymmetrically:
clear_identity_p2m_entry() could still be an inline (or macro) wrapper
around guest_physmap_remove_page(). That way, apart from making
the code above look nicer, if the latter needs extending in the future
for some reason, simply converting the wrapper to a real function is
possible without needing to touch the call site(s).
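
A minimal sketch of such a wrapper might look like the following. It is
only an illustration: the stand-in function below is hypothetical and
merely records its arguments so that the example is self-contained,
and the parameter types are assumptions rather than the hypervisor's
real prototypes.

```c
#include <stddef.h>

struct domain;   /* opaque in this sketch */

/* Arguments seen by the stand-in below, recorded for illustration. */
static unsigned long sketch_last_gfn, sketch_last_mfn;

/*
 * Hypothetical stand-in for guest_physmap_remove_page(); it only
 * remembers what it was asked to remove and reports success.
 */
static int sketch_guest_physmap_remove_page(struct domain *d,
                                            unsigned long gfn,
                                            unsigned long mfn,
                                            unsigned int page_order)
{
    (void)d;
    (void)page_order;
    sketch_last_gfn = gfn;
    sketch_last_mfn = mfn;
    return 0;
}

/*
 * Inverse of set_identity_p2m_entry(): an identity mapping has
 * gfn == mfn, so the wrapper passes the same frame number twice.
 */
static inline int clear_identity_p2m_entry(struct domain *d,
                                           unsigned long gfn)
{
    return sketch_guest_physmap_remove_page(d, gfn, gfn, 0);
}
```

The call site would then read clear_identity_p2m_entry(d, base_pfn),
mirroring set_identity_p2m_entry() on the mapping path.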

This would need to go into patch 2; I wonder whether folding that
and this one wouldn't be warranted, avoiding the former adding
(at that point) dead code.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 03/19] xen/vtd: create RMRR mapping
  2015-06-24  6:48       ` Jan Beulich
@ 2015-06-24  7:26         ` Chen, Tiejun
  2015-06-24  7:33           ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-24  7:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Yang Zhang, Kevin Tian, Tim Deegan, xen-devel

>> Note that we actually just need p2m_remove_page() to unmap these mappings on
>> both the EPT and VT-d sides, and guest_physmap_remove_page() is really a
>> wrapper around p2m_remove_page().
>
> And I agree with Tim regarding the desire to avoid code duplication.
> Yet that's no reason to do it asymmetrically:
> clear_identity_p2m_entry() could still be an inline (or macro) wrapper
> around guest_physmap_remove_page(). That way, apart from making

I can define that as a macro close to set_identity_p2m_entry() in p2m.h.

> the code above look nicer, if the latter needs extending in the future
> for some reason, simply converting the wrapper to a real function is
> possible without needing to touch the call site(s).
>
> This would need to go into patch 2; I wonder whether folding that

Yes.

> and this one wouldn't be warranted, avoiding the former adding

Are you saying to fold patch #2 and patch #3? But shouldn't we always
define a new function first and then use it subsequently, even if that takes
two separate patches?

Thanks
Tiejun

> (at that point) dead code.
>
> Jan
>
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 03/19] xen/vtd: create RMRR mapping
  2015-06-24  7:26         ` Chen, Tiejun
@ 2015-06-24  7:33           ` Jan Beulich
  2015-06-30 10:40             ` George Dunlap
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-24  7:33 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Yang Zhang, Kevin Tian, Tim Deegan, xen-devel

>>> On 24.06.15 at 09:26, <tiejun.chen@intel.com> wrote:
>> This would need to go into patch 2; I wonder whether folding that
> 
> Yes.
> 
>> and this one wouldn't be warranted, avoiding the former adding
> 
> Are you suggesting that we fold patch #2 and patch #3 together? But shouldn't
> we always define a new helper first and then use it in a subsequent patch,
> i.e. keep them as two separate patches?

It's a matter of taste to some degree. Unless patches are really
involved, I prefer them not to add dead code. Apart from
eliminating the case of the code remaining dead (perhaps for
extended periods of time) if only parts of a series get applied, it
also generally helps review if one can see the consumer of a
newly added function right away.

Jan
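
For illustration, the wrapper suggested earlier in this subthread could look
roughly like the standalone sketch below. The struct, the stubbed
guest_physmap_remove_page() and its int return value are simplified
assumptions made so the fragment compiles on its own; they are not the actual
Xen definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for Xen's struct domain. */
struct domain { unsigned long removed; };

/*
 * Stub standing in for the real helper. The int return value follows
 * what patch 2 describes; the body is purely illustrative.
 */
static int guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                                     unsigned long mfn,
                                     unsigned int page_order)
{
    assert(d != NULL);
    (void)gfn;
    (void)mfn;
    d->removed += 1UL << page_order;   /* count the pages "removed" */
    return 0;
}

/*
 * Thin wrapper: for an identity mapping gfn == mfn, so callers only
 * pass the gfn. Note that the gfn argument is evaluated twice.
 */
#define clear_identity_p2m_entry(d, gfn, page_order) \
    guest_physmap_remove_page((d), (gfn), (gfn), (page_order))
```

Because call sites only ever see clear_identity_p2m_entry(), the macro could
later be turned into a real function without touching them.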

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 02/19] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-06-23  9:57 ` [v4][PATCH 02/19] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
@ 2015-06-25  9:59   ` Tim Deegan
  2015-07-01 15:43   ` George Dunlap
  1 sibling, 0 replies; 114+ messages in thread
From: Tim Deegan @ 2015-06-25  9:59 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel

At 17:57 +0800 on 23 Jun (1435082233), Tiejun Chen wrote:
> We will create this sort of identity mapping as follows:
> 
> If the gfn space is unoccupied, we just set the mapping. If space
> is already occupied by desired identity mapping, do nothing.
> Otherwise, failure is returned.
> 
> We also add a return value to guest_physmap_remove_page()
> to make it a better helper for clearing such a p2m entry.
> 
> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Reviewed-by: Tim Deegan <tim@xen.org>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 09/19] tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  2015-06-23  9:57 ` [v4][PATCH 09/19] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
@ 2015-06-25 10:44   ` Wei Liu
  0 siblings, 0 replies; 114+ messages in thread
From: Wei Liu @ 2015-06-25 10:44 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Tue, Jun 23, 2015 at 05:57:20PM +0800, Tiejun Chen wrote:
> We expose the new hypercall to libxc as
> xc_reserved_device_memory_map. This helps us get rdm entry info
> according to different parameters. If flag == PCI_DEV_RDM_ALL, all
> entries should be exposed. Otherwise we just expose the rdm entries
> specific to a given SBDF.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 10/19] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-23  9:57 ` [v4][PATCH 10/19] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
@ 2015-06-25 10:54   ` Wei Liu
  0 siblings, 0 replies; 114+ messages in thread
From: Wei Liu @ 2015-06-25 10:54 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, David Scott, Stefano Stabellini, Ian Jackson, xen-devel,
	Ian Campbell

On Tue, Jun 23, 2015 at 05:57:21PM +0800, Tiejun Chen wrote:
> This patch passes rdm reservation policy to xc_assign_device() so the policy
> is checked when assigning devices to a VM.
> 
> Note this also brings some fallout to python usage of xc_assign_device().
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: David Scott <dave.scott@eu.citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 13/19] tools/libxc: check to set args.mmio_size before call xc_hvm_build
  2015-06-23  9:57 ` [v4][PATCH 13/19] tools/libxc: check to set args.mmio_size before call xc_hvm_build Tiejun Chen
@ 2015-06-25 11:08   ` Wei Liu
  2015-06-26  0:56     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-25 11:08 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Tue, Jun 23, 2015 at 05:57:24PM +0800, Tiejun Chen wrote:
> Since commit 5dff8e9eedc7, "libxc/libxl: fill xc_hvm_build_args in
> libxl", xc_hvm_build no longer checks and sets args.mmio_size as it
> did before. So instead, we need to do this before calling it.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

Sigh. I missed this because libxl doesn't use this function and there is
no in-tree xend anymore.

I think you should move this earlier in this series. Presumably your RDM
changes depend on this.

Wei.

> ---
> v4:
> 
> * Separate this from current patch #14 since this is specific to xc.
> 
>  tools/libxc/xc_hvm_build_x86.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index 003ea06..7343e87 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -754,6 +754,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
>      args.mem_size = (uint64_t)memsize << 20;
>      args.mem_target = (uint64_t)target << 20;
>      args.image_file_name = image_name;
> +    if ( args.mmio_size == 0 )
> +        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
>  
>      return xc_hvm_build(xch, domid, &args);
>  }
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM
  2015-06-23  9:57 ` [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-06-25 11:23   ` Wei Liu
  2015-06-26  5:45     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-25 11:23 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Tue, Jun 23, 2015 at 05:57:25PM +0800, Tiejun Chen wrote:
> While building a VM, HVM domain builder provides struct hvm_info_table{}
> to help hvmloader. Currently it includes two fields to construct guest
> e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
> check them to fix any conflict with RAM.
> 

RAM -> RDM?

> RMRR can reside in address space beyond 4G theoretically, but we never
> see this in real world. So in order to avoid breaking highmem layout
> we don't solve highmem conflict. Note this means highmem rmrr could still
> be supported if no conflict.
> 
> But in the case of lowmem, RMRRs may be scattered across the whole RAM space.
> Multiple RMRR entries in particular would worsen this and lead to a complicated
> memory layout, and it would then be hard to extend hvm_info_table{} to make
> this work in hvmloader. So here we're trying to find a simple solution that
> avoids breaking the existing layout. When a conflict occurs,
> 
>     #1. Above a predefined boundary (2G)
>         - move lowmem_end below reserved region to solve conflict;
> 
>     #2. Below a predefined boundary (2G)
>         - Check strict/relaxed policy.
>         "strict" policy makes libxl fail. Note when both policies
>         are specified on a given region, 'strict' is always preferred.
>         "relaxed" policy issues a warning message and also marks this entry INVALID
>         to indicate we shouldn't expose this entry to hvmloader.
> 
> Note later we need to provide a parameter to set that predefined boundary
> dynamically.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: Kevin Tian <kevint.tian@intel.com>
> ---
> v4:
> 
> * Consistent to use term "RDM".
> * Unconditionally set *nr_entries to 0
> * Move all the stuff that provides a parameter to set our predefined boundary
>   dynamically into a separate, later patch
> 
>  tools/libxl/libxl_create.c   |   2 +-
>  tools/libxl/libxl_dm.c       | 259 +++++++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_dom.c      |  17 ++-
>  tools/libxl/libxl_internal.h |  11 +-
>  tools/libxl/libxl_types.idl  |   7 ++
>  5 files changed, 293 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 6c8ec63..30e6593 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -460,7 +460,7 @@ int libxl__domain_build(libxl__gc *gc,
>  
>      switch (info->type) {
>      case LIBXL_DOMAIN_TYPE_HVM:
> -        ret = libxl__build_hvm(gc, domid, info, state);
> +        ret = libxl__build_hvm(gc, domid, d_config, state);
>          if (ret)
>              goto out;
>  
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 33f9ce6..5436bcf 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -90,6 +90,265 @@ const char *libxl__domain_device_model(libxl__gc *gc,
>      return dm;
>  }
>  
> +static struct xen_reserved_device_memory
> +*xc_device_get_rdm(libxl__gc *gc,
> +                   uint32_t flag,
> +                   uint16_t seg,
> +                   uint8_t bus,
> +                   uint8_t devfn,
> +                   unsigned int *nr_entries)

I just noticed this function lives in libxl_dm.c. The function should be
renamed to libxl__xc_device_get_rdm. 

This function should return proper libxl error code (ERROR_FAIL or
something more appropriate). The allocated RDM entries should be
returned with an out parameter.

I had always thought this lived in libxc. Sorry for not having noticed
this earlier.

> +{
> +    struct xen_reserved_device_memory *xrdm;
> +    int rc;
> +
> +    /*
> +     * We really can't presume how many entries we can get in advance.
> +     */
> +    *nr_entries = 0;
> +    rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
> +                                       NULL, nr_entries);
> +    assert(rc <= 0);
> +    /* "0" means we have no any rdm entry. */
> +    if (!rc)
> +        goto out;
> +
> +    if (errno == ENOBUFS) {
> +        xrdm = malloc(*nr_entries * sizeof(xen_reserved_device_memory_t));

libxl__malloc(gc, ...);

> +        if (!xrdm) {
> +            LOG(ERROR, "Could not allocate RDM buffer!\n");
> +            goto out;
> +        }

Get rid of this.

> +        rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
> +                                           xrdm, nr_entries);
> +        if (rc) {
> +            LOG(ERROR, "Could not get reserved device memory maps.\n");
> +            *nr_entries = 0;
> +            free(xrdm);
> +            xrdm = NULL;

Get rid of free.

> +        }
> +    } else
> +        LOG(ERROR, "Could not get reserved device memory maps.\n");
> +
> + out:
> +    return xrdm;
> +}

The rest of this patch looks good to me. It does what we've discussed.

Wei.
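
The review comments above (return an error code, hand the entries back through
an out parameter) could be sketched as follows. This is a standalone
illustration only: stub_rdm_map() is a made-up stand-in for the hypercall
wrapper, SKETCH_ERROR_FAIL stands in for a libxl error code, and calloc() is
used instead of libxl__malloc(gc, ...) so that the fragment compiles outside
libxl. With the gc allocator, the NULL check and the free() would go away, as
requested in the review.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real Xen or libxl definitions. */
struct rdm_entry { uint64_t start_pfn; uint64_t nr_pages; };
#define SKETCH_ERROR_FAIL (-3)

static const struct rdm_entry host_rdm[] = {
    { 0xad000, 0x10 }, { 0xab800, 0x20 },
};

/*
 * Stub query: copies the entries if the buffer holds enough of them,
 * otherwise stores the required count in *nr and fails with ENOBUFS.
 */
static int stub_rdm_map(struct rdm_entry *buf, unsigned int *nr)
{
    unsigned int i, have = sizeof(host_rdm) / sizeof(host_rdm[0]);

    if (*nr < have) {
        *nr = have;
        errno = ENOBUFS;
        return -1;
    }
    for (i = 0; i < have; i++)
        buf[i] = host_rdm[i];
    *nr = have;
    return 0;
}

/* Returns 0 on success; entries come back through the out parameters. */
static int sketch_get_rdm(struct rdm_entry **entries, unsigned int *nr_entries)
{
    assert(entries != NULL && nr_entries != NULL);
    *entries = NULL;
    *nr_entries = 0;

    /* First call with no buffer: learn how many entries exist. */
    if (stub_rdm_map(NULL, nr_entries) == 0)
        return 0;                       /* no RDM entries at all */
    if (errno != ENOBUFS)
        return SKETCH_ERROR_FAIL;

    *entries = calloc(*nr_entries, sizeof(**entries));
    if (*entries == NULL) {
        *nr_entries = 0;
        return SKETCH_ERROR_FAIL;
    }

    /* Second call with a buffer of the reported size. */
    if (stub_rdm_map(*entries, nr_entries) != 0) {
        free(*entries);
        *entries = NULL;
        *nr_entries = 0;
        return SKETCH_ERROR_FAIL;
    }
    return 0;
}
```

A caller would then test the return value and only look at the entries and
the count when it is 0.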

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 15/19] tools: introduce a new parameter to set a predefined rdm boundary
  2015-06-23  9:57 ` [v4][PATCH 15/19] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
@ 2015-06-25 11:27   ` Wei Liu
  2015-06-26  6:54     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-25 11:27 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Tue, Jun 23, 2015 at 05:57:26PM +0800, Tiejun Chen wrote:
> Previously we always fixed that predefined boundary at 2G to handle
> conflicts between memory and rdm, but now this predefined boundary
> can be changed with the parameter "rdm_mem_boundary" in the .cfg file.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v4:
> 
> * Separated from the previous patch to provide a parameter to set that
>   predefined boundary dynamically.
> 
>  docs/man/xl.cfg.pod.5       | 21 +++++++++++++++++++++
>  tools/libxl/libxl.h         |  6 ++++++
>  tools/libxl/libxl_create.c  |  4 ++++
>  tools/libxl/libxl_dom.c     |  8 +-------
>  tools/libxl/libxl_types.idl |  1 +
>  tools/libxl/xl_cmdimpl.c    |  3 +++
>  6 files changed, 36 insertions(+), 7 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 638b350..079465f 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -767,6 +767,27 @@ to a given device, and "strict" is default here.
>  
>  Note this would override global B<rdm> option.
>  
> +=item B<rdm_mem_boundary=MBYTES>
> +
> +Number of megabytes to set a boundary for checking rdm conflict.
> +
> +When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
> +Especially multiple RDM entries would worsen this to lead a complicated
> +memory layout. So here we're trying to figure out a simple solution to
> +avoid breaking existing layout. So when a conflict occurs,
> +
> +    #1. Above a predefined boundary
> +        - move lowmem_end below reserved region to solve conflict;
> +
> +    #2. Below a predefined boundary
> +        - Check strict/relaxed policy.
> +        "strict" policy leads to fail libxl. Note when both policies
> +        are specified on a given region, 'strict' is always preferred.
> +        "relaxed" policy issue a warning message and also mask this entry INVALID
> +        to indicate we shouldn't expose this entry to hvmloader.
> +

Can you check the generated manpage to see if the format is correct?

> +Here the default is 2G.
> +
>  =back
>  
>  =back
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 0a7913b..a6212fb 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -858,6 +858,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
>  #define LIBXL_TIMER_MODE_DEFAULT -1
>  #define LIBXL_MEMKB_DEFAULT ~0ULL
>  
> +/*
> + * We'd like to set a memory boundary to determine if we need to check
> + * any overlap with reserved device memory.
> + */
> +#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
> +
>  #define LIBXL_MS_VM_GENID_LEN 16
>  typedef struct {
>      uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 30e6593..0438731 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
>  {
>      if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
>          b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
> +
> +    if (b_info->rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
> +        b_info->rdm_mem_boundary_memkb =
> +                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
>  }
>  
>  int libxl__domain_build_info_setdefault(libxl__gc *gc,
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 34bd466..0987991 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>      int ret, rc = ERROR_FAIL;
>      uint64_t mmio_start, lowmem_end, highmem_end;
>      libxl_domain_build_info *const info = &d_config->b_info;
> -    /*
> -     * Currently we fix this as 2G to guarantte how to handle
> -     * our rdm policy. But we'll provide a parameter to set
> -     * this dynamically.
> -     */
> -    uint64_t rdm_mem_boundary = 0x80000000;
>  
>      memset(&args, 0, sizeof(struct xc_hvm_build_args));
>      /* The params from the configuration file are in Mb, which are then
> @@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>      args.mmio_start = mmio_start;
>  
>      ret = libxl__domain_device_construct_rdm(gc, d_config,
> -                                             rdm_mem_boundary,
> +                                             info->rdm_mem_boundary_memkb*1024,
>                                               &args);
>      if (ret) {
>          LOG(ERROR, "checking reserved device memory failed");
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 5ba075d..d130d48 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -395,6 +395,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>      ("target_memkb",    MemKB),
>      ("video_memkb",     MemKB),
>      ("shadow_memkb",    MemKB),
> +    ("rdm_mem_boundary_memkb",    MemKB),

Should this be restricted to .hvm? This is HVM only feature after all.

Wei.

>      ("rtc_timeoffset",  uint32),
>      ("exec_ssidref",    uint32),
>      ("exec_ssid_label", string),
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 5637c30..c7a12b1 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1374,6 +1374,9 @@ static void parse_config_data(const char *config_source,
>      if (!xlu_cfg_get_long (config, "videoram", &l, 0))
>          b_info->video_memkb = l * 1024;
>  
> +    if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
> +        b_info->rdm_mem_boundary_memkb = l * 1024;
> +
>      if (!xlu_cfg_get_long(config, "max_event_channels", &l, 0))
>          b_info->event_channels = l;
>  
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 16/19] tools/libxl: extend XENMEM_set_memory_map
  2015-06-23  9:57 ` [v4][PATCH 16/19] tools/libxl: extend XENMEM_set_memory_map Tiejun Chen
@ 2015-06-25 11:33   ` Wei Liu
  2015-06-26  7:13     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-25 11:33 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

The subject line should be changed. You're not extending that hypercall.

libxl: construct e820 map with RDM information for HVM guest 

On Tue, Jun 23, 2015 at 05:57:27PM +0800, Tiejun Chen wrote:
> Here we'll construct a basic guest e820 table via
> XENMEM_set_memory_map. This table includes lowmem, highmem
> and RDMs if they exist. And hvmloader would need this info
> later.
> 

I have one question. When RDM is disabled, the generated e820 map should
look exactly the same as before (i.e. without this patch), right?

Whatever the answer is, please say that in your commit log.

> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v4:
> 
> * Use goto style error handling.
> * Instead of NOGC, we should use libxl__malloc(gc,XXX) to allocate local e820.
> 
>  tools/libxl/libxl_dom.c      |  5 +++
>  tools/libxl/libxl_internal.h | 24 +++++++++++++
>  tools/libxl/libxl_x86.c      | 83 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 112 insertions(+)
> 
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 0987991..bc8fd5b 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>          goto out;
>      }
>  
> +    if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
> +        LOG(ERROR, "setting domain memory map failed");
> +        goto out;
> +    }
> +
>      ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
>                                 &state->store_mfn, state->console_port,
>                                 &state->console_mfn, state->store_domid,
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index c0acf11..ae2f5e0 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -3714,6 +3714,30 @@ static inline void libxl__update_config_vtpm(libxl__gc *gc,
>   */
>  void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
>                                      const libxl_bitmap *sptr);
> +
> +/*
> + * Here we're just trying to set these kinds of e820 mappings:
> + *
> + * #1. Low memory region
> + *
> + * Low RAM starts at least from 1M to make sure all standard regions
> + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
> + * have enough space.
> + * Note: Those stuffs below 1M are still constructed with multiple
> + * e820 entries by hvmloader. At this point we don't change anything.
> + *
> + * #2. RDM region if it exists
> + *
> + * #3. High memory region if it exists
> + *
> + * Note: these regions are not overlapping since we already check
> + * to adjust them. Please refer to libxl__domain_device_construct_rdm().
> + */
> +int libxl__domain_construct_e820(libxl__gc *gc,

This declaration should be marked _hidden.

Wei.
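
The three kinds of entries listed in the comment block above (low RAM starting
at 1M, RDM regions, high RAM) could be assembled roughly as in the standalone
sketch below. The struct and function names are hypothetical, the placement of
high memory at 4G is an assumption of this sketch, and the regions are assumed
to be non-overlapping already, as the quoted comment says.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_E820_RAM      1u   /* conventional e820 type values */
#define SK_E820_RESERVED 2u

/* Hypothetical, simplified types used only for this sketch. */
struct sk_e820 { uint64_t addr, size; uint32_t type; };
struct sk_rdm  { uint64_t start, size; };

/* Fills map[] and returns the number of entries written. */
static size_t sk_build_e820(struct sk_e820 *map, size_t max,
                            uint64_t lowmem_end, uint64_t highmem_end,
                            const struct sk_rdm *rdm, size_t nr_rdm)
{
    const uint64_t one_mb = 1ULL << 20, four_gb = 1ULL << 32;
    size_t n = 0, i;

    assert(map != NULL && lowmem_end > one_mb);
    assert(max >= nr_rdm + 2);          /* low RAM + RDMs + high RAM */

    /* #1. Low RAM; everything below 1M is left to hvmloader. */
    map[n++] = (struct sk_e820){ one_mb, lowmem_end - one_mb, SK_E820_RAM };

    /* #2. One reserved entry per RDM region, if any. */
    for (i = 0; i < nr_rdm; i++)
        map[n++] = (struct sk_e820){ rdm[i].start, rdm[i].size,
                                     SK_E820_RESERVED };

    /* #3. High RAM above 4G, if present. */
    if (highmem_end > four_gb)
        map[n++] = (struct sk_e820){ four_gb, highmem_end - four_gb,
                                     SK_E820_RAM };

    return n;
}
```

With no RDM entries and no high memory the result is a single RAM entry,
which is the case the question above about the RDM-disabled layout is
concerned with.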

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 19/19] tools: parse to enable new rdm policy parameters
  2015-06-23  9:57 ` [v4][PATCH 19/19] tools: parse to enable new rdm policy parameters Tiejun Chen
@ 2015-06-25 11:35   ` Wei Liu
  2015-06-30 16:30   ` George Dunlap
  1 sibling, 0 replies; 114+ messages in thread
From: Wei Liu @ 2015-06-25 11:35 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Tue, Jun 23, 2015 at 05:57:30PM +0800, Tiejun Chen wrote:
> This patch parses to enable user configurable parameters to specify
> RDM resource and according policies,
> 
> Global RDM parameter:
>     rdm = "type=none/host,reserve=strict/relaxed"
> Per-device RDM parameter:
>     pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]
> 
> Default per-device RDM policy is 'strict', while default global RDM policy
> is 'relaxed'. When both policies are specified on a given region, 'strict' is
> always preferred.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-23  9:57 ` [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy Tiejun Chen
@ 2015-06-25 11:37   ` Wei Liu
  2015-06-25 12:15   ` Ian Campbell
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 114+ messages in thread
From: Wei Liu @ 2015-06-25 11:37 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Tue, Jun 23, 2015 at 05:57:23PM +0800, Tiejun Chen wrote:
> This patch passes our rdm reservation policy inside libxl
> when we assign a device or attach a device.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

The code looks good to me. I will wait for native English speakers to
have a look at the docs.

Wei.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-23  9:57 ` [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy Tiejun Chen
@ 2015-06-25 11:38   ` Wei Liu
  2015-06-25 12:13   ` Ian Campbell
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 114+ messages in thread
From: Wei Liu @ 2015-06-25 11:38 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Tue, Jun 23, 2015 at 05:57:22PM +0800, Tiejun Chen wrote:
> This patch introduces user configurable parameters to specify RDM
> resource and according policies,
> 
> Global RDM parameter:
>     rdm = "type=none/host,reserve=strict/relaxed"
> Per-device RDM parameter:
>     pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]
> 
> Global RDM parameter, "type", allows user to specify reserved regions
> explicitly, e.g. using 'host' to include all reserved regions reported
> on this platform which is good to handle hotplug scenario. In the future
> this parameter may be further extended to allow specifying random regions,
> e.g. even those belonging to another platform as a preparation for live
> migration with passthrough devices. By contrast, 'none' means we do nothing
> with reserved regions and ignore all policies, so the guest works as before.
> 
> 'strict/relaxed' policy decides how to handle conflict when reserving RDM
> regions in pfn space. If conflict exists, 'strict' means an immediate error
> so VM will be killed, while 'relaxed' allows moving forward with a warning
> message thrown out.
> 
> Default per-device RDM policy is 'strict', while default global RDM policy
> is 'relaxed'. When both policies are specified on a given region, 'strict' is
> always preferred.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

The code looks good to me. I will wait for native English speakers to
have a look at the docs.

Wei.
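
One way to read the rule "when both policies are specified on a given region,
'strict' is always preferred" is the small standalone sketch below. The enum
and function names are hypothetical, and this is only an interpretation of the
commit message, not code from the series.

```c
#include <assert.h>

/* Hypothetical names, used only for this sketch. */
enum sk_reserve { SK_RESERVE_STRICT, SK_RESERVE_RELAXED };

/* Combine the global policy with a per-device policy for one region. */
static enum sk_reserve sk_effective_reserve(enum sk_reserve global_policy,
                                            enum sk_reserve device_policy)
{
    assert(global_policy == SK_RESERVE_STRICT ||
           global_policy == SK_RESERVE_RELAXED);
    assert(device_policy == SK_RESERVE_STRICT ||
           device_policy == SK_RESERVE_RELAXED);

    /* "strict" wins whenever either side asks for it. */
    if (global_policy == SK_RESERVE_STRICT ||
        device_policy == SK_RESERVE_STRICT)
        return SK_RESERVE_STRICT;

    return SK_RESERVE_RELAXED;
}
```

Under this reading, a relaxed global default combined with the strict
per-device default yields strict for any region that belongs to an assigned
device.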

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-23  9:57 ` [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy Tiejun Chen
  2015-06-25 11:38   ` Wei Liu
@ 2015-06-25 12:13   ` Ian Campbell
  2015-06-26  8:38     ` Chen, Tiejun
  2015-06-25 12:31   ` Ian Jackson
  2015-06-30 15:54   ` George Dunlap
  3 siblings, 1 reply; 114+ messages in thread
From: Ian Campbell @ 2015-06-25 12:13 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Wei Liu, Stefano Stabellini, Ian Jackson, xen-devel

On Tue, 2015-06-23 at 17:57 +0800, Tiejun Chen wrote:
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index a3e0e2e..638b350 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -655,6 +655,49 @@ assigned slave device.
>  
>  =back
>  
> +=item B<rdm="RDM_RESERVE_STRING">

Would RDM_RESERVATION_STRING more accurately describe this?

> +
> +(HVM/x86 only) Specifies the information about Reserved Device Memory (RDM),

Drop "the"

> +which is necessary to enable robust device passthrough. One example of RDM
> +is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
> +on x86 platform.
> +
> +B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...> where:
> +
> +=over 4
> +
> +=item B<KEY=VALUE>
> +
> +Possible B<KEY>s are:
> +
> +=over 4
> +
> +=item B<type="STRING">
> +
> +Currently we just have two types:

"Currently there are only two types". Although I would probably just say
"Valid types are"

> +"host" means all reserved device memory on this platform should be reserved
> +in this VM's guest address space space. This global RDM parameter allows

"space space"

> +user to specify reserved regions explicitly. And using "host" to include all
> +reserved regions reported on this platform which is good to handle hotplug
> +scenario.

I'm having trouble parsing this sentence, but I think you mean something
like:
        Using "host" includes all reserved regions reported on this
        platform, which is useful when doing hotplugging.

>  In the future this parameter may be further extended to allow
> +specifying random regions, e.g. even those belonging to another platform as
> +a preparation for live migration with passthrough devices.

Let's document future stuff as it is implemented rather than leaving what
is effectively a TODO in the face of the user.

> +
> +"none" means we have nothing to do all reserved regions and ignore all policies,
> +so guest work as before.

This doesn't read right, but I'm not sure what you are trying to say so
I can't suggest an alternative.

How is type=none different from just not specifying rdm at all?

> +
> +=over 4

Won't all these "=over 4"'s accumulate into a very deep indentation? I
think you only need the first one (before the list) and the one before
the nested list of types. In both cases you also need an "=back" at the
end of the respective list to unwind the =over.

> +
> +=item B<reserve="STRING">
> +
> +Conflict may be detected when reserving reserved device memory in guest address
> +space. "strict" means an unsolved conflict leads to immediate VM crash, while
> +"relaxed" allows VM moving forward with a warning message thrown out. "relaxed"
> +is default.

I think I would say:

        Specifies how to deal with conflicts discovered when reserving
        reserved device memory in the guest address space. "strict"
        means...

Having read all these docs I now know what all the options are, but I
still don't really know what I should write. I think an example or two
of real world usage would be helpful.

> +
> +Note this may be overridden by rdm_reserve option in PCI device configuration.
> +
>  =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
>  
>  Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
> @@ -717,6 +760,13 @@ dom0 without confirmation.  Please use with care.
>  D0-D3hot power management states for the PCI device. False (0) by
>  default.
>  
> +=item B<rdm_reserv="STRING">
> +
> +(HVM/x86 only) This is same as reserve option above but just specific
> +to a given device, and "strict" is default here.

Rather than "above" (which is quite a large block of text) you should
specifically mention the rdm option.

> +
> +Note this would override global B<rdm> option.
> +
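
As one possible answer to the request above for an example of real-world
usage, a guest configuration using the syntax proposed in this version of the
series might contain a fragment like the one below. The BDF 01:00.0 is a
made-up placeholder, and the option spelling (rdm_reserve) is taken from the
commit message of this patch; both the names and the defaults may change in
later revisions.

```
# Reserve all RDM regions reported by the host; only warn on conflicts.
rdm = "type=host,reserve=relaxed"

# For this one passthrough device, treat any unresolved conflict as an error.
pci = [ '01:00.0,rdm_reserve=strict' ]
```

Here the global setting keeps the guest bootable when an unrelated region
conflicts, while the per-device setting insists that the assigned device's
own regions are reserved.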

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-23  9:57 ` [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy Tiejun Chen
  2015-06-25 11:37   ` Wei Liu
@ 2015-06-25 12:15   ` Ian Campbell
  2015-06-26  8:53     ` Chen, Tiejun
  2015-06-25 12:33   ` Ian Jackson
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 114+ messages in thread
From: Ian Campbell @ 2015-06-25 12:15 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Wei Liu, Stefano Stabellini, Ian Jackson, xen-devel

On Tue, 2015-06-23 at 17:57 +0800, Tiejun Chen wrote:
> This patch passes our rdm reservation policy inside libxl
> when we assign a device or attach a device.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v4:
> 
> * Fix one typo, s/unkwon/unknown
> * In command description, we should use "[]" to indicate it's optional
>   for that extended xl command, pci-attach.
> 
>  docs/man/xl.pod.1         |  7 ++++++-
>  tools/libxl/libxl_pci.c   | 10 +++++++++-
>  tools/libxl/xl_cmdimpl.c  | 23 +++++++++++++++++++----
>  tools/libxl/xl_cmdtable.c |  2 +-
>  4 files changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> index 4eb929d..c5c4809 100644
> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1
> @@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its original driver, making it
>  usable by Domain 0 again.  If the device is not bound to pciback, it will
>  return success.
>  
> -=item B<pci-attach> I<domain-id> I<BDF>
> +=item B<pci-attach> I<domain-id> I<BDF> [I<rdm>]
>  
>  Hot-plug a new pass-through pci device to the specified domain.
>  B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
> +B<rdm policy> is about how to handle conflict between reserving reserved device

s/is about/specifies/ and I think s/between/while/

> +memory and guest address space. "strict" means an unsolved conflict leads to

I think you mean "in" rather than "and"?


> +immediate VM crash, while "relaxed" allows VM moving forward with a warning
> +message thrown out. Here "strict" is default.

"The default is "strict"".

You've repeated the list of allowed values for this two or three times
now in the various docs, perhaps try and centralise on one definition
and cross reference instead?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-23  9:57 ` [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy Tiejun Chen
  2015-06-25 11:38   ` Wei Liu
  2015-06-25 12:13   ` Ian Campbell
@ 2015-06-25 12:31   ` Ian Jackson
  2015-06-30  3:07     ` Chen, Tiejun
  2015-06-30 15:54   ` George Dunlap
  3 siblings, 1 reply; 114+ messages in thread
From: Ian Jackson @ 2015-06-25 12:31 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Tiejun Chen writes ("[v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy"):
> This patch introduces user configurable parameters to specify RDM
> resource and according policies,
...
> Global RDM parameter, "type", allows user to specify reserved regions
> explicitly, e.g. using 'host' to include all reserved regions reported
> on this platform which is good to handle hotplug scenario. In the future
> this parameter may be further extended to allow specifying random regions,
> e.g. even those belonging to another platform as a preparation for live
> migration with passthrough devices. Instead, 'none' means we have nothing
> to do all reserved regions and ignore all policies, so guest work as before.

I think the description in the documentation needs to have more
user-focused information.  It's not quite clear to me what the
tradeoffs are of the different options.


(Your use of "random" here is rather informal.  You should say
"arbitrary".)

Ian.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-23  9:57 ` [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy Tiejun Chen
  2015-06-25 11:37   ` Wei Liu
  2015-06-25 12:15   ` Ian Campbell
@ 2015-06-25 12:33   ` Ian Jackson
  2015-06-30  2:14     ` Chen, Tiejun
  2015-06-30 15:56   ` George Dunlap
  2015-06-30 16:11   ` George Dunlap
  4 siblings, 1 reply; 114+ messages in thread
From: Ian Jackson @ 2015-06-25 12:33 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Tiejun Chen writes ("[v4][PATCH 12/19] tools/libxl: passes rdm reservation policy"):
> This patch passes our rdm reservation policy inside libxl
> when we assign a device or attach a device.
...
> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> index 4eb929d..c5c4809 100644
> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1
> @@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its original driver, making it
>  usable by Domain 0 again.  If the device is not bound to pciback, it will
>  return success.
>  
> -=item B<pci-attach> I<domain-id> I<BDF>
> +=item B<pci-attach> I<domain-id> I<BDF> [I<rdm>]
>  
>  Hot-plug a new pass-through pci device to the specified domain.
>  B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
> +B<rdm policy> is about how to handle conflict between reserving reserved device
> +memory and guest address space. "strict" means an unsolved conflict leads to
> +immediate VM crash, while "relaxed" allows VM moving forward with a warning
> +message thrown out. Here "strict" is default.

Surely it would be better to reject the attach attempt rather than
crashing the domain.

Also, again, this text is not really user-focused.  It needs to
explain what the risks of using `relaxed' are (or what other checks or
countermeasures the admin should use before setting `relaxed').

Ian.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 13/19] tools/libxc: check to set args.mmio_size before call xc_hvm_build
  2015-06-25 11:08   ` Wei Liu
@ 2015-06-26  0:56     ` Chen, Tiejun
  2015-06-26 12:07       ` Wei Liu
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-26  0:56 UTC (permalink / raw)
  To: Wei Liu; +Cc: Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On 2015/6/25 19:08, Wei Liu wrote:
> On Tue, Jun 23, 2015 at 05:57:24PM +0800, Tiejun Chen wrote:
>> After commit 5dff8e9eedc7, "libxc/libxl: fill xc_hvm_build_args in
>> libxl" is introduced, we won't check to set args.mmio_size inside
>> xc_hvm_build as before. So instead, we need to do this before call
>> that.
>>
>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> CC: Ian Campbell <ian.campbell@citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
>
> Sigh. I missed this because libxl doesn't use this function and there is
> no in tree xend anymore.
>
> I think you should move this earlier in this series. Presumably your RDM
> changes depend on this.
>

Maybe this can be sent out independently since this is more like a fix, 
right?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM
  2015-06-25 11:23   ` Wei Liu
@ 2015-06-26  5:45     ` Chen, Tiejun
  2015-06-26 12:13       ` Wei Liu
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-26  5:45 UTC (permalink / raw)
  To: Wei Liu; +Cc: Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On 2015/6/25 19:23, Wei Liu wrote:
> On Tue, Jun 23, 2015 at 05:57:25PM +0800, Tiejun Chen wrote:
>> While building a VM, HVM domain builder provides struct hvm_info_table{}
>> to help hvmloader. Currently it includes two fields to construct guest
>> e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
>> check them to fix any conflict with RAM.
>>
>
> RAM -> RDM?

Fixed.

>
>> RMRR can reside in address space beyond 4G theoretically, but we never

[snip]

>> +static struct xen_reserved_device_memory
>> +*xc_device_get_rdm(libxl__gc *gc,
>> +                   uint32_t flag,
>> +                   uint16_t seg,
>> +                   uint8_t bus,
>> +                   uint8_t devfn,
>> +                   unsigned int *nr_entries)
>
> I just notice this function lives in libxl_dm.c. The function should be
> renamed to libxl__xc_device_get_rdm.
>
> This function should return proper libxl error code (ERROR_FAIL or
> something more appropriate). The allocated RDM entries should be

ERROR_FAIL is better.

So I refactored this function after addressing all your comments:

static int
libxl__xc_device_get_rdm(libxl__gc *gc,
                          uint32_t flag,
                          uint16_t seg,
                          uint8_t bus,
                          uint8_t devfn,
                          unsigned int *nr_entries,
                          struct xen_reserved_device_memory *xrdm)
{
     int rc;

     /*
      * We really can't presume how many entries we can get in advance.
      */
     *nr_entries = 0;
     rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
                                        NULL, nr_entries);
     assert(rc <= 0);
     /* "0" means we have no any rdm entry. */
     if (!rc)
         goto out;

     if (errno == ENOBUFS) {
         xrdm = libxl__malloc(gc,
                              *nr_entries *
                              sizeof(xen_reserved_device_memory_t));
         rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
                                            xrdm, nr_entries);
         if (rc) {
             LOG(ERROR, "Could not get reserved device memory maps.\n");
             rc = ERROR_FAIL;
         }
     } else {
         LOG(ERROR, "Could not get reserved device memory maps.\n");
         rc = ERROR_FAIL;
     }

  out:
     if (rc) {
         *nr_entries = 0;
         xrdm = NULL;
     }
     return rc;
}
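
As a standalone illustration of the same query-then-allocate pattern,
here is a minimal sketch. It is not the libxl code: stub_rdm_map() and
struct rdm_entry are hypothetical stand-ins, and the stub only mimics
the convention the function above relies on (a failing first call with
errno set to ENOBUFS and the required count written back). Note that
the sketch hands the buffer back through a pointer-to-pointer; with a
plain pointer parameter the allocation would not be visible to the
caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct xen_reserved_device_memory. */
struct rdm_entry {
    uint64_t start_pfn;
    uint64_t nr_pages;
};

/* Made-up platform data, for illustration only. */
static const struct rdm_entry fake_rdms[] = {
    { 0xab80, 0x10 },
    { 0xad000, 0x800 },
};
#define FAKE_NR ((unsigned int)(sizeof(fake_rdms) / sizeof(fake_rdms[0])))

/*
 * Stub with the calling convention assumed here: returns 0 on success;
 * if the buffer is missing or too small, stores the required count in
 * *nr, sets errno to ENOBUFS and returns -1.
 */
static int stub_rdm_map(struct rdm_entry *buf, unsigned int *nr)
{
    if (buf == NULL || *nr < FAKE_NR) {
        *nr = FAKE_NR;
        errno = ENOBUFS;
        return -1;
    }
    memcpy(buf, fake_rdms, sizeof(fake_rdms));
    *nr = FAKE_NR;
    return 0;
}

/*
 * Ask for the entry count first, then allocate and fetch the entries.
 * Returns 0 on success (possibly with zero entries) and -1 on failure.
 * On success the caller owns *out and must free() it.
 */
static int get_rdm(struct rdm_entry **out, unsigned int *nr_entries)
{
    int rc;

    assert(out != NULL && nr_entries != NULL);
    *out = NULL;
    *nr_entries = 0;

    /* First call: no buffer, only learn how many entries exist. */
    rc = stub_rdm_map(NULL, nr_entries);
    if (rc == 0)
        return 0;                 /* no reserved regions at all */
    if (errno != ENOBUFS)
        goto fail;                /* a real error, not "buffer too small" */

    *out = calloc(*nr_entries, sizeof(**out));
    if (*out == NULL)
        goto fail;

    /* Second call: the buffer is now large enough. */
    rc = stub_rdm_map(*out, nr_entries);
    if (rc == 0)
        return 0;

    free(*out);
 fail:
    *out = NULL;
    *nr_entries = 0;
    return -1;
}
```

A caller would declare "struct rdm_entry *e; unsigned int n;", call
get_rdm(&e, &n), and free(e) when done.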

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 15/19] tools: introduce a new parameter to set a predefined rdm boundary
  2015-06-25 11:27   ` Wei Liu
@ 2015-06-26  6:54     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-26  6:54 UTC (permalink / raw)
  To: Wei Liu; +Cc: Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

>> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
>> index 638b350..079465f 100644
>> --- a/docs/man/xl.cfg.pod.5
>> +++ b/docs/man/xl.cfg.pod.5
>> @@ -767,6 +767,27 @@ to a given device, and "strict" is default here.
>>
>>   Note this would override global B<rdm> option.
>>
>> +=item B<rdm_mem_boundary=MBYTES>
>> +
>> +Number of megabytes to set a boundary for checking rdm conflict.
>> +
>> +When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
>> +Especially multiple RDM entries would worsen this to lead a complicated
>> +memory layout. So here we're trying to figure out a simple solution to
>> +avoid breaking existing layout. So when a conflict occurs,
>> +
>> +    #1. Above a predefined boundary
>> +        - move lowmem_end below reserved region to solve conflict;
>> +
>> +    #2. Below a predefined boundary
>> +        - Check strict/relaxed policy.
>> +        "strict" policy leads to fail libxl. Note when both policies
>> +        are specified on a given region, 'strict' is always preferred.
>> +        "relaxed" policy issue a warning message and also mask this entry INVALID
>> +        to indicate we shouldn't expose this entry to hvmloader.
>> +
>
> Can you check the generated manpage to see if the format is correct?
>

Yes, the format is correct but according to your comment below, I need 
to move this elsewhere.
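
To make the intended behaviour concrete, the rule in the quoted text
can be sketched as a small standalone function. This is only an
illustration, not the libxl code: the type and function names are
hypothetical, "above the boundary" is assumed to mean that the region
starts at or above it, and the sketch does not model where the RAM
displaced from lowmem ends up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names, chosen for this sketch only. */
enum rdm_policy { RDM_STRICT, RDM_RELAXED, RDM_INVALID };

struct rdm_region {
    uint64_t start;
    uint64_t size;
    enum rdm_policy policy;
};

/*
 * Resolve a possible conflict between one reserved region and low RAM,
 * taken here to be the range [0, *lowmem_end).
 * Returns 0 if the guest can still be built, -1 if building must fail.
 */
static int resolve_rdm_conflict(struct rdm_region *r,
                                uint64_t *lowmem_end,
                                uint64_t boundary)
{
    assert(r != NULL && lowmem_end != NULL);

    /* The region lies entirely above low RAM: no conflict. */
    if (r->start >= *lowmem_end)
        return 0;

    /* #1. Above the boundary: shrink lowmem so it ends below the region. */
    if (r->start >= boundary) {
        *lowmem_end = r->start;
        return 0;
    }

    /* #2. Below the boundary: the policy decides. */
    if (r->policy == RDM_STRICT) {
        fprintf(stderr, "RDM conflict at 0x%llx\n",
                (unsigned long long)r->start);
        return -1;
    }

    fprintf(stderr, "Ignoring RDM conflict at 0x%llx\n",
            (unsigned long long)r->start);
    r->policy = RDM_INVALID;      /* do not expose this entry later on */
    return 0;
}
```

With a 2G boundary and 3G of lowmem, a region at 0xa0000000 simply
pulls lowmem_end down to 0xa0000000, while a region at 0x40000000
either fails the build ("strict") or is marked invalid ("relaxed").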

>> +Here the default is 2G.
>> +
>>   =back
>>

[snip]

>>       if (ret) {
>>           LOG(ERROR, "checking reserved device memory failed");
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index 5ba075d..d130d48 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -395,6 +395,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>>       ("target_memkb",    MemKB),
>>       ("video_memkb",     MemKB),
>>       ("shadow_memkb",    MemKB),
>> +    ("rdm_mem_boundary_memkb",    MemKB),
>
> Should this be restricted to .hvm? This is HVM only feature after all.
>

Yes, I think this makes sense, but I think I should move the "rdm"
parameter into .hvm as well.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 16/19] tools/libxl: extend XENMEM_set_memory_map
  2015-06-25 11:33   ` Wei Liu
@ 2015-06-26  7:13     ` Chen, Tiejun
  2015-06-26 12:14       ` Wei Liu
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-26  7:13 UTC (permalink / raw)
  To: Wei Liu; +Cc: Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On 2015/6/25 19:33, Wei Liu wrote:
> The subject line should be changed. You're not extending that hypercall.
>
> libxl: construct e820 map with RDM information for HVM guest
>

Agreed.

> On Tue, Jun 23, 2015 at 05:57:27PM +0800, Tiejun Chen wrote:
>> Here we'll construct a basic guest e820 table via
>> XENMEM_set_memory_map. This table includes lowmem, highmem
>> and RDMs if they exist. And hvmloader would need this info
>> later.
>>
>
> I have one question. When RDM is disabled, the generated e820 map should
> look exactly the same as before (i.e. without this patch), right?

Yes.

>
> Whatever the answer is, please say that in your commit log.

What about this sentence?

Note this guest e820 table would be same as before if the
platform has no any RDMs or we disable RDM (by default).
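
For reference, a rough standalone sketch of the table layout being
described. It is not the function from the patch: the names are
hypothetical, lowmem is assumed to start at 0 and highmem at 4G, and
the RDM entries are assumed to have been checked against RAM already.
With no RDM entries only the RAM ranges are emitted, which is the
"same as before" case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM       1
#define E820_RESERVED  2
#define FOUR_GB        (1ULL << 32)

struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Hypothetical, simplified description of one reserved region. */
struct rdm {
    uint64_t start;
    uint64_t size;
};

/*
 * Fill map[] with lowmem, the RDM entries and (if present) highmem.
 * Returns the number of entries written, or 0 if map[] is too small.
 */
static size_t build_e820(struct e820_entry *map, size_t max,
                         uint64_t lowmem_end, uint64_t highmem_end,
                         const struct rdm *rdms, size_t nr_rdms)
{
    size_t n = 0, i;
    size_t need = 1 + nr_rdms + (highmem_end > FOUR_GB ? 1 : 0);

    assert(map != NULL);
    if (need > max)
        return 0;

    /* Low RAM. */
    map[n++] = (struct e820_entry){ 0, lowmem_end, E820_RAM };

    /* One reserved entry per RDM region. */
    for (i = 0; i < nr_rdms; i++)
        map[n++] = (struct e820_entry){ rdms[i].start, rdms[i].size,
                                        E820_RESERVED };

    /* High RAM above 4G, if the guest has any. */
    if (highmem_end > FOUR_GB)
        map[n++] = (struct e820_entry){ FOUR_GB, highmem_end - FOUR_GB,
                                        E820_RAM };

    return n;
}
```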

>
>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> CC: Ian Campbell <ian.campbell@citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---

[snip]

>> + *
>> + * Note: these regions are not overlapping since we already check
>> + * to adjust them. Please refer to libxl__domain_device_construct_rdm().
>> + */
>> +int libxl__domain_construct_e820(libxl__gc *gc,
>
> hidden

Okay.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-25 12:13   ` Ian Campbell
@ 2015-06-26  8:38     ` Chen, Tiejun
  2015-06-26  8:57       ` Ian Campbell
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-26  8:38 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, Stefano Stabellini, Ian Jackson, xen-devel

Thanks for all your corrections.


>> +=item B<type="STRING">
>> +
>> +Currently we just have two types:
>
> "Currently there are only two types". Although I would probably just say
> "Valid types are"

So let's say "Currently there are only two valid types".

>
>> +"host" means all reserved device memory on this platform should be reserved

[snip]

>>   In the future this parameter may be further extended to allow
>> +specifying random regions, e.g. even those belonging to another platform as
>> +a preparation for live migration with passthrough devices.
>
> Lets document future stuff as it is implemented rather than leaving what
> is effectively a TODO in the face of the user.

Okay, but I'm not sure what format to use to introduce a TODO here.
Maybe just like this:

...
regions reported on this platform, which is useful when doing hotplugging.

TODO: in the future this parameter may be further extended to allow 
specifying arbitrary regions, e.g. even those belonging to another 
platform as a preparation for live migration with passthrough devices.

...

>
>> +
>> +"none" means we have nothing to do all reserved regions and ignore all policies,
>> +so guest work as before.
>
> This doesn't read right, but I'm not sure what you are trying to say so
> I can't suggest an alternative.
>
> How is type=none different from just not specifying rdm at all?

They behave the same, since "none" is our default option.

Just let me rephrase this,

"none" means we don't check any reserved regions and then all rdm 
policies would be ignored, so guest just work as before.

>
>> +
>> +=over 4
>
> Won't all these "=over 4"'s accumulate into a very deep indentation? I
> think you only need the first one (before the list) and the one before
> the nested list of types. In both cases you also need an "=back" at the
> end of the respective list to unwind the =over.

You're right; I also found this fault with `make docs`, so I should
remove it here.

>
>> +
>> +=item B<reserve="STRING">
>> +
>> +Conflict may be detected when reserving reserved device memory in guest address
>> +space. "strict" means an unsolved conflict leads to immediate VM crash, while
>> +"relaxed" allows VM moving forward with a warning message thrown out. "relaxed"
>> +is default.
>
> I think I would say:
>
>          Specifies how to deal with conflicts discovered when reserving
>          reserved device memory in the guest address space. "strict"
>          means...
>

Nice and thanks.

> Having read all these docs I now know what all the options are, but I
> still don't really know what I should write. I think an example or two
> of real world usage would be helpful.

Here I picked some code fragments to help you understand this,

+        if(d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_STRICT) {
+            LOG(ERROR, "RDM conflict at 0x%lx.\n", d_config->rdms[i].start);
+            goto out;
+        } else {
+            LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
+                      d_config->rdms[i].start);
+
+            /*
+             * Then mask this INVALID to indicate we shouldn't expose this
+             * to hvmloader.
+             */
+            d_config->rdms[i].flag = LIBXL_RDM_RESERVE_FLAG_INVALID;
+        }
+    }
+
+    return 0;
+
+ out:
+    return ERROR_FAIL;


The above is just the tools side; something similar should also happen
on the hypervisor side. In any case, "strict" would make the VM fail.

>
>> +
>> +Note this may be overridden by rdm_reserve option in PCI device configuration.
>> +
>>   =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
>>
>>   Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
>> @@ -717,6 +760,13 @@ dom0 without confirmation.  Please use with care.
>>   D0-D3hot power management states for the PCI device. False (0) by
>>   default.
>>
>> +=item B<rdm_reserv="STRING">
>> +
>> +(HVM/x86 only) This is same as reserve option above but just specific
>> +to a given device, and "strict" is default here.
>
> Rather than "above" (which is quite a large block of text) you should
> specifically mention the rdm option.
>

What about this?

(HVM/x86 only) This is same as reserve option inside the rdm option
but just specific to a given device, and "strict" is default here.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-25 12:15   ` Ian Campbell
@ 2015-06-26  8:53     ` Chen, Tiejun
  2015-06-26  9:01       ` Ian Campbell
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-26  8:53 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, Stefano Stabellini, Ian Jackson, xen-devel

>>   B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
>> +B<rdm policy> is about how to handle conflict between reserving reserved device
>
> s/is about/specifies/

Okay.

> and I think s/between/while/
>
>> +memory and guest address space. "strict" means an unsolved conflict leads to
>
> I think you mean "in" rather than "and"?

Actually, I was originally trying to say "conflict between A and B".

What about this?

B<rdm policy> specifies how to handle conflict between reserved device 
memory space and guest address space...

>
>
>> +immediate VM crash, while "relaxed" allows VM moving forward with a warning
>> +message thrown out. Here "strict" is default.
>
> "The default is "strict"".

Okay.

>
> You've repeated the list of allowed values for this two or three times
> now in the various docs, perhaps try and centralise on one definition
> and cross reference instead?
>

I will delete this last sentence and add this,

Please refer "reserve" option to the rdm option in xl.cfg.5.txt.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-26  8:38     ` Chen, Tiejun
@ 2015-06-26  8:57       ` Ian Campbell
  2015-06-26  9:36         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Ian Campbell @ 2015-06-26  8:57 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Wei Liu, Stefano Stabellini, Ian Jackson, xen-devel

On Fri, 2015-06-26 at 16:38 +0800, Chen, Tiejun wrote:
> Thanks for your all corrections.
> 
> 
> >> +=item B<type="STRING">
> >> +
> >> +Currently we just have two types:
> >
> > "Currently there are only two types". Although I would probably just say
> > "Valid types are"
> 
> So let say "Currently there are only two valid types".
> 
> >
> >> +"host" means all reserved device memory on this platform should be reserved
> 
> [snip]
> 
> >>   In the future this parameter may be further extended to allow
> >> +specifying random regions, e.g. even those belonging to another platform as
> >> +a preparation for live migration with passthrough devices.
> >
> > Lets document future stuff as it is implemented rather than leaving what
> > is effectively a TODO in the face of the user.
> 
> Okay but I'm not very sure what's that format to introduce a TODO here. 
> Maybe its just like this,
> 
> ...
> regions reported on this platform, which is useful when doing hotplugging.
> 
> TODO: in the future this parameter may be further extended to allow 
> specifying arbitrary regions, e.g. even those belonging to another 
> platform as a preparation for live migration with passthrough devices.

I don't think this needs to be explained in this document at all.
Whenever someone does that work they can update the docs to describe the
new functionality.

> 
> ...
> 
> >
> >> +
> >> +"none" means we have nothing to do all reserved regions and ignore all policies,
> >> +so guest work as before.
> >
> > This doesn't read right, but I'm not sure what you are trying to say so
> > I can't suggest an alternative.
> >
> > How is type=none different from just not specifying rdm at all?
> 
> They're same behavior since "none" is our default option.
> 
> Just let me rephrase this,
> 
> "none" means we don't check any reserved regions and then all rdm 
> policies would be ignored, so guest just work as before.

When or why would I write:
        rdm = "none"
in my configuration file instead of just not saying anything?


> > Having read all these docs I now know what all the options are, but I
> > still don't really know what I should write. I think an example or two
> > of real world usage would be helpful.
> 
> Here I picked some code fragments to help you understand this,

I meant an example or two in the documentation.

The code fragment didn't answer my question either, but that's not
really the point.

> >
> >> +Note this may be overridden by rdm_reserve option in PCI device configuration.
> >> +
> >>   =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
> >>
> >>   Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
> >> @@ -717,6 +760,13 @@ dom0 without confirmation.  Please use with care.
> >>   D0-D3hot power management states for the PCI device. False (0) by
> >>   default.
> >>
> >> +=item B<rdm_reserv="STRING">
> >> +
> >> +(HVM/x86 only) This is same as reserve option above but just specific
> >> +to a given device, and "strict" is default here.
> >
> > Rather than "above" (which is quite a large block of text) you should
> > specifically mention the rdm option.
> >
> 
> What about this?
> 
> (HVM/x86 only) This is same as reserve option inside the rdm option
> but just specific to a given device, and "strict" is default here.

Is strict the default everywhere or does it differ depending on the
context?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-26  8:53     ` Chen, Tiejun
@ 2015-06-26  9:01       ` Ian Campbell
  2015-06-26  9:28         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Ian Campbell @ 2015-06-26  9:01 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Ian Jackson, xen-devel, Wei Liu, Stefano Stabellini

On Fri, 2015-06-26 at 16:53 +0800, Chen, Tiejun wrote:
> >>   B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
> >> +B<rdm policy> is about how to handle conflict between reserving reserved device
> >
> > s/is about/specifies/
> 
> Okay.
> 
> and I think s/between/while/
> >
> >> +memory and guest address space. "strict" means an unsolved conflict leads to
> >
> > I think you mean "in" rather than "and"?
> 
> Actually, as you see I was originally trying to say " conflict between A 
> and B".
> 
> What about this?
> 
> B<rdm policy> specifies how to handle conflict between reserved device 
> memory space and guest address space...

I'd say either "handle conflicts between" or "handle any conflict
between".

> 
> >
> >
> >> +immediate VM crash, while "relaxed" allows VM moving forward with a warning
> >> +message thrown out. Here "strict" is default.
> >
> > "The default is "strict"".
> 
> Okay.
> 
> >
> > You've repeated the list of allowed values for this two or three times
> > now in the various docs, perhaps try and centralise on one definition
> > and cross reference instead?
> >
> 
> I will delete this last sentence and add this,
> 
> Please refer "reserve" option to the rdm option in xl.cfg.5.txt.

Say "xl.cfg(5)" which is a neutral reference to the manpage in any form.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-26  9:01       ` Ian Campbell
@ 2015-06-26  9:28         ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-26  9:28 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, xen-devel, Wei Liu, Stefano Stabellini

>> I will delete this last sentence and add this,
>>
>> Please refer "reserve" option to the rdm option in xl.cfg.5.txt.
>
> Say "xl.cfg(5)" which is a neutral reference to the manpage in any form.
>

I'm trying to follow an existing example here, so

This is same as "reserve" option to the rdm option, please see L<xl.cfg(5)>.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-26  8:57       ` Ian Campbell
@ 2015-06-26  9:36         ` Chen, Tiejun
  2015-06-26 12:06           ` Wei Liu
  2015-06-30  3:08           ` Chen, Tiejun
  0 siblings, 2 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-26  9:36 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, Stefano Stabellini, Ian Jackson, xen-devel

>> TODO: in the future this parameter may be further extended to allow
>> specifying arbitrary regions, e.g. even those belonging to another
>> platform as a preparation for live migration with passthrough devices.
>
> I don't think this needs to be explained in this document at all.
> Whenever someone does that work they can update the docs to describe the
> new functionality.

Okay.

>
>>
>> ...
>>
>>>
>>>> +
>>>> +"none" means we have nothing to do all reserved regions and ignore all policies,

[snip]

>> Just let me rephrase this,
>>
>> "none" means we don't check any reserved regions and then all rdm
>> policies would be ignored, so guest just work as before.
>
> When or why would I write:
>          rdm = "none"
> in my configuration file instead of just not saying anything?

As you know we just have two options, "none" vs. "host". So we need an
explicit flag as a default libxl value to make our mechanism work.

+libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
+    (0, "none"),
+    (1, "host"),
+    ])
+

We think this name makes sense, right?

>
>
>>> Having read all these docs I now know what all the options are, but I
>>> still don't really know what I should write. I think an example or two
>>> of real world usage would be helpful.
>>
>> Here I picked some code fragments to help you understand this,
>
> I meant an example or two in the documentation.
>
> The code fragment didn't answer my question either, but that's not
> really the point.

Do you mean I should write two examples, respectively?

#1. What is one actual conflict?
#2. After we're trying to fix this conflict,
#2.1 How will "strict" fail a VM?
#2.2 How will "relaxed" impact on a VM?

>
>>>
>>>> +Note this may be overridden by rdm_reserve option in PCI device configuration.
>>>> +

[snip]

>>> Rather than "above" (which is quite a large block of text) you should
>>> specifically mention the rdm option.
>>>
>>
>> What about this?
>>
>> (HVM/x86 only) This is same as reserve option inside the rdm option
>> but just specific to a given device, and "strict" is default here.
>
> Is strict the default everywhere or does it differ depending on the
> context?

The latter. We have two cases, a global case and a per device case, and 
they're different at this point.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-26  9:36         ` Chen, Tiejun
@ 2015-06-26 12:06           ` Wei Liu
  2015-06-29  1:01             ` Chen, Tiejun
  2015-06-30  3:08           ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-26 12:06 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Fri, Jun 26, 2015 at 05:36:19PM +0800, Chen, Tiejun wrote:
> >>TODO: in the future this parameter may be further extended to allow
> >>specifying arbitrary regions, e.g. even those belonging to another
> >>platform as a preparation for live migration with passthrough devices.
> >
> >I don't think this needs to be explained in this document at all.
> >Whenever someone does that work they can update the docs to describe the
> >new functionality.
> 
> Okay.
> 
> >
> >>
> >>...
> >>
> >>>
> >>>>+
> >>>>+"none" means we have nothing to do all reserved regions and ignore all policies,
> 
> [snip]
> 
> >>Just let me rephrase this,
> >>
> >>"none" means we don't check any reserved regions and then all rdm
> >>policies would be ignored, so guest just work as before.
> >
> >When or why would I write:
> >         rdm = "none"
> >in my configuration file instead of just not saying anything?
> 
> As you know we just have two options, "none" vs. "host". So we need a
> explicit flag as a default libxl value to work out our mechanism.
> 
> +libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
> +    (0, "none"),
> +    (1, "host"),
> +    ])
> +
> 
> We just think this name can make sense, right?
> 

What Ian was getting at was that specifying type=none is the same as not
specifying anything at all. So at the *xl* level we can just expose "host"
as the only valid type.

Ian, correct me if I misunderstand.

Wei.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 13/19] tools/libxc: check to set args.mmio_size before call xc_hvm_build
  2015-06-26  0:56     ` Chen, Tiejun
@ 2015-06-26 12:07       ` Wei Liu
  0 siblings, 0 replies; 114+ messages in thread
From: Wei Liu @ 2015-06-26 12:07 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On Fri, Jun 26, 2015 at 08:56:50AM +0800, Chen, Tiejun wrote:
> On 2015/6/25 19:08, Wei Liu wrote:
> >On Tue, Jun 23, 2015 at 05:57:24PM +0800, Tiejun Chen wrote:
> >>After commit 5dff8e9eedc7, "libxc/libxl: fill xc_hvm_build_args in
> >>libxl" is introduced, we won't check to set args.mmio_size inside
> >>xc_hvm_build as before. So instead, we need to do this before call
> >>that.
> >>
> >>CC: Ian Jackson <ian.jackson@eu.citrix.com>
> >>CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> >>CC: Ian Campbell <ian.campbell@citrix.com>
> >>CC: Wei Liu <wei.liu2@citrix.com>
> >>Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> >
> >Acked-by: Wei Liu <wei.liu2@citrix.com>
> >
> >Sigh. I missed this because libxl doesn't use this function and there is
> >no in tree xend anymore.
> >
> >I think you should move this earlier in this series. Presumably your RDM
> >changes depend on this.
> >
> 
> Maybe this can be sent out independently since this is more like a fix,
> right?
> 

Yes. That's right. It's not tied to your RDM series.

> Thanks
> Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM
  2015-06-26  5:45     ` Chen, Tiejun
@ 2015-06-26 12:13       ` Wei Liu
  2015-06-29  6:36         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-26 12:13 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On Fri, Jun 26, 2015 at 01:45:02PM +0800, Chen, Tiejun wrote:
> On 2015/6/25 19:23, Wei Liu wrote:
> >On Tue, Jun 23, 2015 at 05:57:25PM +0800, Tiejun Chen wrote:
> >>While building a VM, HVM domain builder provides struct hvm_info_table{}
> >>to help hvmloader. Currently it includes two fields to construct guest
> >>e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
> >>check them to fix any conflict with RAM.
> >>
> >
> >RAM -> RDM?
> 
> Fixed.
> 
> >
> >>RMRR can reside in address space beyond 4G theoretically, but we never
> 
> [snip]
> 
> >>+static struct xen_reserved_device_memory
> >>+*xc_device_get_rdm(libxl__gc *gc,
> >>+                   uint32_t flag,
> >>+                   uint16_t seg,
> >>+                   uint8_t bus,
> >>+                   uint8_t devfn,
> >>+                   unsigned int *nr_entries)
> >
> >I just notice this function lives in libxl_dm.c. The function should be
> >renamed to libxl__xc_device_get_rdm.
> >
> >This function should return proper libxl error code (ERROR_FAIL or
> >something more appropriate). The allocated RDM entries should be
> 
> ERROR_FAIL is better.
> 
> So I'd refactor this function as follows, after addressing all your comments:
> 
> static int
> libxl__xc_device_get_rdm(libxl__gc *gc,
>                          uint32_t flag,
>                          uint16_t seg,
>                          uint8_t bus,
>                          uint8_t devfn,
>                          unsigned int *nr_entries,
>                          struct xen_reserved_device_memory *xrdm)

Shouldn't this be struct xen_reserved_device_memory **xrdm? That is,
pointer to a pointer. The code structure looks OK.

Wei.
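
To illustrate the pointer-to-pointer point: a function that allocates an
array for its caller can only hand that array back through a parameter if
the parameter is a pointer to the caller's pointer. The sketch below is not
the libxl code. struct rdm_entry, get_rdm() and the entry values are
hypothetical stand-ins, and plain calloc() replaces whatever allocator the
real function would use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xen_reserved_device_memory. */
struct rdm_entry {
    uint64_t start_pfn;
    uint64_t nr_pages;
};

/*
 * Allocate an array of entries and return it through 'out'.
 * Returns 0 on success, -1 on failure (a stand-in for a libxl error code).
 * On failure *out is NULL and *nr_entries is 0.
 */
int get_rdm(unsigned int *nr_entries, struct rdm_entry **out)
{
    struct rdm_entry *entries;

    assert(nr_entries != NULL && out != NULL);

    *nr_entries = 0;
    *out = NULL;

    entries = calloc(2, sizeof(*entries));
    if (entries == NULL)
        return -1;

    /* Illustrative values only. */
    entries[0].start_pfn = 0xac6d3;
    entries[0].nr_pages  = 0x14;
    entries[1].start_pfn = 0xad800;
    entries[1].nr_pages  = 0x2800;

    /* Writing through 'out' updates the caller's own pointer. */
    *out = entries;
    *nr_entries = 2;
    return 0;
}
```

With a single-level "struct rdm_entry *out" parameter, an assignment inside
the function would only change a local copy of the pointer, so the caller
would never see the allocated buffer. In this sketch the caller owns the
buffer and must free() it.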

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 16/19] tools/libxl: extend XENMEM_set_memory_map
  2015-06-26  7:13     ` Chen, Tiejun
@ 2015-06-26 12:14       ` Wei Liu
  0 siblings, 0 replies; 114+ messages in thread
From: Wei Liu @ 2015-06-26 12:14 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On Fri, Jun 26, 2015 at 03:13:17PM +0800, Chen, Tiejun wrote:
> On 2015/6/25 19:33, Wei Liu wrote:
> >The subject line should be changed. You're not extending that hypercall.
> >
> >libxl: construct e820 map with RDM information for HVM guest
> >
> 
> Agreed.
> 
> >On Tue, Jun 23, 2015 at 05:57:27PM +0800, Tiejun Chen wrote:
> >>Here we'll construct a basic guest e820 table via
> >>XENMEM_set_memory_map. This table includes lowmem, highmem
> >>and RDMs if they exist. And hvmloader would need this info
> >>later.
> >>
> >
> >I have one question. When RDM is disabled, the generated e820 map should
> >look exactly the same as before (i.e. without this patch), right?
> 
> Yes.
> 
> >
> >Whatever the answer is, please say that in your commit log.
> 
> What about this sentence?
> 
> Note that this guest e820 table would be the same as before if the
> platform has no RDMs or if RDM is disabled (the default).
> 

Yes this looks good.

Wei.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-26 12:06           ` Wei Liu
@ 2015-06-29  1:01             ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-29  1:01 UTC (permalink / raw)
  To: Wei Liu; +Cc: Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

>>>> Just let me rephrase this,
>>>>
>>>> "none" means we don't check any reserved regions and then all rdm
>>>> policies would be ignored, so guest just work as before.
>>>
>>> When or why would I write:
>>>          rdm = "none"
>>> in my configuration file instead of just not saying anything?
>>
>> As you know, we have just two options, "none" and "host". So we need an
>> explicit flag as the default libxl value for our mechanism to work.
>>
>> +libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
>> +    (0, "none"),
>> +    (1, "host"),
>> +    ])
>> +
>>
>> We just think this name can make sense, right?
>>
>
> What Ian was getting at was specifying type=none is the same as not
> specifying at all. So on *xl* level we can just expose "host" as the

Yes, but this is just a default value, so the user can either set it
explicitly or not specify anything in the guest .cfg file. I think both
cases should be allowed, shouldn't they?

Thanks
Tiejun

> only valid type.
>
> Ian, correct me if I misunderstand.
>
> Wei.
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM
  2015-06-26 12:13       ` Wei Liu
@ 2015-06-29  6:36         ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-29  6:36 UTC (permalink / raw)
  To: Wei Liu; +Cc: Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

>> static int
>> libxl__xc_device_get_rdm(libxl__gc *gc,
>>                           uint32_t flag,
>>                           uint16_t seg,
>>                           uint8_t bus,
>>                           uint8_t devfn,
>>                           unsigned int *nr_entries,
>>                           struct xen_reserved_device_memory *xrdm)
>
> Shouldn't this be struct xen_reserved_device_memory **xrdm? That is,

You're right, please review this next revision.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-25 12:33   ` Ian Jackson
@ 2015-06-30  2:14     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-30  2:14 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>> +B<rdm policy> is about how to handle conflict between reserving reserved device
>> +memory and guest address space. "strict" means an unsolved conflict leads to
>> +immediate VM crash, while "relaxed" allows VM moving forward with a warning
>> +message thrown out. Here "strict" is default.
>
> Surely it would be better to reject the attach attempt rather than
> crashing the domain.

Yes, indeed: we just fail to create this VM, or fail to hotplug the
associated device in the case of hotplug.

I would rephrase this elsewhere and just introduce a reference here,
as Campbell suggested to me:

+B<rdm policy> specifies how to handle conflicts between reserved device
+memory space and guest address space. This is the same as the "reserve"
+option of the rdm option; please see L<xl.cfg(5)>.
+

>
> Also, again, this text is not really user-focused.  It needs to
> explain what the risks of using `relaxed' are (or what other checks or
> countermeasures the admin should use before setting `relaxed').
>

Since this now refers to another place, please see my other email for the
updated text.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-25 12:31   ` Ian Jackson
@ 2015-06-30  3:07     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-30  3:07 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 2015/6/25 20:31, Ian Jackson wrote:
> Tiejun Chen writes ("[v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy"):
>> This patch introduces user configurable parameters to specify RDM
>> resource and according policies,
> ...
>> Global RDM parameter, "type", allows user to specify reserved regions
>> explicitly, e.g. using 'host' to include all reserved regions reported
>> on this platform which is good to handle hotplug scenario. In the future
>> this parameter may be further extended to allow specifying random regions,
>> e.g. even those belonging to another platform as a preparation for live
>> migration with passthrough devices. Instead, 'none' means we have nothing
>> to do all reserved regions and ignore all policies, so guest work as before.
>
> I think the description in the documentation needs to have more
> user-focused information.  It's not quite clear to me what the
> tradeoffs are of the different options.

I'm trying to improve this section like this,

Currently there are only two valid types:

"host" means all reserved device memory on this platform should be 
checked to reserve regions in this VM's guest address space. This global 
RDM parameter allows user to specify reserved regions explicitly, and 
using "host" includes all reserved regions reported on this platform, 
which is useful when doing hotplug.

"none" is the default value. It means we don't check any reserved
regions, and all rdm policies are ignored. The guest just works as
before, and any conflict between RDM and guest address space is left
unhandled, which may leave the associated device unable to work or even
crash the VM. So if you're assigning this kind of device, this option is
not recommended unless you can make sure that no conflict exists.

=item B<reserve="STRING">

Specifies how to deal with conflicts discovered when reserving reserved 
device memory in the guest address space.

When such a conflict is unresolved,

"strict" means this VM can't be created successfully, or the associated
device can't be attached in the case of hotplug;

"relaxed" allows the VM to be created and to keep running, with a
warning message thrown out. But this may crash the VM if the device
accesses RDM. For example, the Windows IGD GFX driver always accesses
these regions, so in such a case this leads to a blue screen that
crashes the VM.


>
>
> (Your use of "random" here is rather informal.  You should say
> "arbitrary".)
>

Fixed.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-26  9:36         ` Chen, Tiejun
  2015-06-26 12:06           ` Wei Liu
@ 2015-06-30  3:08           ` Chen, Tiejun
  2015-06-30  8:30             ` Ian Campbell
  1 sibling, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-30  3:08 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, xen-devel, Wei Liu, Stefano Stabellini

>>
>>>> Having read all these docs I now know what all the options are, but I
>>>> still don't really know what I should write. I think an example or two
>>>> of real world usage would be helpful.
>>>
>>> Here I picked some code fragments to help you understand this,
>>
>> I meant an example or two in the documentation.
>>
>> The code fragment didn't answer my question either, but that's not
>> really the point.
>
> Do you mean I should write two examples, respectively?
>
> #1. What is one actual conflict?
> #2. After we're trying to fix this conflict,
> #2.1 How will "strict" fail a VM?
> #2.2 How will "relaxed" impact on a VM?
>

I'm trying to improve this section like this,

Currently there are only two valid types:

"host" means all reserved device memory on this platform should be 
checked to reserve regions in this VM's guest address space. This global 
RDM parameter allows user to specify reserved regions explicitly, and 
using "host" includes all reserved regions reported on this platform, 
which is useful when doing hotplug.

"none" is the default value. It means we don't check any reserved
regions, and all rdm policies are ignored. The guest just works as
before, and any conflict between RDM and guest address space is left
unhandled, which may leave the associated device unable to work or even
crash the VM. So if you're assigning this kind of device, this option is
not recommended unless you can make sure that no conflict exists.

=item B<reserve="STRING">

Specifies how to deal with conflicts discovered when reserving reserved 
device memory in the guest address space.

When such a conflict is unresolved,

"strict" means this VM can't be created successfully, or the associated
device can't be attached in the case of hotplug;

"relaxed" allows the VM to be created and to keep running, with a
warning message thrown out. But this may crash the VM if the device
accesses RDM. For example, the Windows IGD GFX driver always accesses
these regions, so in such a case this leads to a blue screen that
crashes the VM.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-30  3:08           ` Chen, Tiejun
@ 2015-06-30  8:30             ` Ian Campbell
  2015-06-30  9:38               ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Ian Campbell @ 2015-06-30  8:30 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Ian Jackson, xen-devel, Wei Liu, Stefano Stabellini

On Tue, 2015-06-30 at 11:08 +0800, Chen, Tiejun wrote:
> >>
> >>>> Having read all these docs I now know what all the options are, but I
> >>>> still don't really know what I should write. I think an example or two
> >>>> of real world usage would be helpful.
> >>>
> >>> Here I picked some code fragments to help you understand this,
> >>
> >> I meant an example or two in the documentation.
> >>
> >> The code fragment didn't answer my question either, but that's not
> >> really the point.
> >
> > Do you mean I should write two examples, respectively?

I meant an example or two of actual concrete uses (ideally the most
common ones) of the rdm parameter and what they mean in practical terms.

[...]
> "none" is the default value and it means we don't check any reserved 
> regions and then all rdm policies would be ignored.


I'm afraid I still don't understand what the difference between
"rdm=none" and simply not providing an rdm argument at all is.

Ian.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-30  8:30             ` Ian Campbell
@ 2015-06-30  9:38               ` Chen, Tiejun
  2015-07-07 11:36                 ` Ian Campbell
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-30  9:38 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, xen-devel, Wei Liu, Stefano Stabellini

>>>> The code fragment didn't answer my question either, but that's not
>>>> really the point.
>>>
>>> Do you mean I should write two examples, respectively?
>
> I meant an example or two of actual concrete uses (ideally the most
> common ones) of the rdm parameter and what they mean in practical terms.

What about this?

For example, suppose you set

memory = 2800

to allocate memory to a given VM, but the platform has two RDM regions
such as,

RMRR region: base_addr ac6d3000 end_address ac6e6fff
RMRR region: base_addr ad800000 end_address afffffff

In this conflict case,

#1. If the type option is set to "none",

rdm = "type=none,reserve=strict"
or rdm = "type=none,reserve=relaxed"

we don't handle any conflict, so the VM keeps running as before.
Note this is our default behavior.

#2. If the type option is set to "host",

rdm = "type=host,reserve=strict"
or rdm = "type=host,reserve=relaxed"

all conflicts would be handled according to the policy set by the
reserve option, as described below.
...

>
> [...]
>> "none" is the default value and it means we don't check any reserved
>> regions and then all rdm policies would be ignored.
>
>
> I'm afraid I still don't understand what the difference between
> "rdm=none" and simply not providing an rdm argument at all is.
>

Currently they're the same.

As I said previously, this default option is used to communicate inside
xl, but it's still possible that more options are introduced in the
future. Or consider that one day we might like to make "host" the
default internally; we would then still need this explicit option to let
the user ignore rdm, right? So based on your question, I think at most
we can remove the description of this option from the doc file for now.
Any concerns about this?

Thanks
Tiejun
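
The conflict in the memory = 2800 example can be checked by hand. Below is
a minimal sketch, assuming (as a simplification) that guest RAM is one
contiguous range starting at address 0; the real HVM layout is more
involved. struct range and the helper functions are hypothetical names,
not Xen or libxl code.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive address range [start, end]. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Simplifying assumption: guest RAM is [0, mib MiB) with no holes. */
struct range guest_ram_mib(uint64_t mib)
{
    struct range r;

    assert(mib > 0);
    r.start = 0;
    r.end = (mib << 20) - 1;
    return r;
}

/* Two inclusive ranges overlap iff each starts no later than the other ends. */
int ranges_overlap(struct range a, struct range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* Count how many reserved regions intersect the RAM range. */
unsigned int count_conflicts(struct range ram,
                             const struct range *rdm,
                             unsigned int nr)
{
    unsigned int i, n = 0;

    for (i = 0; i < nr; i++)
        if (ranges_overlap(ram, rdm[i]))
            n++;
    return n;
}
```

Under that assumption, memory = 2800 gives a RAM range ending at
0xaeffffff, so both RMRR regions listed above (starting at 0xac6d3000 and
0xad800000) overlap it.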

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 03/19] xen/vtd: create RMRR mapping
  2015-06-24  7:33           ` Jan Beulich
@ 2015-06-30 10:40             ` George Dunlap
  2015-06-30 11:19               ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-06-30 10:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Yang Zhang, Tiejun Chen, Kevin Tian, Tim Deegan, xen-devel

On Wed, Jun 24, 2015 at 8:33 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 24.06.15 at 09:26, <tiejun.chen@intel.com> wrote:
>>> This would need to go into patch 2; I wonder whether folding that
>>
>> Yes.
>>
>>> and this one wouldn't be warranted, avoiding the former adding
>>
>> Are you saying to fold patch #2 and patch #3? But shouldn't we always
>> define a new and then use that in practice subsequently? Even with two
>> patches, respectively.
>
> It's a matter of taste to some degree. Unless patches are really
> involved, I prefer them not to add dead code. Apart from
> eliminating the case of the code remaining dead (perhaps for
> extended periods of time) if only parts of a series get applied, it
> also generally helps review if one can see the consumer of a
> newly added function right away.

FWIW I was thinking the same thing as I was looking at these two patches.

 -George

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-23  9:57 ` [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-06-30 11:08   ` George Dunlap
  2015-06-30 11:24     ` Chen, Tiejun
  2015-07-01 16:30   ` George Dunlap
  1 sibling, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-06-30 11:08 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index bc45ea5..2f9e40e 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -478,6 +478,11 @@ struct xen_domctl_assign_device {
>              XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
>          } dt;
>      } u;
> +    /* IN */
> +#define XEN_DOMCTL_DEV_NO_RDM           0
> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
> +    uint32_t  flag;   /* flag of assigned device */

Normally flags would be bit fields, not values like this.

Also, what's the distinction between RDM and RMRR, and is there a good
reason to use the first here rather than the second?

It's also not clear to me what NO_RDM is meant to be for -- is it
meant to be an assertion that the caller expects the device to have no
RMRRs associated with it?

 -George

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 03/19] xen/vtd: create RMRR mapping
  2015-06-30 10:40             ` George Dunlap
@ 2015-06-30 11:19               ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-30 11:19 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich; +Cc: Yang Zhang, Kevin Tian, Tim Deegan, xen-devel

>> It's a matter of taste to some degree. Unless patches are really
>> involved, I prefer them not to add dead code. Apart from
>> eliminating the case of the code remaining dead (perhaps for
>> extended periods of time) if only parts of a series get applied, it
>> also generally helps review if one can see the consumer of a
>> newly added function right away.
>
> FWIW I was thinking the same thing as I was looking at these two patches.
>

Yes, this is our last solution by default.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-30 11:08   ` George Dunlap
@ 2015-06-30 11:24     ` Chen, Tiejun
  2015-06-30 14:20       ` George Dunlap
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-30 11:24 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

>> +#define XEN_DOMCTL_DEV_NO_RDM           0
>> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
>> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
>> +    uint32_t  flag;   /* flag of assigned device */
>
> Normally flags would be bit fields, not values like this.
>
> Also, what's the distinction between RDM and RMRR, and is there a good
> reason to use the first here rather than the second?
>
> It's also not clear to me what NO_RDM is meant to be for -- is it
> meant to be an assertion that the caller expects the device to have no
> RMRRs associated with it?
>

All the concerns you're raising above make me realize you're
missing the background info and change history. So I think if you
really would like to review this series, you should at least take a look
at our previous design and the basic change log, which are mentioned
in patch #00.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-30 11:24     ` Chen, Tiejun
@ 2015-06-30 14:20       ` George Dunlap
  2015-07-01  1:11         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-06-30 14:20 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

On 06/30/2015 12:24 PM, Chen, Tiejun wrote:
>>> +#define XEN_DOMCTL_DEV_NO_RDM           0
>>> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
>>> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
>>> +    uint32_t  flag;   /* flag of assigned device */
>>
>> Normally flags would be bit fields, not values like this.
>>
>> Also, what's the distinction between RDM and RMRR, and is there a good
>> reason to use the first here rather than the second?
>>
>> It's also not clear to me what NO_RDM is meant to be for -- is it
>> meant to be an assertion that the caller expects the device to have no
>> RMRRs associated with it?
>>
> 
> All the concerns you're raising above make me realize you're
> missing the background info and change history. So I think if you
> really would like to review this series, you should at least take a look
> at our previous design and the basic change log, which are mentioned
> in patch #00.

I did read #00, but I missed the RDM/RMRR thing.  I still don't see what
NO_RDM is for.

In any case, all the information needed to actually understand the code
needs to be checked into the tree, and patch 00 isn't going to be
checked in.  The choice about naming isn't important, but it should be
possible to look at the patch+changeset and figure out what NO_RDM is
supposed to be doing and why.

And finally, I have now looked through the patch history, and my initial
question was not covered: In the rest of domctl.h, "flags" is a bit
array of boolean values.  Here, at the moment, it is a tristate: 0, 1,
or 2.  There doesn't seem to be a plan for how to add in other flags --
are you going to have an "RDM_MASK" for bits 0-1, so bits 2-31 can be
used for something else?

This isn't super critical, since it is a domctl and we're allowed to
change it; but I think if we're going to be inconsistent we should at
least have consciously decided to do so for a reason.

 -George
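
One way to keep a multi-valued policy and still leave room for bit-style
flags, as asked above, is to reserve the low two bits as a field behind a
mask. This is a minimal sketch only; DEV_RDM_POLICY_MASK, DEV_FLAG_EXAMPLE
and the helper functions are hypothetical names that are not taken from the
patch or from domctl.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-1 hold the RDM policy, bits 2-31 are flags. */
#define DEV_RDM_POLICY_SHIFT  0
#define DEV_RDM_POLICY_MASK   (UINT32_C(0x3) << DEV_RDM_POLICY_SHIFT)

#define DEV_RDM_NONE          UINT32_C(0)
#define DEV_RDM_RELAXED       UINT32_C(1)
#define DEV_RDM_STRICT        UINT32_C(2)

/* Hypothetical future boolean flag living outside the policy field. */
#define DEV_FLAG_EXAMPLE      (UINT32_C(1) << 2)

/* Replace the policy field while leaving all other bits untouched. */
uint32_t dev_flags_set_rdm_policy(uint32_t flags, uint32_t policy)
{
    assert(policy <= DEV_RDM_STRICT);
    return (flags & ~DEV_RDM_POLICY_MASK) | (policy << DEV_RDM_POLICY_SHIFT);
}

/* Extract the policy field. */
uint32_t dev_flags_get_rdm_policy(uint32_t flags)
{
    return (flags & DEV_RDM_POLICY_MASK) >> DEV_RDM_POLICY_SHIFT;
}

/* Range check: reject unknown bits and the unused policy value 3. */
int dev_flags_valid(uint32_t flags)
{
    const uint32_t known = DEV_RDM_POLICY_MASK | DEV_FLAG_EXAMPLE;

    if (flags & ~known)
        return 0;
    return dev_flags_get_rdm_policy(flags) <= DEV_RDM_STRICT;
}
```

The range check matters for the same reason given in the cover letter:
rejecting out-of-range values now keeps future extensions unambiguous.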

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-23  9:57 ` [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy Tiejun Chen
                     ` (2 preceding siblings ...)
  2015-06-25 12:31   ` Ian Jackson
@ 2015-06-30 15:54   ` George Dunlap
  2015-07-01  1:16     ` Chen, Tiejun
  3 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-06-30 15:54 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> @@ -1450,6 +1458,11 @@ static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *multidev,
>      }
>
>      for (i = 0; i < d_config->num_pcidevs; i++) {
> +        /*
> +         * If the rdm global policy is 'strict' we should override each device.
> +         */
> +        if (d_config->b_info.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_STRICT)
> +            d_config->pcidevs[i].rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;

I think I'm missing something here.

1. By default, the domain policy is RELAXED (See above,
libxl__rdm_setdefault()).

2. By default, the policy for individual devices is STRICT (see
libxl_pci.c:libxl__device_pci_setdefault())

3. If the domain policy is set to STRICT, this overrides per-device policy

4. If the domain policy is set to RELAXED, I don't see that having an
effect on individual devices

If I'm correct, then #3 means it's not possible to have devices for a
domain *default* to strict, but to be relaxed in individual instances.
If you had five devices you wanted strict, and only one device you
wanted to be relaxed (because you knew it didn't matter), you'd have
to set reserved=strict for all the other devices, rather than just
being able to set the domain setting to strict and set reserve=relaxed
for the one.

I think that both violates the principle of least surprise, and is less useful.

Or did I miss something?

 -George
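
The two readings can be put side by side. This is only a sketch of the
semantics discussed above, not libxl code: the enum, the RDM_POLICY_UNSET
value and both function names are hypothetical.

```c
#include <assert.h>

enum rdm_policy {
    RDM_POLICY_UNSET = 0,   /* hypothetical: device did not specify one */
    RDM_POLICY_STRICT,
    RDM_POLICY_RELAXED
};

/*
 * Behaviour as read from the v4 patch: a strict domain policy overrides
 * every device, and an unspecified device defaults to strict.
 */
enum rdm_policy resolve_override(enum rdm_policy domain,
                                 enum rdm_policy device)
{
    if (domain == RDM_POLICY_STRICT)
        return RDM_POLICY_STRICT;
    return device == RDM_POLICY_UNSET ? RDM_POLICY_STRICT : device;
}

/*
 * Alternative: the domain policy is only a default, and an explicit
 * per-device setting always wins.
 */
enum rdm_policy resolve_inherit(enum rdm_policy domain,
                                enum rdm_policy device)
{
    assert(domain != RDM_POLICY_UNSET);
    return device == RDM_POLICY_UNSET ? domain : device;
}
```

Under resolve_override() a device can never be relaxed in a strict domain,
while under resolve_inherit() the "five strict, one relaxed" setup needs
only one per-device setting.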

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-23  9:57 ` [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy Tiejun Chen
                     ` (2 preceding siblings ...)
  2015-06-25 12:33   ` Ian Jackson
@ 2015-06-30 15:56   ` George Dunlap
  2015-07-01  1:23     ` Chen, Tiejun
  2015-06-30 16:11   ` George Dunlap
  4 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-06-30 15:56 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> This patch passes our rdm reservation policy inside libxl
> when we assign a device or attach a device.
>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v4:
>
> * Fix one typo, s/unkwon/unknown
> * In command description, we should use "[]" to indicate it's optional
>   for that extended xl command, pci-attach.
>
>  docs/man/xl.pod.1         |  7 ++++++-
>  tools/libxl/libxl_pci.c   | 10 +++++++++-
>  tools/libxl/xl_cmdimpl.c  | 23 +++++++++++++++++++----
>  tools/libxl/xl_cmdtable.c |  2 +-
>  4 files changed, 35 insertions(+), 7 deletions(-)
>
> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> index 4eb929d..c5c4809 100644
> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1
> @@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its original driver, making it
>  usable by Domain 0 again.  If the device is not bound to pciback, it will
>  return success.
>
> -=item B<pci-attach> I<domain-id> I<BDF>
> +=item B<pci-attach> I<domain-id> I<BDF> [I<rdm>]
>
>  Hot-plug a new pass-through pci device to the specified domain.
>  B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
> +B<rdm policy> is about how to handle conflict between reserving reserved device
> +memory and guest address space. "strict" means an unsolved conflict leads to
> +immediate VM crash, while "relaxed" allows VM moving forward with a warning
> +message thrown out. Here "strict" is default.
> +
>
>  =item B<pci-detach> [I<-f>] I<domain-id> I<BDF>
>
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index a00d799..a6a2a8c 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -894,7 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>      FILE *f;
>      unsigned long long start, end, flags, size;
>      int irq, i, rc, hvm = 0;
> -    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
> +    uint32_t flag;
>
>      if (type == LIBXL_DOMAIN_TYPE_INVALID)
>          return ERROR_FAIL;
> @@ -988,6 +988,14 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>
>  out:
>      if (!libxl_is_stubdom(ctx, domid, NULL)) {
> +        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
> +            flag = XEN_DOMCTL_DEV_RDM_RELAXED;
> +        } else if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
> +            flag = XEN_DOMCTL_DEV_RDM_STRICT;
> +        } else {
> +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm check flag.");
> +            return ERROR_FAIL;
> +        }

Shouldn't this be in the previous patch?

 -George

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-23  9:57 ` [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy Tiejun Chen
                     ` (3 preceding siblings ...)
  2015-06-30 15:56   ` George Dunlap
@ 2015-06-30 16:11   ` George Dunlap
  2015-07-01  1:30     ` Chen, Tiejun
  4 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-06-30 16:11 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> This patch passes our rdm reservation policy inside libxl
> when we assign a device or attach a device.

Actually, it looks like what you need to do here, both for this patch
and the previous one is to add "rdm_reserve" to libxlu_pci.c, so that
it gets handled on a per-device level just like permissive,
msitranslate, &c.  That would make it Just Work for both domain config
and for the pci hotplug (and any other toolstacks using the xlu
functions to parse BDFs).

 -George
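
As an illustration of per-device handling, a key=value token such as
rdm_reserve=strict could be parsed alongside the existing per-device flags.
This is a hedged sketch only: parse_pci_option(), struct pci_opts and the
enum are hypothetical and do not reflect the actual libxlu_pci.c parser.

```c
#include <assert.h>
#include <string.h>

enum rdm_reserve { RDM_RESERVE_STRICT, RDM_RESERVE_RELAXED };

/* Hypothetical per-device option set. */
struct pci_opts {
    int permissive;
    enum rdm_reserve rdm_reserve;
};

/* Return 1 if the first 'len' bytes of 'tok' are exactly 'key'. */
static int key_is(const char *tok, size_t len, const char *key)
{
    return len == strlen(key) && strncmp(tok, key, len) == 0;
}

/*
 * Parse one "key=value" token into 'opts'.
 * Returns 0 on success, -1 on a malformed token, unknown key or bad value.
 */
int parse_pci_option(const char *tok, struct pci_opts *opts)
{
    const char *eq = strchr(tok, '=');
    const char *val;
    size_t klen;

    if (eq == NULL)
        return -1;
    klen = (size_t)(eq - tok);
    val = eq + 1;

    if (key_is(tok, klen, "permissive")) {
        if (strcmp(val, "1") == 0)
            opts->permissive = 1;
        else if (strcmp(val, "0") == 0)
            opts->permissive = 0;
        else
            return -1;
        return 0;
    }

    if (key_is(tok, klen, "rdm_reserve")) {
        if (strcmp(val, "strict") == 0)
            opts->rdm_reserve = RDM_RESERVE_STRICT;
        else if (strcmp(val, "relaxed") == 0)
            opts->rdm_reserve = RDM_RESERVE_RELAXED;
        else
            return -1;
        return 0;
    }

    return -1;
}
```

Because the same parser would serve both the domain config and pci-attach,
no separate xl argument is needed for the hotplug path in this sketch.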

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 19/19] tools: parse to enable new rdm policy parameters
  2015-06-23  9:57 ` [v4][PATCH 19/19] tools: parse to enable new rdm policy parameters Tiejun Chen
  2015-06-25 11:35   ` Wei Liu
@ 2015-06-30 16:30   ` George Dunlap
  2015-07-01  1:31     ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-06-30 16:30 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> This patch parses to enable user configurable parameters to specify
> RDM resource and according policies,
>
> Global RDM parameter:
>     rdm = "type=none/host,reserve=strict/relaxed"
> Per-device RDM parameter:
>     pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

Oh, right -- I see you did add this here.  In which case I think you
don't need the extra xl parameter you added in patch 12/19, right?  As
I said, that's how we're handling permissive, msi_translate, and the
other per-device flags.

> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index c7a12b1..85d74fd 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1923,6 +1923,14 @@ skip_vfb:
>          xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
>      }
>
> +    if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
> +        libxl_rdm_reserve rdm;
> +        if (!xlu_rdm_parse(config, &rdm, buf)) {
> +            b_info->rdm.type = rdm.type;
> +            b_info->rdm.reserve = rdm.reserve;
> +        }
> +    }
> +
>      if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
>          d_config->num_pcidevs = 0;
>          d_config->pcidevs = NULL;
> @@ -1937,6 +1945,8 @@ skip_vfb:
>              pcidev->power_mgmt = pci_power_mgmt;
>              pcidev->permissive = pci_permissive;
>              pcidev->seize = pci_seize;
> +            /* We'd like to force reserve rdm specific to a device by default.*/
> +            pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;

Won't this mean that even with a domain default policy of "relaxed",
that individual pci devices will still default to "strict"?

It looks to me like your global policy isn't so much "default to
relaxed unless specified strict" vs "default to strict unless
specified to relaxed", but is effectively "allow to be relaxed if
specified" vs "force to be strict no matter what the per-device config
says".  That's much less expected, and I think less useful.
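
To make the difference concrete, here is a rough sketch of the two semantics side by side. All names are made up for illustration (in particular the "unspecified" value is an assumption; the series only defines strict and relaxed), and the first function is only my reading of what the v4 patches end up doing, not code taken from them.

```c
/* Hypothetical values; EX_RDM_UNSPECIFIED is an assumption for illustration. */
enum ex_rdm_reserve {
    EX_RDM_UNSPECIFIED = 0, /* the config said nothing */
    EX_RDM_STRICT,
    EX_RDM_RELAXED,
};

/*
 * Semantics as they appear in v4: a device defaults to strict, and a
 * strict global policy overrides whatever the device asked for.
 */
static enum ex_rdm_reserve ex_resolve_v4(enum ex_rdm_reserve global,
                                         enum ex_rdm_reserve dev)
{
    if (dev == EX_RDM_UNSPECIFIED)
        dev = EX_RDM_STRICT;
    if (global == EX_RDM_STRICT)
        return EX_RDM_STRICT;
    return dev;
}

/*
 * "Global is the default" semantics: the per-device setting wins when
 * given; otherwise the device inherits the global policy (relaxed if
 * the global policy was not given either).
 */
static enum ex_rdm_reserve ex_resolve_default(enum ex_rdm_reserve global,
                                              enum ex_rdm_reserve dev)
{
    if (global == EX_RDM_UNSPECIFIED)
        global = EX_RDM_RELAXED;
    return dev != EX_RDM_UNSPECIFIED ? dev : global;
}
```

With the second function, a domain set to strict with one device explicitly set to relaxed resolves to relaxed for that device only; with the first, that combination cannot be expressed.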

 -George

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-30 14:20       ` George Dunlap
@ 2015-07-01  1:11         ` Chen, Tiejun
  2015-07-01 10:02           ` George Dunlap
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-01  1:11 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

On 2015/6/30 22:20, George Dunlap wrote:
> On 06/30/2015 12:24 PM, Chen, Tiejun wrote:
>>>> +#define XEN_DOMCTL_DEV_NO_RDM           0
>>>> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
>>>> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
>>>> +    uint32_t  flag;   /* flag of assigned device */
>>>
>>> Normally flags would be bit fields, not values like this.
>>>
>>> Also, what's the distinction between RDM and RMRR, and is there a good
>>> reason to use the first here rather than the second?
>>>
>>> It's also not clear to me what NO_RDM is meant to be for -- is it
>>> meant to be an assertion that the caller expects the device to have no
>>> RMRRs associated with it?
>>>
>>
>> All concerns what you're raising above just make me realized you're
>> missing all background info and history changes. So I think if you
>> really would like to review this series, at least you should take a look
>> at our previous design and some basic change log, which are mentioned
>> inside patch #00.
>
> I did read #00, but I missed the RDM/RMRR thing.  I still don't see what

Thanks.

> NO_RDM is for.
>
> In any case, all the information needed to actually understand the code
> needs to be checked into the tree, and patch 00 isn't going to be
> checked in.  The choice about naming isn't important, but it should be
> possible to look at the patch+changeset and figure out what NO_RDM is
> supposed to be doing and why.

From my point of view, "NO" should be clear to a certain extent, right?

If you want to delve into the reason why we named it this way, you can refer to:

http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01223.html

http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01747.html

http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01793.html

>
> And finally, I have now looked through the patch history, and my initial
> question was not covered: In the rest of domctl.h, "flags" is a bit
> array of boolean values.  Here, at the moment, it is a tristate: 0, 1,

/* XEN_DOMCTL_createdomain */
struct xen_domctl_createdomain {
     /* IN parameters */
     uint32_t ssidref;
     xen_domain_handle_t handle;
  /* Is this an HVM guest (as opposed to a PVH or PV guest)? */
#define _XEN_DOMCTL_CDF_hvm_guest     0
#define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)
  /* Use hardware-assisted paging if available? */
#define _XEN_DOMCTL_CDF_hap           1
#define XEN_DOMCTL_CDF_hap            (1U<<_XEN_DOMCTL_CDF_hap)
  /* Should domain memory integrity be verifed by tboot during Sx? */
#define _XEN_DOMCTL_CDF_s3_integrity  2
#define XEN_DOMCTL_CDF_s3_integrity   (1U<<_XEN_DOMCTL_CDF_s3_integrity)
  /* Disable out-of-sync shadow page tables? */
#define _XEN_DOMCTL_CDF_oos_off       3
#define XEN_DOMCTL_CDF_oos_off        (1U<<_XEN_DOMCTL_CDF_oos_off)
  /* Is this a PVH guest (as opposed to an HVM or PV guest)? */
#define _XEN_DOMCTL_CDF_pvh_guest     4
#define XEN_DOMCTL_CDF_pvh_guest      (1U<<_XEN_DOMCTL_CDF_pvh_guest)
     uint32_t flags;

> or 2.  There doesn't seem to be a plan for how to add in other flags --
> are you going to have an "RDM_MASK" for bits 0-1, so bits 2-31 can be

It's very possible that we will introduce some new flags to address
complicated cases like live migration, and this is really what we want
to do as the next step.

> used for something else?

I think we already have a comment here:

uint32_t  flag;   /* flag of assigned device */

This explicitly clarifies that this variable is dedicated to being a flag.

>
> This isn't super critical, since it is a domctl and we're allowed to
> change it; but I think if we're going to be inconsistent we should at
> least have consciously decided to do so for a reason.
>

Please see above; if I'm wrong, please correct me.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-30 15:54   ` George Dunlap
@ 2015-07-01  1:16     ` Chen, Tiejun
  2015-07-01 10:07       ` George Dunlap
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-01  1:16 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 2015/6/30 23:54, George Dunlap wrote:
> On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
>> @@ -1450,6 +1458,11 @@ static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *multidev,
>>       }
>>
>>       for (i = 0; i < d_config->num_pcidevs; i++) {
>> +        /*
>> +         * If the rdm global policy is 'strict' we should override each device.
>> +         */
>> +        if (d_config->b_info.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_STRICT)
>> +            d_config->pcidevs[i].rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
>
> I think I'm missing something here.
>
> 1. By default, the domain policy is RELAXED (See above,
> libxl__rdm_setdefault()).
>
> 2. By default, the policy for individual devices is STRICT (see
> libxl_pci.c:libxl__device_pci_setdefault())
>
> 3. If the domain policy is set to STRICT, this overrides per-device policy
>
> 4. If the domain policy is set to RELAXED, I don't see that having an
> effect on individual devices

This is our rule, and this is why I think you need to take a look at 
patch #00, our design and all patch head descriptions,

"Default per-device RDM policy is 'strict', while default global RDM 
policy is 'relaxed'. When both policies are specified on a given region, 
'strict' is always preferred."

Thanks
Tiejun

>
> If I'm correct, then #3 means it's not possible to have devices for a
> domain *default* to strict, but to be relaxed in individual instances.
> If you had five devices you wanted strict, and only one device you
> wanted to be relaxed (because you knew it didn't matter), you'd have
> to set reserved=strict for all the other devices, rather than just
> being able to set the domain setting to strict and set reserve=relaxed
> for the one.
>
> I think that both violates the principle of least surprise, and is less useful.
>
> Or did I miss something?
>
>   -George
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-30 15:56   ` George Dunlap
@ 2015-07-01  1:23     ` Chen, Tiejun
  2015-07-01 10:22       ` George Dunlap
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-01  1:23 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>> @@ -988,6 +988,14 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>>
>>   out:
>>       if (!libxl_is_stubdom(ctx, domid, NULL)) {
>> +        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
>> +            flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>> +        } else if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
>> +            flag = XEN_DOMCTL_DEV_RDM_STRICT;
>> +        } else {
>> +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm check flag.");
>> +            return ERROR_FAIL;
>> +        }
>
> Shouldn't this be in the previous patch?
>

This converts LIBXL_XXX to XEN_XXX so that the policy can be passed via
the hypercall, so I still think it is better for this to live here. The
previous patch, by contrast, just defines things.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-06-30 16:11   ` George Dunlap
@ 2015-07-01  1:30     ` Chen, Tiejun
  2015-07-01 10:31       ` George Dunlap
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-01  1:30 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 2015/7/1 0:11, George Dunlap wrote:
> On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
>> This patch passes our rdm reservation policy inside libxl
>> when we assign a device or attach a device.
>
> Actually, it looks like what you need to do here, both for this patch
> and the previous one is to add "rdm_reserve" to libxlu_pci.c, so that
> it gets handled on a per-device level just like permissive,
> msitranslate, &c.  That would make it Just Work for both domain config
> and for the pci hotplug (and any other toolstacks using the xlu
> functions to parse BDFs).
>

We'd like to separate this kind of thing into two patches to make this
patch series bisectable and readable. And actually this split was
suggested by Wei during our previous review, and I think his advice
makes sense.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 19/19] tools: parse to enable new rdm policy parameters
  2015-06-30 16:30   ` George Dunlap
@ 2015-07-01  1:31     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-01  1:31 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>> Global RDM parameter:
>>      rdm = "type=none/host,reserve=strict/relaxed"
>> Per-device RDM parameter:
>>      pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]x
>
> Oh, right -- I see you did add this here.  In which case I think you
> don't need the extra xl parameter you added in patch 12/19, right?  As
> I said, that's how we're handling permissive, msi_translate, and the
> other per-device flags.
>
>> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
>> index c7a12b1..85d74fd 100644
>> --- a/tools/libxl/xl_cmdimpl.c
>> +++ b/tools/libxl/xl_cmdimpl.c
>> @@ -1923,6 +1923,14 @@ skip_vfb:
>>           xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
>>       }
>>
>> +    if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
>> +        libxl_rdm_reserve rdm;
>> +        if (!xlu_rdm_parse(config, &rdm, buf)) {
>> +            b_info->rdm.type = rdm.type;
>> +            b_info->rdm.reserve = rdm.reserve;
>> +        }
>> +    }
>> +
>>       if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
>>           d_config->num_pcidevs = 0;
>>           d_config->pcidevs = NULL;
>> @@ -1937,6 +1945,8 @@ skip_vfb:
>>               pcidev->power_mgmt = pci_power_mgmt;
>>               pcidev->permissive = pci_permissive;
>>               pcidev->seize = pci_seize;
>> +            /* We'd like to force reserve rdm specific to a device by default.*/
>> +            pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
>
> Won't this mean that even with a domain default policy of "relaxed",
> that individual pci devices will still default to "strict"?

Yes, as I replied to you in another email.

Thanks
Tiejun

>
> It looks to me like your global policy isn't so much "default to
> relaxed unless specified strict" vs "default to strict unless
> specified to relaxed", but is effectively "allow to be relaxed if
> specified" vs "force to be strict no matter what the per-device config
> says".  That's much less expected, and I think less useful.
>
>   -George
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-01  1:11         ` Chen, Tiejun
@ 2015-07-01 10:02           ` George Dunlap
  2015-07-01 10:47             ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-07-01 10:02 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

On 07/01/2015 02:11 AM, Chen, Tiejun wrote:
> On 2015/6/30 22:20, George Dunlap wrote:
>> On 06/30/2015 12:24 PM, Chen, Tiejun wrote:
>>>>> +#define XEN_DOMCTL_DEV_NO_RDM           0
>>>>> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
>>>>> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
>>>>> +    uint32_t  flag;   /* flag of assigned device */
>>>>
>>>> Normally flags would be bit fields, not values like this.
>>>>
>>>> Also, what's the distinction between RDM and RMRR, and is there a good
>>>> reason to use the first here rather than the second?
>>>>
>>>> It's also not clear to me what NO_RDM is meant to be for -- is it
>>>> meant to be an assertion that the caller expects the device to have no
>>>> RMRRs associated with it?
>>>>
>>>
>>> All concerns what you're raising above just make me realized you're
>>> missing all background info and history changes. So I think if you
>>> really would like to review this series, at least you should take a look
>>> at our previous design and some basic change log, which are mentioned
>>> inside patch #00.
>>
>> I did read #00, but I missed the RDM/RMRR thing.  I still don't see what
> 
> Thanks.
> 
>> NO_RDM is for.
>>
>> In any case, all the information needed to actually understand the code
>> needs to be checked into the tree, and patch 00 isn't going to be
>> checked in.  The choice about naming isn't important, but it should be
>> possible to look at the patch+changeset and figure out what NO_RDM is
>> supposed to be doing and why.
> 
> From my point of view, "NO" should be clear at certain point, right?

Well, I'm afraid it's not.

Looking through the entire series, it *appears* that "NO_RDM" is meant
to be passed for architectures like ARM DeviceTree, where it is known
that no RDM regions can exist.

But it might also mean "I expect this device not to have any RDM
regions".  And it's certainly not immediately obvious what the effective
difference would be when I choose it -- what happens if I pass NO_RDM
for PCI systems?  How is it different than passing STRICT?

And in any case, as I said, reviewers and future archaeologists should
be able to tell from the individual patch what is meant, not have to go
through the entire series and guess.

> If you want delve into the reason why we called it, you can refer to,
> 
> http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01223.html
> 
> http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01747.html
> 
> http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01793.html

As I said, all the information needed to understand the patch needs to
be in the changelog.  Those messages will not be in the changelog, so
they are irrelevant to my main complaint.

>> And finally, I have now looked through the patch history, and my initial
>> question was not covered: In the rest of domctl.h, "flags" is a bit
>> array of boolean values.  Here, at the moment, it is a tristate: 0, 1,
> 
> /* XEN_DOMCTL_createdomain */
> struct xen_domctl_createdomain {
>     /* IN parameters */
>     uint32_t ssidref;
>     xen_domain_handle_t handle;
>  /* Is this an HVM guest (as opposed to a PVH or PV guest)? */
> #define _XEN_DOMCTL_CDF_hvm_guest     0
> #define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)
>  /* Use hardware-assisted paging if available? */
> #define _XEN_DOMCTL_CDF_hap           1
> #define XEN_DOMCTL_CDF_hap            (1U<<_XEN_DOMCTL_CDF_hap)
>  /* Should domain memory integrity be verifed by tboot during Sx? */
> #define _XEN_DOMCTL_CDF_s3_integrity  2
> #define XEN_DOMCTL_CDF_s3_integrity   (1U<<_XEN_DOMCTL_CDF_s3_integrity)
>  /* Disable out-of-sync shadow page tables? */
> #define _XEN_DOMCTL_CDF_oos_off       3
> #define XEN_DOMCTL_CDF_oos_off        (1U<<_XEN_DOMCTL_CDF_oos_off)
>  /* Is this a PVH guest (as opposed to an HVM or PV guest)? */
> #define _XEN_DOMCTL_CDF_pvh_guest     4
> #define XEN_DOMCTL_CDF_pvh_guest      (1U<<_XEN_DOMCTL_CDF_pvh_guest)
>     uint32_t flags;

Yes, this demonstrates my point.  Each of these is a boolean value that
takes up a single bit -- either on or off.  But here you have three
values -- NO_RDM, RELAXED, and STRICT -- that take up two bits.  If you
add more flags like this, then all the code which says "if (flags > N)"
will need to be changed to mask out the higher bits.
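
As a minimal sketch of what masking would look like (the names and the bit layout below are hypothetical, not the actual domctl.h definitions): keep the policy as a two-bit value in bits 0-1, and leave bits 2-31 free for single-bit flags.

```c
#include <stdint.h>

/* Hypothetical layout: bits 0-1 hold the RDM policy as a value. */
#define EX_DEV_RDM_MASK     0x3u
#define EX_DEV_NO_RDM       0u
#define EX_DEV_RDM_RELAXED  1u
#define EX_DEV_RDM_STRICT   2u
/* Made-up future single-bit flag living above the policy field. */
#define EX_DEV_SOME_FLAG    (1u << 2)

/* Extract the policy without being confused by the higher bits. */
static inline uint32_t ex_rdm_policy(uint32_t flag)
{
    return flag & EX_DEV_RDM_MASK;
}

/*
 * Range check that keeps working once higher bits are in use: reject
 * unknown flag bits, and reject the unassigned policy value 3.
 */
static inline int ex_flag_valid(uint32_t flag)
{
    if (flag & ~(EX_DEV_RDM_MASK | EX_DEV_SOME_FLAG))
        return 0;
    return ex_rdm_policy(flag) <= EX_DEV_RDM_STRICT;
}
```

A plain "flag > EX_DEV_RDM_STRICT" comparison would wrongly reject EX_DEV_RDM_STRICT | EX_DEV_SOME_FLAG, which is the case the mask is there to handle.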

 -George

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-01  1:16     ` Chen, Tiejun
@ 2015-07-01 10:07       ` George Dunlap
  2015-07-01 10:26         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-07-01 10:07 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 07/01/2015 02:16 AM, Chen, Tiejun wrote:
> On 2015/6/30 23:54, George Dunlap wrote:
>> On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com>
>> wrote:
>>> @@ -1450,6 +1458,11 @@ static void domcreate_attach_pci(libxl__egc
>>> *egc, libxl__multidev *multidev,
>>>       }
>>>
>>>       for (i = 0; i < d_config->num_pcidevs; i++) {
>>> +        /*
>>> +         * If the rdm global policy is 'strict' we should override
>>> each device.
>>> +         */
>>> +        if (d_config->b_info.rdm.reserve ==
>>> LIBXL_RDM_RESERVE_FLAG_STRICT)
>>> +            d_config->pcidevs[i].rdm_reserve =
>>> LIBXL_RDM_RESERVE_FLAG_STRICT;
>>
>> I think I'm missing something here.
>>
>> 1. By default, the domain policy is RELAXED (See above,
>> libxl__rdm_setdefault()).
>>
>> 2. By default, the policy for individual devices is STRICT (see
>> libxl_pci.c:libxl__device_pci_setdefault())
>>
>> 3. If the domain policy is set to STRICT, this overrides per-device
>> policy
>>
>> 4. If the domain policy is set to RELAXED, I don't see that having an
>> effect on individual devices
> 
> This is our rule, and this is why I think you need to take a look at
> patch #00, our design and all patch head descriptions,
> 
> "Default per-device RDM policy is 'strict', while default global RDM
> policy is 'relaxed'. When both policies are specified on a given region,
> 'strict' is always preferred."

It looks like you didn't finish reading my message.  I suggest you do so:

>> If I'm correct, then #3 means it's not possible to have devices for a
>> domain *default* to strict, but to be relaxed in individual instances.
>> If you had five devices you wanted strict, and only one device you
>> wanted to be relaxed (because you knew it didn't matter), you'd have
>> to set reserved=strict for all the other devices, rather than just
>> being able to set the domain setting to strict and set reserve=relaxed
>> for the one.
>>
>> I think that both violates the principle of least surprise, and is
>> less useful.

 -George

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-07-01  1:23     ` Chen, Tiejun
@ 2015-07-01 10:22       ` George Dunlap
  2015-07-01 10:56         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-07-01 10:22 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 07/01/2015 02:23 AM, Chen, Tiejun wrote:
>>> @@ -988,6 +988,14 @@ static int do_pci_add(libxl__gc *gc, uint32_t
>>> domid, libxl_device_pci *pcidev, i
>>>
>>>   out:
>>>       if (!libxl_is_stubdom(ctx, domid, NULL)) {
>>> +        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
>>> +            flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>>> +        } else if (pcidev->rdm_reserve ==
>>> LIBXL_RDM_RESERVE_FLAG_STRICT) {
>>> +            flag = XEN_DOMCTL_DEV_RDM_STRICT;
>>> +        } else {
>>> +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm
>>> check flag.");
>>> +            return ERROR_FAIL;
>>> +        }
>>
>> Shouldn't this be in the previous patch?
>>
> 
> This is trying to covert LIBXL_XXX to XEN_XXX passed this policy as a
> hypercall, so I still think this is better to live here. Instead, the
> previous patch is just defining something.

The entire rest of this patch is about xl.  It doesn't make any sense at
all for the previous patch to modify libxl in a way that doesn't
actually do anything, and then in the current patch modify both xl and
libxl.

What if, for instance, someone had built their own toolstack on top of
libxl, and wanted to backport just the xen/libxl parts of the RMRR
series?  They'd have to backport this patch with the xl changes to get a
functioning system, even though they aren't really using xl.

 -George

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-01 10:07       ` George Dunlap
@ 2015-07-01 10:26         ` Chen, Tiejun
  2015-07-01 10:57           ` George Dunlap
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-01 10:26 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>>> 1. By default, the domain policy is RELAXED (See above,
>>> libxl__rdm_setdefault()).
>>>
>>> 2. By default, the policy for individual devices is STRICT (see
>>> libxl_pci.c:libxl__device_pci_setdefault())
>>>
>>> 3. If the domain policy is set to STRICT, this overrides per-device
>>> policy
>>>
>>> 4. If the domain policy is set to RELAXED, I don't see that having an
>>> effect on individual devices
>>
>> This is our rule, and this is why I think you need to take a look at
>> patch #00, our design and all patch head descriptions,
>>
>> "Default per-device RDM policy is 'strict', while default global RDM
>> policy is 'relaxed'. When both policies are specified on a given region,
>> 'strict' is always preferred."
>
> It looks like you didn't finish reading my message.  I suggest you do so:

Okay.

>
>>> If I'm correct, then #3 means it's not possible to have devices for a
>>> domain *default* to strict, but to be relaxed in individual instances.
>>> If you had five devices you wanted strict, and only one device you
>>> wanted to be relaxed (because you knew it didn't matter), you'd have
>>> to set reserved=strict for all the other devices, rather than just
>>> being able to set the domain setting to strict and set reserve=relaxed
>>> for the one.
>>>
>>> I think that both violates the principle of least surprise, and is
>>> less useful.
>

So what's your idea for meeting our requirement?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-07-01  1:30     ` Chen, Tiejun
@ 2015-07-01 10:31       ` George Dunlap
  2015-07-02  9:27         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-07-01 10:31 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 07/01/2015 02:30 AM, Chen, Tiejun wrote:
> On 2015/7/1 0:11, George Dunlap wrote:
>> On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com>
>> wrote:
>>> This patch passes our rdm reservation policy inside libxl
>>> when we assign a device or attach a device.
>>
>> Actually, it looks like what you need to do here, both for this patch
>> and the previous one is to add "rdm_reserve" to libxlu_pci.c, so that
>> it gets handled on a per-device level just like permissive,
>> msitranslate, &c.  That would make it Just Work for both domain config
>> and for the pci hotplug (and any other toolstacks using the xlu
>> functions to parse BDFs).
>>
> 
> We'd like to separate this kind of thing into two patches respectively
> to make this patch series bisectable and readable. And actually this
> split is suggested by Wei during our previous review, I think his advice
> should make sense.

I'm not only suggesting changing the layout of the patches; I'm
suggesting modifying the functionality.

In patch 12 you add a new command-line parameter to xl; so that you have
to type something like this:

# xl pci-attach ubuntu01 01:00.1,msitranslate=1 relaxed

What I'm saying is that you can drop the xl part of that patch entirely,
because once you have the xlu code in, you can just do this:

# xl pci-attach ubuntu01 01:00.1,msitranslate=1,rdm_reserve=relaxed

This has the positive advantage that you can copy and paste the same
string into both the xl command and the xl config file.
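
To illustrate why a single per-device string works for both callers, here is a small self-contained sketch of parsing such a string. This is not the actual libxlu_pci.c parser; the struct, the function names and the set of accepted options are assumptions made purely for illustration.

```c
#include <string.h>

/* Hypothetical per-device settings, not libxl_device_pci. */
struct ex_pcidev {
    char bdf[16];
    int msitranslate;
    int rdm_relaxed;    /* 0 = strict (the per-device default), 1 = relaxed */
};

/* Does the option of length len starting at opt equal want? */
static int ex_opt_is(const char *opt, size_t len, const char *want)
{
    return strlen(want) == len && !strncmp(opt, want, len);
}

/* Parse "BDF[,key=value]...". Returns 0 on success, -1 on a bad option. */
static int ex_parse_pcidev(const char *spec, struct ex_pcidev *dev)
{
    const char *p = strchr(spec, ',');
    size_t bdf_len = p ? (size_t)(p - spec) : strlen(spec);

    if (bdf_len == 0 || bdf_len >= sizeof(dev->bdf))
        return -1;
    memcpy(dev->bdf, spec, bdf_len);
    dev->bdf[bdf_len] = '\0';
    dev->msitranslate = 0;
    dev->rdm_relaxed = 0;

    while (p) {
        const char *opt = p + 1;
        size_t len;

        p = strchr(opt, ',');
        len = p ? (size_t)(p - opt) : strlen(opt);

        if (ex_opt_is(opt, len, "msitranslate=1"))
            dev->msitranslate = 1;
        else if (ex_opt_is(opt, len, "msitranslate=0"))
            dev->msitranslate = 0;
        else if (ex_opt_is(opt, len, "rdm_reserve=relaxed"))
            dev->rdm_relaxed = 1;
        else if (ex_opt_is(opt, len, "rdm_reserve=strict"))
            dev->rdm_relaxed = 0;
        else
            return -1;  /* unknown or malformed option */
    }
    return 0;
}
```

Both the config-file path and the pci-attach path could hand their string to the same routine, so a new per-device option only needs to be taught to the parser once.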

 -George

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-01 10:02           ` George Dunlap
@ 2015-07-01 10:47             ` Chen, Tiejun
  2015-07-01 14:39               ` George Dunlap
  2015-07-06 10:34               ` Jan Beulich
  0 siblings, 2 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-01 10:47 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

On 2015/7/1 18:02, George Dunlap wrote:
> On 07/01/2015 02:11 AM, Chen, Tiejun wrote:
>> On 2015/6/30 22:20, George Dunlap wrote:
>>> On 06/30/2015 12:24 PM, Chen, Tiejun wrote:
>>>>>> +#define XEN_DOMCTL_DEV_NO_RDM           0
>>>>>> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
>>>>>> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
>>>>>> +    uint32_t  flag;   /* flag of assigned device */
>>>>>
>>>>> Normally flags would be bit fields, not values like this.
>>>>>
>>>>> Also, what's the distinction between RDM and RMRR, and is there a good
>>>>> reason to use the first here rather than the second?
>>>>>
>>>>> It's also not clear to me what NO_RDM is meant to be for -- is it
>>>>> meant to be an assertion that the caller expects the device to have no
>>>>> RMRRs associated with it?
>>>>>
>>>>
>>>> All concerns what you're raising above just make me realized you're
>>>> missing all background info and history changes. So I think if you
>>>> really would like to review this series, at least you should take a look
>>>> at our previous design and some basic change log, which are mentioned
>>>> inside patch #00.
>>>
>>> I did read #00, but I missed the RDM/RMRR thing.  I still don't see what
>>
>> Thanks.
>>
>>> NO_RDM is for.
>>>
>>> In any case, all the information needed to actually understand the code
>>> needs to be checked into the tree, and patch 00 isn't going to be
>>> checked in.  The choice about naming isn't important, but it should be
>>> possible to look at the patch+changeset and figure out what NO_RDM is
>>> supposed to be doing and why.
>>
>>  From my point of view, "NO" should be clear at certain point, right?
>
> Well, I'm afraid it's not.
>
> Looking through the entire series, it *appears* that "NO_RDM" is meant
> to be passed for architectures like ARM DeviceTree, where it is known
> that no RDM regions can exist.
>
> But it might also mean "I expect this device not to have any RDM
> regions".  And it's certainly not immediately obvious what the effective
> difference would be when I choose it -- what happens if I pass NO_RDM
> for PCI systems?  How is it different than passing STRICT?

Currently NO_RDM is only used inside Xen, and only for non-x86; the
tools never use it. So we don't actually pass it. If someone wants to
extend its usage in the future, they really should take this into account.

>
> And in any case, as I said, reviewers and future archaeologists should
> be able to tell from the individual patch what is meant, not have to go
> through the entire series and guess.
>
>> If you want delve into the reason why we called it, you can refer to,
>>
>> http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01223.html
>>
>> http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01747.html
>>
>> http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg01793.html
>
> As I said, all the information needed to understand the patch needs to
> be in the changelog.  Those messages will not be in the changelog, so
> they are irrelevant to my main complaint.

Sorry, I left this out of the changelog.

>
>>> And finally, I have now looked through the patch history, and my initial
>>> question was not covered: In the rest of domctl.h, "flags" is a bit
>>> array of boolean values.  Here, at the moment, it is a tristate: 0, 1,
>>
>> /* XEN_DOMCTL_createdomain */
>> struct xen_domctl_createdomain {
>>      /* IN parameters */
>>      uint32_t ssidref;
>>      xen_domain_handle_t handle;
>>   /* Is this an HVM guest (as opposed to a PVH or PV guest)? */
>> #define _XEN_DOMCTL_CDF_hvm_guest     0
>> #define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)
>>   /* Use hardware-assisted paging if available? */
>> #define _XEN_DOMCTL_CDF_hap           1
>> #define XEN_DOMCTL_CDF_hap            (1U<<_XEN_DOMCTL_CDF_hap)
>>   /* Should domain memory integrity be verifed by tboot during Sx? */
>> #define _XEN_DOMCTL_CDF_s3_integrity  2
>> #define XEN_DOMCTL_CDF_s3_integrity   (1U<<_XEN_DOMCTL_CDF_s3_integrity)
>>   /* Disable out-of-sync shadow page tables? */
>> #define _XEN_DOMCTL_CDF_oos_off       3
>> #define XEN_DOMCTL_CDF_oos_off        (1U<<_XEN_DOMCTL_CDF_oos_off)
>>   /* Is this a PVH guest (as opposed to an HVM or PV guest)? */
>> #define _XEN_DOMCTL_CDF_pvh_guest     4
>> #define XEN_DOMCTL_CDF_pvh_guest      (1U<<_XEN_DOMCTL_CDF_pvh_guest)
>>      uint32_t flags;
>
> Yes, this demonstrates my point.  Each of these is a single-bit boolean
> value that takes up a single bit -- either on or off.  But here you have
> three values -- NO_DRM, RELAXED, and STRICT, that take up two bits.  If

Is this fine with you?

#define _XEN_DOMCTL_DEV_NO_RDM          0
#define XEN_DOMCTL_DEV_NO_RDM           (1U<<_XEN_DOMCTL_DEV_NO_RDM)
#define _XEN_DOMCTL_DEV_RDM_RELAXED     1
#define XEN_DOMCTL_DEV_RDM_RELAXED      (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)
#define _XEN_DOMCTL_DEV_RDM_STRICT      2
#define XEN_DOMCTL_DEV_RDM_STRICT       (1U<<_XEN_DOMCTL_DEV_RDM_STRICT)


> you add more flags like this, then all the code which says "if (flags >
> N)" will need to be changed to mask out the higher bits.
>

http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg02950.html

Thanks
Tiejun

* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-07-01 10:22       ` George Dunlap
@ 2015-07-01 10:56         ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-01 10:56 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>> This converts LIBXL_XXX to XEN_XXX so the policy can be passed via the
>> hypercall, so I still think it is better for it to live here. The
>> previous patch, by contrast, just defines something.
>
> The entire rest of this patch is about xl.  It doesn't make any sense at
> all for the previous patch to modify libxl in a way that doesn't
> actually do anything, and then in the current patch modify both xl and
> libxl.
>

Right.

> What if, for instance, someone had built their own toolstack on top of
> libxl, and wanted to backport just the xen/libxl parts of the RMRR
> series?  They'd have to backport this patch with the xl changes to get a
> functioning system, even though they aren't really using xl.
>

So I will squash this into the previous patch as you suggested here.

Thanks
Tiejun

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-01 10:26         ` Chen, Tiejun
@ 2015-07-01 10:57           ` George Dunlap
  2015-07-01 11:16             ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-07-01 10:57 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 07/01/2015 11:26 AM, Chen, Tiejun wrote:
>>>> 1. By default, the domain policy is RELAXED (See above,
>>>> libxl__rdm_setdefault()).
>>>>
>>>> 2. By default, the policy for individual devices is STRICT (see
>>>> libxl_pci.c:libxl__device_pci_setdefault())
>>>>
>>>> 3. If the domain policy is set to STRICT, this overrides per-device
>>>> policy
>>>>
>>>> 4. If the domain policy is set to RELAXED, I don't see that having an
>>>> effect on individual devices
>>>
>>> This is our rule, and this is why I think you need to take a look at
>>> patch #00, our design and all patch head descriptions,
>>>
>>> "Default per-device RDM policy is 'strict', while default global RDM
>>> policy is 'relaxed'. When both policies are specified on a given region,
>>> 'strict' is always preferred."
>>
>> It looks like you didn't finish reading my message.  I suggest you do so:
> 
> Okay.
> 
>>
>>>> If I'm correct, then #3 means it's not possible to have devices for a
>>>> domain *default* to strict, but to be relaxed in individual instances.
>>>> If you had five devices you wanted strict, and only one device you
>>>> wanted to be relaxed (because you knew it didn't matter), you'd have
>>>> to set reserved=strict for all the other devices, rather than just
>>>> being able to set the domain setting to strict and set reserve=relaxed
>>>> for the one.
>>>>
>>>> I think that both violates the principle of least surprise, and is
>>>> less useful.
>>
> 
> So what's your idea to follow our requirement?

So consider the following config snippet:

---
rdm="reserve=relaxed"

pci=['01:00.1,msitranslate=1']
----

What should the policy for that device be?

According to your policy document, it seems to me like it should be
"relaxed", since the domain default* is set to "relaxed" and nothing
has been specified for the individual device.  That's what "default"
means.  But as far as I can tell from reading the code, the effective
policy for this one will actually be "strict".  That is not what people
will expect.

(* I say "domain default" rather than "global default" because the
default is defined only on a per-domain basis, not across all domains.
To me a "global default" would be one more level up -- something set in
xl.conf which affects all domains unless it's set in the config file.)

Furthermore, consider the following config snippet:

---
rdm="reserve=strict"

pci=['01:00.1,msitranslate=1,rdm_reserve=relaxed']
----

According to your policy document (and the code, as far as I can tell),
this will come up as "strict", even though the user has specifically
asked for it to be set to "relaxed".

This interface doesn't make any sense to me.  Why, if the "global
default" is set to "relaxed", do individual devices still default to
"strict"?  And why is it useful at the domain level to set a
configuration that can't be overridden on a per-device basis?

 -George

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-01 10:57           ` George Dunlap
@ 2015-07-01 11:16             ` Chen, Tiejun
  2015-07-01 13:29               ` George Dunlap
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-01 11:16 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 2015/7/1 18:57, George Dunlap wrote:
> On 07/01/2015 11:26 AM, Chen, Tiejun wrote:
>>>>> 1. By default, the domain policy is RELAXED (See above,
>>>>> libxl__rdm_setdefault()).
>>>>>
>>>>> 2. By default, the policy for individual devices is STRICT (see
>>>>> libxl_pci.c:libxl__device_pci_setdefault())
>>>>>
>>>>> 3. If the domain policy is set to STRICT, this overrides per-device
>>>>> policy
>>>>>
>>>>> 4. If the domain policy is set to RELAXED, I don't see that having an
>>>>> effect on individual devices
>>>>
>>>> This is our rule, and this is why I think you need to take a look at
>>>> patch #00, our design and all patch head descriptions,
>>>>
>>>> "Default per-device RDM policy is 'strict', while default global RDM
>>>> policy is 'relaxed'. When both policies are specified on a given region,
>>>> 'strict' is always preferred."
>>>
>>> It looks like you didn't finish reading my message.  I suggest you do so:
>>
>> Okay.
>>
>>>
>>>>> If I'm correct, then #3 means it's not possible to have devices for a
>>>>> domain *default* to strict, but to be relaxed in individual instances.
>>>>> If you had five devices you wanted strict, and only one device you
>>>>> wanted to be relaxed (because you knew it didn't matter), you'd have
>>>>> to set reserved=strict for all the other devices, rather than just
>>>>> being able to set the domain setting to strict and set reserve=relaxed
>>>>> for the one.
>>>>>
>>>>> I think that both violates the principle of least surprise, and is
>>>>> less useful.
>>>
>>
>> So what's your idea to follow our requirement?
>
> So consider the following config snippet:
>
> ---
> rdm="reserve=relaxed"
>
> pci=['01:00.1,msitranslate=1']
> ----
>
> What should the policy for that device be?
>
> According to your policy document, it seems to me like it should be
> "relaxed", since the domain default* is set to "relaxed" and nothing

Why? It should be "strict" in this case.

> has been specified for the individual device.  That's what "default"

Shouldn't nothing mean we should take a default value?

+            /* We'd like to force reserve rdm specific to a device by default.*/
+            pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;

> means.  But as far as I can tell from reading the code, the effective
> policy for this one will actually be "strict".  That is not what people

Right.

> will expect.

Why are you saying this is not our expectation? Just let me pick up that 
description *again*,

"Default per-device RDM policy is 'strict', while default global RDM 
policy is 'relaxed'. When both policies are specified on a given region, 
'strict' is always preferred."

>
> (* I say "domain default" rather than "global default" because the
> default is defined only on a per-domain basis, not across all domains.
> To me a "global default" would be one more level up -- something set in
> xl.conf which affects all domains unless it's set in the config file.)
>
> Furthermore, consider the following config snippet:
>
> ---
> rdm="reserve=strict"
>
> pci=['01:00.1,msitranslate=1,rdm_reserve=relaxed']
> ----
>
> According to your policy document (and the code, as far as I can tell),
> this will come up as "strict", even though the user has specifically
> asked for it to be set to "relaxed".

Again, this is from our design and discussion.

>
> This interface doesn't make any sense to me.  Why, if the "global

If you object to our solution and can't find a reasonable answer in our
design, please ping Jan or Kevin, because I'm really not the person who
can decide this kind of high-level change at this point.

Thanks
Tiejun

> default" is set to "relaxed", do individual devices still default to
> "strict"?  And why is it useful at the domain level to set a
> configuration that can't be overridden on a per-device basis?
>
>   -George
>

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-01 11:16             ` Chen, Tiejun
@ 2015-07-01 13:29               ` George Dunlap
  2015-07-02  1:11                 ` Chen, Tiejun
  2015-07-06 13:34                 ` Chen, Tiejun
  0 siblings, 2 replies; 114+ messages in thread
From: George Dunlap @ 2015-07-01 13:29 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 07/01/2015 12:16 PM, Chen, Tiejun wrote:
> On 2015/7/1 18:57, George Dunlap wrote:
>> On 07/01/2015 11:26 AM, Chen, Tiejun wrote:
>>>>>> 1. By default, the domain policy is RELAXED (See above,
>>>>>> libxl__rdm_setdefault()).
>>>>>>
>>>>>> 2. By default, the policy for individual devices is STRICT (see
>>>>>> libxl_pci.c:libxl__device_pci_setdefault())
>>>>>>
>>>>>> 3. If the domain policy is set to STRICT, this overrides per-device
>>>>>> policy
>>>>>>
>>>>>> 4. If the domain policy is set to RELAXED, I don't see that having an
>>>>>> effect on individual devices
>>>>>
>>>>> This is our rule, and this is why I think you need to take a look at
>>>>> patch #00, our design and all patch head descriptions,
>>>>>
>>>>> "Default per-device RDM policy is 'strict', while default global RDM
>>>>> policy is 'relaxed'. When both policies are specified on a given
>>>>> region,
>>>>> 'strict' is always preferred."
>>>>
>>>> It looks like you didn't finish reading my message.  I suggest you
>>>> do so:
>>>
>>> Okay.
>>>
>>>>
>>>>>> If I'm correct, then #3 means it's not possible to have devices for a
>>>>>> domain *default* to strict, but to be relaxed in individual
>>>>>> instances.
>>>>>> If you had five devices you wanted strict, and only one device you
>>>>>> wanted to be relaxed (because you knew it didn't matter), you'd have
>>>>>> to set reserved=strict for all the other devices, rather than just
>>>>>> being able to set the domain setting to strict and set
>>>>>> reserve=relaxed
>>>>>> for the one.
>>>>>>
>>>>>> I think that both violates the principle of least surprise, and is
>>>>>> less useful.
>>>>
>>>
>>> So what's your idea to follow our requirement?
>>
>> So consider the following config snippet:
>>
>> ---
>> rdm="reserve=relaxed"
>>
>> pci=['01:00.1,msitranslate=1']
>> ----
>>
>> What should the policy for that device be?
>>
>> According to your policy document, it seems to me like it should be
>> "relaxed", since the domain default* is set to "relaxed" and nothing
> 
> Why? It should be "strict" in this case.

OK, I think I see where the problem is.  I had expected the domain-wide
setting to be a default which was overridden by per-device policies (see
pci_permissive and friends).  So when I saw "global default RDM policy"
confirmation bias caused me to interpret it as what I expected to see --
the domain setting as the default, which the local setting could override.

I see now that in your documentation you consistently talk about two
different policies, each of which have their own defaults, and that the
effective permissions for a device end up being the intersection of the
two (i.e., relaxed only if both are relaxed; strict under all other
circumstances).
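
A minimal sketch of that intersection rule, for illustration only.  The
enum, macro, and function names below are hypothetical and are not taken
from libxl; only the two defaults and the "strict wins" rule come from
the cover letter text quoted in this thread.

```c
/* Hypothetical names for illustration; not the libxl types. */
enum rdm_policy { RDM_STRICT = 0, RDM_RELAXED = 1 };

/* Defaults as stated in the series description. */
#define DOMAIN_RDM_DEFAULT RDM_RELAXED   /* "global" (per-domain) policy */
#define DEVICE_RDM_DEFAULT RDM_STRICT    /* per-device policy */

/*
 * Effective policy for one device: relaxed only if both the domain
 * policy and the device policy are relaxed; strict otherwise.
 */
static enum rdm_policy effective_rdm_policy(enum rdm_policy domain,
                                            enum rdm_policy device)
{
    if (domain == RDM_RELAXED && device == RDM_RELAXED)
        return RDM_RELAXED;
    return RDM_STRICT;
}
```

Under this reading, rdm="reserve=relaxed" with no per-device setting
resolves to strict (the device default is strict), and so does
rdm="reserve=strict" combined with rdm_reserve=relaxed on the device.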

> Why are you saying this is not our expectation? Just let me pick up that
> description *again*,
> 
> "Default per-device RDM policy is 'strict', while default global RDM
> policy is 'relaxed'. When both policies are specified on a given region,
> 'strict' is always preferred."

Look, if I haven't understood what you meant by the exact same words the
first 4 times I read it, simply repeating the same exact words is not
going to be helpful.  Ideally you need to try to understand where my
misunderstanding is coming from and explain where I've misunderstood
something; or, at least you need to try to use different words, or
explain how the words you're using apply to the given situation.

>> This interface doesn't make any sense to me.  Why, if the "global
> 
> If you have any objection to our solution, and if you can't find any
> reasonable answer from our design, just please ping Jan or Kevin because
> I'm really not that person who can address this kind of change at this
> point in this high level.

And you have no idea why that design was chosen; you're just doing what
you're told?

I was involved in the design discussion, and from the very beginning I
probably saw your plan but misunderstood it.  I wouldn't be surprised if
some others didn't quite understand what they were agreeing to.

This way of doing things is different than the way we do it with most
other options relating to pci devices (e.g., pci_permissive,
pci_msitranslate, pci_seize, &c).  All of those options use a "default"
semantic: the domain-wide setting takes effect only if it's not set
locally.  If the syntax looks the same but the semantics is different,
many people will be confused.  If we're going to have the domain-wide
policy override the per-device policy, then the naming should make that
clear; for instance, "override=(strict|relaxed|none)", or
"strict_override=(1|0)".

I don't happen to think these "override" semantics are actually going to
turn out to be that useful; I do think a "default" semantic would be
useful.  But I'd be content if the name of the current setting were
switched to "override" to make the semantics more clear.  We can always
add in "default" at some later point if we really want.
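
The difference between the two semantics can be sketched as follows.
This is an illustration under stated assumptions, not code from the
series: the tri-state enum, the "builtin" fallback parameter, and both
function names are hypothetical.

```c
/* Hypothetical tri-state so that "not set" can be told apart. */
enum rdm_setting { RDM_UNSET = 0, RDM_SET_STRICT, RDM_SET_RELAXED };

/*
 * "default" semantic (as with pci_permissive and friends): the
 * domain-wide value applies only when the device does not set one.
 */
static enum rdm_setting resolve_default(enum rdm_setting domain,
                                        enum rdm_setting device,
                                        enum rdm_setting builtin)
{
    if (device != RDM_UNSET)
        return device;
    if (domain != RDM_UNSET)
        return domain;
    return builtin;
}

/*
 * "override" semantic: a domain-wide override, when given, wins over
 * whatever the device asked for.
 */
static enum rdm_setting resolve_override(enum rdm_setting override,
                                         enum rdm_setting device,
                                         enum rdm_setting builtin)
{
    if (override != RDM_UNSET)
        return override;
    if (device != RDM_UNSET)
        return device;
    return builtin;
}
```

With a domain value of strict and a device value of relaxed, the
"default" resolver yields relaxed while the "override" resolver yields
strict, which is why the option name needs to signal which one is meant.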

 -George

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-01 10:47             ` Chen, Tiejun
@ 2015-07-01 14:39               ` George Dunlap
  2015-07-01 15:06                 ` Julien Grall
  2015-07-02  6:50                 ` Chen, Tiejun
  2015-07-06 10:34               ` Jan Beulich
  1 sibling, 2 replies; 114+ messages in thread
From: George Dunlap @ 2015-07-01 14:39 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Julien Grall, Tim Deegan, xen-devel, Aravind Gopalakrishnan,
	Jan Beulich, Yang Zhang, Stefano Stabellini, Ian Campbell

On 07/01/2015 11:47 AM, Chen, Tiejun wrote:
>>>  From my point of view, "NO" should be clear at certain point, right?
>>
>> Well, I'm afraid it's not.
>>
>> Looking through the entire series, it *appears* that "NO_RDM" is meant
>> to be passed for architectures like ARM DeviceTree, where it is known
>> that no RDM regions can exist.
>>
>> But it might also mean "I expect this device not to have any RDM
>> regions".  And it's certainly not immediately obvious what the effective
>> difference would be when I choose it -- what happens if I pass NO_RDM
>> for PCI systems?  How is it different than passing STRICT?
> 
> Currently NO_RDM is only used for non-x86 inside Xen, not in the
> tools. So we don't actually pass it. Anyone who wants to extend its
> usage in the future should take this into account.

When you say "not tools", I take it to mean that you're not exposing
that option through the libxl interface?

tools/libxc/xc_domain.c:xc_assign_dt_device() most certainly does pass
it in, and that's the level I'm talking about.  Someone reviewing this
patch series needs to know, when xc or libxl set NO_RDM, what will be
the effect?  The fact that libxc *shouldn't* set NO_RDM for PCI devices
doesn't mean it won't happen.

Now looking at the end of the series and grepping for
"XEN_DOMCTL_DEV.*RDM", these values are *read and acted on* in exactly
two places:

xen/arch/x86/mm/p2m.c: The whole point of this series; if the p2m is
occupied already, and flag == RDM_STRICT, return an error; otherwise
ignore it.

xen/drivers/passthrough/device_tree.c: If flag != NO_RDM, return an error.

So the meaning of the flags is:
For pci devices:
 - RDM_RELAXED, NO_RDM: ignore conflicts in set_identity_p2m_entry()
 - RDM_STRICT: error on conflicts in set_identity_p2m_entry()
for dt devices:
 - Error if not NO_RDM

It doesn't look to me like the NO_RDM setting actually adds any semantic
meaning.

What I see in the list of references you gave is Julien saying this:

"I would also add a XEN_DOMCTL_DEV_NO_RDM that would be use for non-PCI
assignment."

It looks a bit like what you did was say, "Well Julien asked for a
NO_RDM setting, so here's a NO_RDM setting."  Which while perhaps
understandable, doesn't make the value have any more usefulness.

It seems to me that the real problem is that you had two values to begin
with, rather than actually having flags (as the name would imply).

This is what I would suggest.  Make a single flag:

#define _XEN_DOMCTL_DEV_RDM_RELAXED     0
#define XEN_DOMCTL_DEV_RDM_RELAXED      (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)

Then make the meaning of the flags as follows:
* for pci devices:
 - RDM_RELAXED flag SET: ignore conflicts in set_identity_p2m_entry()
 - RDM_RELAXED flag CLEAR: error on conflicts in set_identity_p2m_entry()
* for dt devices:
 - Ignore this flag entirely

If Julien really wants we could error on RDM_RELAXED being set; but I
don't think that's necessary.
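
A rough sketch of how the single-flag scheme could be consumed.  Only
the two RDM_RELAXED macros come from the proposal above; the
RDM_KNOWN_FLAGS mask, both helper functions, their parameters, and the
choice of -EINVAL / -EBUSY as error codes are assumptions made for
illustration, not the actual Xen code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define _XEN_DOMCTL_DEV_RDM_RELAXED     0
#define XEN_DOMCTL_DEV_RDM_RELAXED      (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)

/* Hypothetical: the set of flag bits currently defined. */
#define RDM_KNOWN_FLAGS                 XEN_DOMCTL_DEV_RDM_RELAXED

/*
 * Reject undefined bits with a mask rather than an "if (flags > N)"
 * comparison, so adding a new flag bit later only changes the mask.
 */
static int check_assign_flags(uint32_t flags)
{
    return (flags & ~RDM_KNOWN_FLAGS) ? -EINVAL : 0;
}

/*
 * Hypothetical stand-in for the decision made when an RMRR identity
 * mapping collides with an existing p2m entry.
 */
static int handle_rdm_conflict(bool is_pci, bool conflict, uint32_t flags)
{
    if (!is_pci)
        return 0;               /* dt devices: flag ignored entirely */
    if (!conflict)
        return 0;               /* nothing to decide */
    if (flags & XEN_DOMCTL_DEV_RDM_RELAXED)
        return 0;               /* flag SET: ignore the conflict */
    return -EBUSY;              /* flag CLEAR: error on conflict */
}
```

With one boolean bit there is no third state to explain, and "strict"
is simply the absence of the relaxed flag.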

 -George

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-01 14:39               ` George Dunlap
@ 2015-07-01 15:06                 ` Julien Grall
  2015-07-02  6:50                 ` Chen, Tiejun
  1 sibling, 0 replies; 114+ messages in thread
From: Julien Grall @ 2015-07-01 15:06 UTC (permalink / raw)
  To: George Dunlap, Chen, Tiejun
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Andrew Cooper,
	Julien Grall, Tim Deegan, xen-devel, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Yang Zhang, Stefano Stabellini,
	Ian Campbell

Hi,

On 01/07/15 15:39, George Dunlap wrote:
> Then make the meaning of the flags as follows:
> * for pci devices:
>  - RDM_RELAXED flag SET: ignore conflicts in set_identity_p2m_entry()
>  - RDM_RELAXED flag CLEAR: error on conflicts in set_identity_p2m_entry()
> * for dt devices:
>  - Ignore this flag entirely
> 
> If Julien really wants we could error on RDM_RELAXED being set; but I
> don't think that's necessary.

I was confused when I suggested this. After speaking with George IRL,
I'm fine with his solution.

Regards,

-- 
Julien Grall

* Re: [v4][PATCH 02/19] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-06-23  9:57 ` [v4][PATCH 02/19] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
  2015-06-25  9:59   ` Tim Deegan
@ 2015-07-01 15:43   ` George Dunlap
  1 sibling, 0 replies; 114+ messages in thread
From: George Dunlap @ 2015-07-01 15:43 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Tim Deegan, Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel

On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> We will create this sort of identity mapping as follows:
>
> If the gfn space is unoccupied, we just set the mapping. If space
> is already occupied by desired identity mapping, do nothing.
> Otherwise, failure is returned.
>
> And we also add a returning value to guest_physmap_remove_page()
> then make that as a better helper to clear such a p2m entry.
>
> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

FWIW:

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-23  9:57 ` [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
  2015-06-30 11:08   ` George Dunlap
@ 2015-07-01 16:30   ` George Dunlap
  2015-07-02  8:49     ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-07-01 16:30 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> This patch extends the existing hypercall to support rdm reservation policy.
> We return error or just throw out a warning message depending on whether
> the policy is "strict" or "relaxed" when reserving RDM regions in pfn space.
> Note in some special cases, e.g. add a device to hwdomain, and remove a
> device from user domain, 'relaxed' is fine enough since this is always safe
> to hwdomain.
>
> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@citrix.com>
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

OK, now that I've had a chance to go through the entire series, one
more comment on this patch:

> * Add code comments to describe why we set a fixed policy flag in some
>   cases like adding a device to hwdomain, and removing a device from user domain.

[snip]

> @@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
>               PCI_BUS(bdf) == pdev->bus &&
>               PCI_DEVFN2(bdf) == devfn )
>          {
> -            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
> +            /*
> +             * RMRR is always reserved on e820 so either of flag
> +             * is fine for hardware domain and here we'd like to
> +             * pass XEN_DOMCTL_DEV_RDM_RELAXED.
> +             */
> +            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
> +                                        XEN_DOMCTL_DEV_RDM_RELAXED);

So two things.

First, you assert that the value here won't matter, because the
hardware domain is guaranteed never to have a conflict.

Which is likely to be true almost all the time; but the question is,
*if* something goes wrong, what should happen?

For instance, suppose that someone accidentally introduces a bug in
Xen that messes up or ignores reading a portion of the e820 map under
certain circumstances.  What should happen?

If you set this to RELAXED, this clash will be silently ignored; which
means that devices that need RMRR will simply malfunction in weird
ways without any warning messages having been printed that might give
someone a hint about what is going on.

If you set this to STRICT, then this clash will print an error
message, but as far as I can tell, the rest of the device assignment
will continue as normal.  (Please correct me if I've followed the code
wrong.)

Since the device should be just as functional (or not functional)
either way, but in the STRICT case should actually print an error
message which someone might notice, it seems to me that STRICT is a
better option for the hardware domain.

Secondly, you assert in response to Kevin's question in v3 that this
path is only reachable when assigning to the hardware domain.  I think
you at least need to update the comment here to indicate that's what
you think; it's not at all obvious just from looking at the function
that this is true.  And if we do end up doing something besides
STRICT, we should check to make sure that pdev->domain really *is* the
hardware domain before acting like it is.

>              if ( ret )
>                  dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
>                          pdev->domain->domain_id);
> @@ -1939,7 +1946,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
>               PCI_DEVFN2(bdf) != devfn )
>              continue;
>
> -        rmrr_identity_mapping(pdev->domain, 0, rmrr);
> +        rmrr_identity_mapping(pdev->domain, 0, rmrr,
> +                              XEN_DOMCTL_DEV_RDM_RELAXED);
>      }

Same here wrt STRICT.

After those changes (a single RDM_RELAXED flag, passing STRICT in for
the hardware domain) then I think this patch is in good shape.

 -George

* Re: [v4][PATCH 05/19] xen: enable XENMEM_memory_map in hvm
  2015-06-23  9:57 ` [v4][PATCH 05/19] xen: enable XENMEM_memory_map in hvm Tiejun Chen
@ 2015-07-01 16:32   ` George Dunlap
  0 siblings, 0 replies; 114+ messages in thread
From: George Dunlap @ 2015-07-01 16:32 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel

On Tue, Jun 23, 2015 at 10:57 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> This patch enables XENMEM_memory_map in hvm. So hvmloader can
> use it to setup the e820 mappings.
>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: Tim Deegan <tim@xen.org>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>

FWIW:

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-01 13:29               ` George Dunlap
@ 2015-07-02  1:11                 ` Chen, Tiejun
  2015-07-02  4:47                   ` Chen, Tiejun
  2015-07-02  9:22                   ` George Dunlap
  2015-07-06 13:34                 ` Chen, Tiejun
  1 sibling, 2 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-02  1:11 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>>>>>>> If I'm correct, then #3 means it's not possible to have devices for a
>>>>>>> domain *default* to strict, but to be relaxed in individual
>>>>>>> instances.
>>>>>>> If you had five devices you wanted strict, and only one device you
>>>>>>> wanted to be relaxed (because you knew it didn't matter), you'd have
>>>>>>> to set reserved=strict for all the other devices, rather than just
>>>>>>> being able to set the domain setting to strict and set
>>>>>>> reserve=relaxed
>>>>>>> for the one.
>>>>>>>
>>>>>>> I think that both violates the principle of least surprise, and is
>>>>>>> less useful.
>>>>>
>>>>
>>>> So what's your idea to follow our requirement?
>>>
>>> So consider the following config snippet:
>>>
>>> ---
>>> rdm="reserve=relaxed"
>>>
>>> pci=['01:00.1,msitranslate=1']
>>> ----
>>>
>>> What should the policy for that device be?
>>>
>>> According to your policy document, it seems to me like it should be
>>> "relaxed", since the domain default* is set to "relaxed" and nothing
>>
>> Why? It should be "strict" in this case.
>
> OK, I think I see where the problem is.  I had expected the domain-wide
> setting to be a default which was overridden by per-device policies (see
> pci_permissive and friends).  So when I saw "global default RDM policy"

We know about this behavior, but we'd like to take a different approach
in this case.

> confirmation bias caused me to interpret it as what I expected to see --
> the domain setting as the default, which the local setting could override.
>
> I see now that in your documentation you consistently talk about two
> different policies, each of which have their own defaults, and that the
> effective permissions for a device end up being the intersection of the
> two (i.e., relaxed only if both are relaxed; strict under all other
> circumstances).
>
>> Why are you saying this is not our expectation? Just let me pick up that
>> description *again*,
>>
>> "Default per-device RDM policy is 'strict', while default global RDM
>> policy is 'relaxed'. When both policies are specified on a given region,
>> 'strict' is always preferred."
>
> Look, if I haven't understood what you meant by the exact same words the
> first 4 times I read it, simply repeating the same exact words is not
> going to be helpful.  Ideally you need to try to understand where my
> misunderstanding is coming from and explain where I've misunderstood
> something; or, at least you need to try to use different words, or
> explain how the words you're using apply to the given situation.

From my point of view, I already answered this previously by quoting
part of the patch head description. As you know, this revision is already
marked as v4, and although I admit some of the implementation still needs
further review, at least our policy should already be acknowledged by
now unless it is really wrong. But in this case, it looks like your
concern is that our mechanism is not what you expected. So

>
>>> This interface doesn't make any sense to me.  Why, if the "global
>>
>> If you have any objection to our solution, and if you can't find any
>> reasonable answer from our design, just please ping Jan or Kevin because

just do it to make this clear to us. And then, whatever the outcome,
I'll be fine with moving on to the next step.

>> I'm really not that person who can address this kind of change at this
>> point in this high level.
>
> And you have no idea why that design was chosen; you're just doing what

Certainly I have my own understanding of this issue. But

> you're told?

at a high level I have to say yes. If you read the v2 design and
its associated discussion, you will notice I didn't post any response
there.

>
> I was involved in the design discussion, and from the very beginning I
> probably saw your plan but misunderstood it.  I wouldn't be surprised if
> some others didn't quite understand what they were agreeing to.

Again, I wasn't involved in the v2 design. So I don't want to cause you
any confusion with my reply.

>
> This way of doing things is different than the way we do it with most
> other options relating to pci devices (e.g., pci_permissive,
> pci_msitranslate, pci_seize, &c).  All of those options use a "default"
> semantic: the domain-wide setting takes effect only if it's not set
> locally.  If the syntax looks the same but the semantics is different,
> many people will be confused.  If we're going to have the domain-wide
> policy override the per-device policy, then the naming should make that
> clear; for instance, "override=(strict|relaxed|none)", or
> "strict_override=(1|0)".
>
> I don't happen to think these "override" semantics are actually going to
> turn out to be that useful; I do think a "default" semantic would be
> useful.  But I'd be content if the name of the current setting were
> switched to "override" to make the semantics more clear.  We can always
> add in "default" at some later point if we really want.
>
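
For readers following the semantics dispute, the two readings of the
domain-wide setting can be put side by side. This C sketch is
illustrative only: the enum and function names are hypothetical and do
not appear in the patch series, and the fallback to "relaxed" under
"default" semantics when neither setting is given is an assumption based
on the stated global default.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
enum rdm_policy { RDM_UNSET, RDM_STRICT, RDM_RELAXED };

/*
 * "Default" semantics (as with pci_permissive and friends): the
 * domain-wide value applies only when the device sets nothing.
 * Assumption: with neither set, fall back to relaxed.
 */
enum rdm_policy effective_default(enum rdm_policy global,
                                  enum rdm_policy dev)
{
    if (dev != RDM_UNSET)
        return dev;
    return global != RDM_UNSET ? global : RDM_RELAXED;
}

/*
 * Semantics described for this series: global defaults to relaxed,
 * per-device defaults to strict, and strict is always preferred, so
 * the result is relaxed only if both are relaxed.
 */
enum rdm_policy effective_strict_wins(enum rdm_policy global,
                                      enum rdm_policy dev)
{
    if (global == RDM_UNSET)
        global = RDM_RELAXED;
    if (dev == RDM_UNSET)
        dev = RDM_STRICT;
    assert(global != RDM_UNSET && dev != RDM_UNSET);

    return (global == RDM_RELAXED && dev == RDM_RELAXED)
           ? RDM_RELAXED : RDM_STRICT;
}
```

With a relaxed domain-wide setting and a device that sets nothing, the
first function yields relaxed and the second yields strict, which is the
disagreement in this thread.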

As I said, you'd better ping Jan or Kevin to settle this point.

Thanks
Tiejun


* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-02  1:11                 ` Chen, Tiejun
@ 2015-07-02  4:47                   ` Chen, Tiejun
  2015-07-02  9:22                   ` George Dunlap
  1 sibling, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-02  4:47 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

>> I don't happen to think these "override" semantics are actually going to
>> turn out to be that useful; I do think a "default" semantic would be
>> useful.  But I'd be content if the name of the current setting were
>> switched to "override" to make the semantics more clear.  We can always
>> add in "default" at some later point if we really want.
>>
>
> Just as I said you'd better ping Jan or Kevin to make a point.
>

I just had a talk with Kevin, and he thinks this is fine, but I'm 
still not sure what Jan thinks.

Anyway, this shouldn't be a big deal to change in the code, so let me 
follow your suggestion for now.

Thanks
Tiejun


* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-01 14:39               ` George Dunlap
  2015-07-01 15:06                 ` Julien Grall
@ 2015-07-02  6:50                 ` Chen, Tiejun
  2015-07-06 14:55                   ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-02  6:50 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Julien Grall, Tim Deegan, xen-devel, Aravind Gopalakrishnan,
	Jan Beulich, Yang Zhang, Stefano Stabellini, Ian Campbell

> When you say "not tools", I take it to mean that you're not exposing
> that option through the libxl interface?

Yes.

>
> tools/libxc/xc_domain.c:xc_assign_dt_device() most certainly does pass
> it in, and that's the level I'm talking about.  Someone reviewing this
> patch series needs to know, when xc or libxl set NO_RDM, what will be
> the effect?  The fact that libxc *shouldn't* set NO_RDM for PCI devices
> doesn't mean it won't happen.
>
> Now looking at the end of the series and grepping for
> "XEN_DOMCTL_DEV.*RDM", these values are *read and acted on* in exactly
> two places:
>
> xen/arch/x86/mm/p2m.c: The whole point of this series; if the p2m is
> occupied already, and flag == RDM_STRICT, return an error; otherwise
> ignore it.
>
> xen/drivers/passthrough/device_tree.c: If flag != NO_RDM, return an error.
>
> So the meaning of the flags is:
> For pci devices:
>   - RDM_RELAXED, NO_RDM: ignore conflicts in set_identity_p2m_entry()
>   - RDM_STRICT: error on conflicts in set_identity_p2m_entry()
> for dt devices:
>   - Error if not NO_RDM

Correct.

>
> It doesn't look to me like the NO_RDM setting actually adds any semantic
> meaning.
>
> What I see in the list of references you gave is Julien saying this:
>
> "I would also add a XEN_DOMCTL_DEV_NO_RDM that would be use for non-PCI
> assignment."
>
> It looks a bit like what you did is said, "Well Julien asked for a
> NO_RDM setting, so here's a NO_RDM setting."  Which while perhaps
> understandable, doesn't make the value have any more usefulness.
>
> It seems to me that the real problem is that you had two values to begin
> with, rather than actually having flags (as the name would imply).
>
> This is what I would suggest.  Make a single flag:
>
> #define _XEN_DOMCTL_DEV_RDM_RELAXED     0
> #define XEN_DOMCTL_DEV_RDM_RELAXED      (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)
>
> Then make the meaning of the flags as follows:
> * for pci devices:
>   - RDM_RELAXED flag SET: ignore conflicts in set_identity_p2m_entry()
>   - RDM_RELAXED flag CLEAR: error on conflicts in set_identity_p2m_entry()

No problem.

> * for dt devices:
>   - Ignore this flag entirely

But we still need to pass a flag to assign_device(), like this,

diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 5d3842a..a182487 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
              goto fail;
      }

-    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
+                                         XEN_DOMCTL_DEV_RDM_RELAXED);

      if ( rc )
          goto fail;

Or rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0)?
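
As a rough sketch of the single-flag proposal quoted above (this is not
code from the series): the helper name, the device-kind enum, and the
-EBUSY/-EINVAL return values are assumptions made for illustration. Only
the two #define lines come from the suggestion itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Taken from the suggestion quoted above. */
#define _XEN_DOMCTL_DEV_RDM_RELAXED 0
#define XEN_DOMCTL_DEV_RDM_RELAXED  (1U << _XEN_DOMCTL_DEV_RDM_RELAXED)

/* Hypothetical device kinds, for illustration only. */
enum dev_kind { DEV_PCI, DEV_DT };

/*
 * Hypothetical helper: decide the outcome of an assignment given
 * whether the reserved region conflicts with an existing mapping.
 * Returns 0 to proceed, a negative errno value otherwise.
 */
int rdm_conflict_result(enum dev_kind kind, uint32_t flag, bool conflict)
{
    assert(kind == DEV_PCI || kind == DEV_DT);

    /* Device tree devices: the flag is ignored entirely. */
    if (kind == DEV_DT)
        return 0;

    /* Range check, so unknown bits stay free for future extensions. */
    if (flag & ~XEN_DOMCTL_DEV_RDM_RELAXED)
        return -EINVAL;

    if (!conflict)
        return 0;

    /* Flag set: ignore the conflict. Flag clear: treat it as an error. */
    return (flag & XEN_DOMCTL_DEV_RDM_RELAXED) ? 0 : -EBUSY;
}
```

In this sketch the device tree path gives the same result whether the
caller passes XEN_DOMCTL_DEV_RDM_RELAXED or 0.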

Thanks
Tiejun


* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-01 16:30   ` George Dunlap
@ 2015-07-02  8:49     ` Chen, Tiejun
  2015-07-06 14:52       ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-02  8:49 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Tim Deegan, xen-devel, Aravind Gopalakrishnan, Jan Beulich,
	Yang Zhang, Stefano Stabellini, Ian Campbell

>> @@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
>>                PCI_BUS(bdf) == pdev->bus &&
>>                PCI_DEVFN2(bdf) == devfn )
>>           {
>> -            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
>> +            /*
>> +             * RMRR is always reserved on e820 so either of flag
>> +             * is fine for hardware domain and here we'd like to
>> +             * pass XEN_DOMCTL_DEV_RDM_RELAXED.
>> +             */
>> +            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
>> +                                        XEN_DOMCTL_DEV_RDM_RELAXED);
>
> So two things.
>
> First, you assert that the value here won't matter, because the
> hardware domain is guaranteed never to have a conflict.
>
> Which is likely to be true almost all the time; but the question is,
> *if* something goes wrong, what should happen?
>
> For instance, suppose that someone accidentally introduces a bug in
> Xen that messes up or ignores reading a portion of the e820 map under
> certain circumstances.  What should happen?

Yes, you can imagine all sorts of cases. But if this kind of bug really 
occurred, I very much doubt Xen could boot successfully. The e820 map is 
fundamental to running the OS, so such a bug would very easily panic 
Xen, right?

Anyway, I agree we should consider all corner cases.

>
> If you set this to RELAXED, this clash will be silently ignored; which
> means that devices that need RMRR will simply malfunction in weird
> ways without any warning messages having been printed that might give

No. We always print that message regardless of relaxed or strict, since 
the message depends only on whether the conflict exists.

> someone a hint about what is going on.
>
> If you set this to STRICT, then this clash will print an error
> message, but as far as I can tell, the rest of the device assignment
> will continue as normal.  (Please correct me if I've followed the code
> wrong.)

Not every case behaves this way, but here it is true.

>
> Since the device should be just as functional (or not functional)
> either way, but in the STRICT case should actually print an error
> message which someone might notice, it seems to me that STRICT is a
> better option for the hardware domain.
>

Just see above.

> Secondly, you assert in response to Kevin's question in v3 that this
> path is only reachable when assigning to the hardware domain.  I think
> you at least need to update the comment here to indicate that's what
> you think; it's not at all obvious just from looking at the function

What about this?

               PCI_DEVFN2(bdf) == devfn )
          {
              /*
-             * RMRR is always reserved on e820 so either of flag
-             * is fine for hardware domain and here we'd like to
-             * pass XEN_DOMCTL_DEV_RDM_RELAXED.
+             * This path adds a device to the hardware domain, where
+             * RMRR is always reserved on e820, so either flag is
+             * fine for the hardware domain; here we pass
+             * XEN_DOMCTL_DEV_RDM_RELAXED.
               */
              ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
                                          XEN_DOMCTL_DEV_RDM_RELAXED);


> that this is true.  And if we do end up doing something besides
> STRICT, we should check to make sure that pdev->domain really *is* the
> hardware domain before acting like it is.
>
>>               if ( ret )
>>                   dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
>>                           pdev->domain->domain_id);
>> @@ -1939,7 +1946,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
>>                PCI_DEVFN2(bdf) != devfn )
>>               continue;
>>
>> -        rmrr_identity_mapping(pdev->domain, 0, rmrr);
>> +        rmrr_identity_mapping(pdev->domain, 0, rmrr,
>> +                              XEN_DOMCTL_DEV_RDM_RELAXED);
>>       }
>
> Same here wrt STRICT.

This is inside intel_iommu_remove_device(), so the flag has no effect 
on rmrr_identity_mapping(). But I should add a comment like this,

+        /*
+         * The flag has no effect when clearing these mappings, so
+         * it's always safe to pass XEN_DOMCTL_DEV_RDM_RELAXED here.
+         */


>
> After those changes (a single RDM_RELAXED flag, passing STRICT in for
> the hardware domain) then I think this patch is in good shape.
>

Based on my understanding of your concern, it seems you think we don't 
print any message in the "relaxed" case, right? But as I replied above, 
this is not correct, so what is your view given that?

Anyway, I'm fine with changing this. And since you suggested keeping a 
single bit just to indicate XEN_DOMCTL_DEV_RDM_RELAXED, there is no 
longer an actual XEN_DOMCTL_DEV_RDM_STRICT, so I can simply reset all 
the associated flags to 0.

Thanks
Tiejun


* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-02  1:11                 ` Chen, Tiejun
  2015-07-02  4:47                   ` Chen, Tiejun
@ 2015-07-02  9:22                   ` George Dunlap
  2015-07-02 10:01                     ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-07-02  9:22 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 07/02/2015 02:11 AM, Chen, Tiejun wrote:
>>>>>>>> If I'm correct, then #3 means it's not possible to have devices
>>>>>>>> for a
>>>>>>>> domain *default* to strict, but to be relaxed in individual
>>>>>>>> instances.
>>>>>>>> If you had five devices you wanted strict, and only one device you
>>>>>>>> wanted to be relaxed (because you knew it didn't matter), you'd
>>>>>>>> have
>>>>>>>> to set reserved=strict for all the other devices, rather than just
>>>>>>>> being able to set the domain setting to strict and set
>>>>>>>> reserve=relaxed
>>>>>>>> for the one.
>>>>>>>>
>>>>>>>> I think that both violates the principle of least surprise, and is
>>>>>>>> less useful.
>>>>>>
>>>>>
>>>>> So what's you idea to follow our requirement?
>>>>
>>>> So consider the following config snippet:
>>>>
>>>> ---
>>>> rdm="reserve=relaxed"
>>>>
>>>> pci=['01:00.1,msitranslate=1']
>>>> ----
>>>>
>>>> What should the policy for that device be?
>>>>
>>>> According to your policy document, it seems to me like it should be
>>>> "relaxed", since the domain default* is set to "relaxed" and nothing
>>>
>>> Why? "strict" should be in this case.
>>
>> OK, I think I see where the problem is.  I had expected the domain-wide
>> setting to be a default which was overridden by per-device policies (see
>> pci_permissive and friends).  So when I saw "global default RDM policy"
> 
> We knew this behavior but we'd like to take a different consideration in
> this case.
> 
>> confirmation bias caused me to interpret it as what I expected to see --
>> the domain setting as the default, which the local setting could
>> override.
>>
>> I see now that in your documentation you consistently talk about two
>> different policies, each of which have their own defaults, and that the
>> effective permissions for a device end up being the intersection of the
>> two (i.e., only relaxed if both are relaxed; strict under all other
>> circumstances).
>>
>>> Why are you saying this is not our expectation? Just let me pick up that
>>> description *again*,
>>>
>>> "Default per-device RDM policy is 'strict', while default global RDM
>>> policy is 'relaxed'. When both policies are specified on a given region,
>>> 'strict' is always preferred."
>>
>> Look, if I haven't understood what you meant by the exact same words the
>> first 4 times I read it, simply repeating the same exact words is not
>> going to be helpful.  Ideally you need to try to understand where my
>> misunderstanding is coming from and explain where I've misunderstood
>> something; or, at least you need to try to use different words, or
>> explain how the words you're using apply to the given situation.
> 
> From my point of view, I already replied this previously by quoting part
> of the patch head description. As you know this revision is already
> marked as v4 and although I admit some code implementations still need a
> further review, at least our policy should already acknowledged right
> now unless this is really wrong. But in our case, looks you're
> concerning our mechanism is not expected to you. So
> 
>>
>>>> This interface doesn't make any sense to me.  Why, if the "global
>>>
>>> If you have any objection to our solution, and if you can't find any
>>> reasonable answer from our design, just please ping Jan or Kevin because
> 
> just do it to make this clear to us. And then, whatever, I'm going be
> fine to step next.
> 
>>> I'm really not that person who can address this kind of change at this
>>> point in this high level.
>>
>> And you have no idea why that design was chosen; you're just doing what
> 
> Certainly I have my own understanding with this issue. But
> 
>> you're told?
> 
> in high level I have to say Yes. If you really read that v2 design and
> its associated discussion, you should notice I didn't put any response
> right there.

Look, I'm getting a bit angry at your continual implication that I
haven't put in enough work reading the background for this series.  If
you go back and look at the v2 design discussion, you'll see that I was
actively involved in that discussion, and sent at least a dozen emails
about it.  I have now spent nearly two full days just on this series,
including going back over lots of conversations that have happened
before to find answers to questions which you could have given in a
single line; and also to check assertions that you've made which have
turned out to be false.

In the v2 design discussion, the only thing I could find regarding the
relationship between per-device settings and the domain-wide setting was
where you said [1]:

"per-device override is always favored if a conflicting setting in
rmrr_host."

And in v2, Wei asked you [2]:

"But this only works with global configuration and individual
configuration in PCI spec trumps this, right?"

And you responded [3]:

"You're right."

Now it happens that in all those cases you were literally talking about
the rmrr_host part of the configuration, not the strict/relaxed part of
the configuration; but that doesn't even make sense, since there *is* no
device-specific rmrr_host setting -- the only configuration which has
both a domain-wide and per-device component is the relaxed/strict.

So:

1. After spending yet another half hour doing research, I haven't found
any discussion that concluded we should have the global policy override
the local policy

2. The only discussion I *did* find has *you yourself* saying that the
per-device setting should override the global setting, not once, but
twice; and nobody contradicting you.

Maybe there is a discussion somewhere else where this was
changed; but I've already spent half an hour this morning looking at
where you said it was (v2 design discussion), and found the opposite --
just as I remembered.  I'm not going to look anymore.

You have now caused me to waste an awful lot of time on this series that
could profitably have been used elsewhere.

[1]
marc.info/?i=<AADFC41AFE54684AB9EE6CBC0274A5D126147864@SHSMSX101.ccr.corp.intel.com>

[2] marc.info/?i=<20150519110041.GB21998@zion.uk.xensource.com>

[3] marc.info/?i=<555C1B5C.7070401@intel.com>


>> I was involved in the design discussion, and from the very beginning I
>> probably saw your plan but misunderstood it.  I wouldn't be surprised if
>> some others didn't quite understand what they were agreeing to.
> 
> Again, I didn't walk into v2 design. So here I don't want to bring any
> confusion to you just with my reply.

This is your feature, so it is your responsibility to understand and
explain why you are doing what you are doing, if only to say "Jan wanted
X to happen because of Y [see $ref]."

 -George


* Re: [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
  2015-07-01 10:31       ` George Dunlap
@ 2015-07-02  9:27         ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-02  9:27 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

> I'm not only suggesting changing the layout of the patches; I'm

Sorry for this misunderstanding.

> suggesting modifying the functionality.
>
> In patch 12 you add a new command-line parameter to xl; so that you have
> to type something like this:
>
> # xl pci-attach ubuntu01 01:00.1,msitranslate=1 relaxed
>
> What I'm saying is that you can drop the xl part of that patch entirely,
> because once you have the xlu code in, you can just do this:
>
> # xl pci-attach ubuntu01 01:00.1,msitranslate=1,rdm_reserve=relaxed
>
> This has the positive advantage that you can copy and paste the same
> string into both the xl command and the xl config file.
>

I think you're right,

pciattach()
     |
     + xlu_pci_parse_bdf()

So I really should drop this patch as you said.
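
To illustrate why a separate command-line argument becomes redundant once
the per-device string itself carries the policy, here is a deliberately
simplified C sketch of splitting such a string into options. It is a
hypothetical stand-in, not libxlutil's xlu_pci_parse_bdf(): the struct,
the function name, and the accepted option set are assumptions, and only
the "msitranslate" and "rdm_reserve" keys from the example above are
recognised.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result structure, for illustration only. */
struct pci_opts {
    char bdf[32];       /* e.g. "01:00.1", stored unparsed */
    int  msitranslate;  /* 1 if "msitranslate=1" was given */
    bool rdm_set;       /* true if rdm_reserve= was given */
    bool rdm_relaxed;   /* valid only when rdm_set is true */
};

/* Returns 0 on success, -1 on any malformed or unknown option. */
int parse_pci_spec(const char *spec, struct pci_opts *out)
{
    char buf[128];
    char *cur, *next;

    assert(spec != NULL && out != NULL);
    if (strlen(spec) >= sizeof(buf))
        return -1;
    strcpy(buf, spec);
    memset(out, 0, sizeof(*out));

    /* The first comma-separated field is the BDF. */
    next = strchr(buf, ',');
    if (next)
        *next++ = '\0';
    if (buf[0] == '\0' || strlen(buf) >= sizeof(out->bdf))
        return -1;
    strcpy(out->bdf, buf);

    /* Every remaining field must have the form key=value. */
    for (cur = next; cur != NULL; cur = next) {
        char *val;

        next = strchr(cur, ',');
        if (next)
            *next++ = '\0';
        val = strchr(cur, '=');
        if (!val)
            return -1;
        *val++ = '\0';

        if (strcmp(cur, "msitranslate") == 0) {
            out->msitranslate = (strcmp(val, "1") == 0);
        } else if (strcmp(cur, "rdm_reserve") == 0) {
            if (strcmp(val, "relaxed") == 0)
                out->rdm_relaxed = true;
            else if (strcmp(val, "strict") == 0)
                out->rdm_relaxed = false;
            else
                return -1;
            out->rdm_set = true;
        } else {
            return -1;
        }
    }
    return 0;
}
```

Because the same string is accepted wherever the parser is called, it
can be pasted unchanged into either the xl command or the config file.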

Thanks
Tiejun


* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-02  9:22                   ` George Dunlap
@ 2015-07-02 10:01                     ` Chen, Tiejun
  2015-07-02 10:28                       ` George Dunlap
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-02 10:01 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>> in high level I have to say Yes. If you really read that v2 design and
>> its associated discussion, you should notice I didn't put any response
>> right there.
>
> Look, I'm getting a bit angry at your continual implication that I

Sorry about this.

> haven't put in enough work reading the background for this series.  If
> you go back and look at the v2 design discussion, you'll see that I was
> actively involved in that discussion, and sent at least a dozen emails
> about it.  I have now spent nearly two full days just on this series,

Sure and thanks for your review and time.

> including going back over lots of conversations that have happened
> before to find answers to questions which you could have given in a
> single line; and also to check assertions that you've made which have
> turned out to be false.
>
> In the v2 design discussion, the only thing I could find regarding the
> relationship between per-device settings and the domain-wide setting was
> as where you said [1]:
>
> "per-device override is always favored if a conflicting setting in
> rmrr_host."
>
> And in v2, Wei asked you [2]:
>
> "But this only works with global configuration and individual
> configuration in PCI spec trumps this, right?"
>
> And you responded [3]:
>
> "You're right."
>
> Now it happens that in all those cases you were literally talking about
> the rmrr_host part of the configuration, not the strict/relaxed part of
> the configuration; but that doesn't even make sense, since there *is* no
> device-specific rmrr_host setting -- the only configuration which has
> both a domain-wide and per-device component is the relaxed/strict.
>
> So:
>
> 1. After spending yet another half hour doing research, I haven't found
> any discussion that concluded we should have the global policy override
> the local policy

I also took some time to go back and check this point, and indeed it is 
not in the public design. As I mentioned in another email that follows 
this one, I also talked to Kevin about this issue. It looks like this 
was concluded in our internal discussion, and he didn't post it in the 
v2 design because, as you know, that design is fairly high level. As I 
recall, those discussions couldn't cover everything at the time, because 
they thought it better to post preliminary patches for further 
discussion, since this is really a complicated case. So afterwards I 
sent out two RFC revisions to help everyone settle on a good solution. 
I can confirm the current policy has been the same since the first RFC, 
and we didn't see any objection until now.

>
> 2. The only discussion I *did* find has *you yourself* saying that the
> per-device setting should override the global setting, not once, but
> twice; and nobody contradicting you.
>
> Maybe there is somewhere else a discussion somewhere where this was
> changed; but I've already spent half an hour this morning looking at
> where you said it was (v2 design discussion), and found the opposite --
> just as I remembered.  I'm not going to look anymore.
>
> You have now caused me to waste an awful lot of time on this series that
> could profitably have been used elsewhere.

Sorry about this, but we already have 2 RFC revisions and 4 non-RFC 
revisions, and some patches are already Acked. Should we really 
overturn this policy right now?

>
> [1]
> marc.info/?i=<AADFC41AFE54684AB9EE6CBC0274A5D126147864@SHSMSX101.ccr.corp.intel.com>
>
> [2] marc.info/?i=<20150519110041.GB21998@zion.uk.xensource.com>
>
> [3] marc.info/?i=<555C1B5C.7070401@intel.com>
>
>
>>> I was involved in the design discussion, and from the very beginning I
>>> probably saw your plan but misunderstood it.  I wouldn't be surprised if
>>> some others didn't quite understand what they were agreeing to.
>>
>> Again, I didn't walk into v2 design. So here I don't want to bring any
>> confusion to you just with my reply.
>
> This is your feature, so it is your responsibility to understand and
> explain why you are doing what you are doing, if only to say "Jan wanted

Maybe you remember that I posted v1, but according to some feedback that 
was not a good design for this implementation, so Kevin issued the v2 
revision and had a wider discussion with you all. Since then I have just 
followed that version. What I mean is that I haven't pushed on these 
things at a high level, because I think either policy is fine: IMO the 
two approaches are both valid options.

> X to happen because of Y [see $ref]."
>

So this is why I said you'd better ask Kevin or Jan about this, since I 
can't decide what's next at this point.

Thanks
Tiejun


* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-02 10:01                     ` Chen, Tiejun
@ 2015-07-02 10:28                       ` George Dunlap
  2015-07-02 11:32                         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-07-02 10:28 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 07/02/2015 11:01 AM, Chen, Tiejun wrote:
>> 1. After spending yet another half hour doing research, I haven't found
>> any discussion that concluded we should have the global policy override
>> the local policy
> 
> I also took some time to go back checking this point and indeed this is
> not in that public design. And as I mentioned in another email which is
> following this, I also had a talk to Kevin about this issue, and looks
> this is just concluded from our internal discussion and he didn't post
> this in v2 design again because as you know, that design is about
> something in high level. And as I recall, these discussions can't cover
> everything at that moment because they thought we'd better post a
> preliminary patches to further discuss something since this is really a
> complicated case. So afterwards I sent out two RFC revisions to help all
> guys finalize a good solution. And I can confirm current policy is
> always same from the first RFC, but we didn't see any opposite advice
> until now.

Probably because the reviewers all assumed that the design draft had
been followed, and you didn't make it clear that you'd changed it.

>> 2. The only discussion I *did* find has *you yourself* saying that the
>> per-device setting should override the global setting, not once, but
>> twice; and nobody contradicting you.
>>
>> Maybe there is somewhere else a discussion somewhere where this was
>> changed; but I've already spent half an hour this morning looking at
>> where you said it was (v2 design discussion), and found the opposite --
>> just as I remembered.  I'm not going to look anymore.
>>
>> You have now caused me to waste an awful lot of time on this series that
>> could profitably have been used elsewhere.
> 
> Sorry to this but I just think we already have 2 RFC revisions and 4
> revisions without RFC, and some patches are already Acked, we really
> should overturn this policy right now?

First of all, I think it's easy to change.

Even if it weren't, I already said that I'd be OK with accepting the
patch series with the existing "override" semantics, and without the
"default" semantics, *if* it were renamed to make it clear what was
going on.

But, for future reference, I am not going to approve an interface I
think is misleading or wrong -- particularly one like the xl interface
which we want to avoid changing if possible -- just because time is
short.  One of my own features, HVM USB pass-through, has narrowly
missed two releases (including the current one) because we wanted to be
careful to get the interface right.

>>> Again, I didn't walk into v2 design. So here I don't want to bring any
>>> confusion to you just with my reply.
>>
>> This is your feature, so it is your responsibility to understand and
>> explain why you are doing what you are doing, if only to say "Jan wanted
> 
> Maybe you remember I just posted v1 but looks that was not a better
> design to show this implementation according to some feedback, so Kevin
> issued v2 revision and had a wider discussion with you guys. Since then
> I just follow this version. So I mean I don't further hold these things
> in high level since I just think both policy is fine to me because IMO,
> these two approaches are optional.
> 
>> X to happen because of Y [see $ref]."
>>
> 
> So this is why I said you'd better ask this to Kevin or Jan since I
> can't decide what's next at this point.

Let me say that again: I don't care whether anyone "pulled rank" and
ordered you to do something a certain way.  YOU are the one submitting
this patch.  That means YOU are responsible for understanding why they want
it that way, and YOU are responsible for justifying it to other people.
 If you don't understand it at all, it's YOUR responsibility to get them
to explain it, not mine to chase them down.

 -George


* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-02 10:28                       ` George Dunlap
@ 2015-07-02 11:32                         ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-02 11:32 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 2015/7/2 18:28, George Dunlap wrote:
> On 07/02/2015 11:01 AM, Chen, Tiejun wrote:
>>> 1. After spending yet another half hour doing research, I haven't found
>>> any discussion that concluded we should have the global policy override
>>> the local policy
>>
>> I also took some time to go back checking this point and indeed this is
>> not in that public design. And as I mentioned in another email which is
>> following this, I also had a talk to Kevin about this issue, and looks
>> this is just concluded from our internal discussion and he didn't post
>> this in v2 design again because as you know, that design is about
>> something in high level. And as I recall, these discussions can't cover
>> everything at that moment because they thought we'd better post a
>> preliminary patches to further discuss something since this is really a
>> complicated case. So afterwards I sent out two RFC revisions to help all
>> guys finalize a good solution. And I can confirm current policy is
>> always same from the first RFC, but we didn't see any opposite advice
>> until now.
>
> Probably because the reviewers all assumed that the design draft had
> been followed, and you didn't make it clear that you'd changed it.

Shouldn't the patch head description already clarify this point? I 
also commented on this point in the code. After all, we already had 
several rounds of technical review, so it's a little hard to believe it 
could have been missed.

>
>>> 2. The only discussion I *did* find has *you yourself* saying that the
>>> per-device setting should override the global setting, not once, but
>>> twice; and nobody contradicting you.
>>>
>>> Maybe there is somewhere else a discussion somewhere where this was
>>> changed; but I've already spent half an hour this morning looking at
>>> where you said it was (v2 design discussion), and found the opposite --
>>> just as I remembered.  I'm not going to look anymore.
>>>
>>> You have now caused me to waste an awful lot of time on this series that
>>> could profitably have been used elsewhere.
>>
>> Sorry to this but I just think we already have 2 RFC revisions and 4
>> revisions without RFC, and some patches are already Acked, we really
>> should overturn this policy right now?
>
> First of all, I think it's easy to change.
>

I agree, but what I'm saying is that this involves our policy. We 
shouldn't change this sort of thing unless all the associated 
maintainers agree with you.

> Even if it weren't, I already said that I'd be OK with accepting the
> patch series with the existing "override" semantics, and without the
> "default" semantics, *if* it were renamed to make it clear what was
> going on.
>
> But, for future reference, I am not going to approve an interface I
> think is misleading or wrong -- particularly one like the xl interface
> which we want to avoid changing if possible -- just because time is
> short.  One of my own features, HVM USB pass-through, has narrowly
> missed two releases (including the current one) because we wanted to be
> careful to get the interface right.

I admit I should consider everything as carefully as you do.

>
>>>> Again, I didn't walk into v2 design. So here I don't want to bring any
>>>> confusion to you just with my reply.
>>>
>>> This is your feature, so it is your responsibility to understand and
>>> explain why you are doing what you are doing, if only to say "Jan wanted
>>
>> Maybe you remember I just posted v1 but looks that was not a better
>> design to show this implementation according to some feedback, so Kevin
>> issued v2 revision and had a wider discussion with you guys. Since then
>> I just follow this version. So I mean I don't further hold these things
>> in high level since I just think both policy is fine to me because IMO,
>> these two approaches are optional.
>>
>>> X to happen because of Y [see $ref]."
>>>
>>
>> So this is why I said you'd better ask this to Kevin or Jan since I
>> can't decide what's next at this point.
>
> Let me say that again: I don't care whether anyone "pulled rank" and
> ordered you to do something a certain way.  YOU are the one submitting
> this patch.  That means YOU responsible for understanding why they want
> it that way, and YOU are responsible for justifying it to other people.
>   If you don't understand it at all, it's YOUR responsibility to get them
> to explain it, not mine to chase them down.
>

As I said above, I initially thought the two approaches were
interchangeable and just a matter of preference. So I pointed to the
patch descriptions, which were reviewed in public, as a statement of
what we expect. But it looks like that does not satisfy you, and I don't
think I can explain this any better myself, so I asked you to ping Jan
or Kevin for a formal answer. Is this procedure not reasonable?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-01 10:47             ` Chen, Tiejun
  2015-07-01 14:39               ` George Dunlap
@ 2015-07-06 10:34               ` Jan Beulich
  2015-07-06 10:56                 ` George Dunlap
  2015-07-06 10:56                 ` Chen, Tiejun
  1 sibling, 2 replies; 114+ messages in thread
From: Jan Beulich @ 2015-07-06 10:34 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, George Dunlap,
	Andrew Cooper, Tim Deegan, xen-devel, Stefano Stabellini,
	Suravee Suthikulpanit, Yang Zhang, Aravind Gopalakrishnan

>>> On 01.07.15 at 12:47, <tiejun.chen@intel.com> wrote:
> On 2015/7/1 18:02, George Dunlap wrote:
>> On 07/01/2015 02:11 AM, Chen, Tiejun wrote:
>>> /* XEN_DOMCTL_createdomain */
>>> struct xen_domctl_createdomain {
>>>      /* IN parameters */
>>>      uint32_t ssidref;
>>>      xen_domain_handle_t handle;
>>>   /* Is this an HVM guest (as opposed to a PVH or PV guest)? */
>>> #define _XEN_DOMCTL_CDF_hvm_guest     0
>>> #define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)
>>>   /* Use hardware-assisted paging if available? */
>>> #define _XEN_DOMCTL_CDF_hap           1
>>> #define XEN_DOMCTL_CDF_hap            (1U<<_XEN_DOMCTL_CDF_hap)
>>>   /* Should domain memory integrity be verifed by tboot during Sx? */
>>> #define _XEN_DOMCTL_CDF_s3_integrity  2
>>> #define XEN_DOMCTL_CDF_s3_integrity   (1U<<_XEN_DOMCTL_CDF_s3_integrity)
>>>   /* Disable out-of-sync shadow page tables? */
>>> #define _XEN_DOMCTL_CDF_oos_off       3
>>> #define XEN_DOMCTL_CDF_oos_off        (1U<<_XEN_DOMCTL_CDF_oos_off)
>>>   /* Is this a PVH guest (as opposed to an HVM or PV guest)? */
>>> #define _XEN_DOMCTL_CDF_pvh_guest     4
>>> #define XEN_DOMCTL_CDF_pvh_guest      (1U<<_XEN_DOMCTL_CDF_pvh_guest)
>>>      uint32_t flags;
>>
>> Yes, this demonstrates my point.  Each of these is a single-bit boolean
>> value that takes up a single bit -- either on or off.  But here you have
>> three values -- NO_DRM, RELAXED, and STRICT, that take up two bits.  If
> 
> Is this fine to you?
> 
> #define _XEN_DOMCTL_DEV_NO_RDM          0
> #define XEN_DOMCTL_DEV_NO_RDM           (1U<<_XEN_DOMCTL_DEV_NO_RDM)
> #define _XEN_DOMCTL_DEV_RDM_RELAXED     1
> #define XEN_DOMCTL_DEV_RDM_RELAXED      (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)
> #define _XEN_DOMCTL_DEV_RDM_STRICT      2
> #define XEN_DOMCTL_DEV_RDM_STRICT       (1U<<_XEN_DOMCTL_DEV_RDM_STRICT)

AIUI these aren't individual flags, but kind of an enumeration. I.e.
you should keep the original definitions and add - as suggested by
George - a mask (two bits wide right now).
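
To illustrate the "enumeration plus mask" layout suggested here, a
minimal sketch follows. The plain values 0/1/2 are assumed to be the
"original definitions"; the mask name XEN_DOMCTL_DEV_RDM_MASK and the
helper rdm_policy_from_flag() are hypothetical and not taken from the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Enumerated policy values (assumed to match the original definitions). */
#define XEN_DOMCTL_DEV_NO_RDM        0
#define XEN_DOMCTL_DEV_RDM_RELAXED   1
#define XEN_DOMCTL_DEV_RDM_STRICT    2
/* Hypothetical name: a two-bit mask covering the enumeration above. */
#define XEN_DOMCTL_DEV_RDM_MASK      0x3

/* Hypothetical helper: extract the policy, rejecting out-of-range input. */
static int rdm_policy_from_flag(uint32_t flag, uint32_t *policy)
{
    assert(policy);

    /* Bits outside the mask are reserved for future extensions. */
    if ( flag & ~(uint32_t)XEN_DOMCTL_DEV_RDM_MASK )
        return -EINVAL;

    /* The mask admits the value 3, which is not a defined policy. */
    if ( (flag & XEN_DOMCTL_DEV_RDM_MASK) > XEN_DOMCTL_DEV_RDM_STRICT )
        return -EINVAL;

    *policy = flag & XEN_DOMCTL_DEV_RDM_MASK;
    return 0;
}
```

With this layout the three policies are mutually exclusive by
construction, which the one-bit-per-policy proposal quoted above does
not guarantee.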

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-06 10:34               ` Jan Beulich
@ 2015-07-06 10:56                 ` George Dunlap
  2015-07-06 10:56                 ` Chen, Tiejun
  1 sibling, 0 replies; 114+ messages in thread
From: George Dunlap @ 2015-07-06 10:56 UTC (permalink / raw)
  To: Jan Beulich, Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Stefano Stabellini, Suravee Suthikulpanit, Yang Zhang,
	Aravind Gopalakrishnan

On 07/06/2015 11:34 AM, Jan Beulich wrote:
>>>> On 01.07.15 at 12:47, <tiejun.chen@intel.com> wrote:
>> On 2015/7/1 18:02, George Dunlap wrote:
>>> On 07/01/2015 02:11 AM, Chen, Tiejun wrote:
>>>> /* XEN_DOMCTL_createdomain */
>>>> struct xen_domctl_createdomain {
>>>>      /* IN parameters */
>>>>      uint32_t ssidref;
>>>>      xen_domain_handle_t handle;
>>>>   /* Is this an HVM guest (as opposed to a PVH or PV guest)? */
>>>> #define _XEN_DOMCTL_CDF_hvm_guest     0
>>>> #define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)
>>>>   /* Use hardware-assisted paging if available? */
>>>> #define _XEN_DOMCTL_CDF_hap           1
>>>> #define XEN_DOMCTL_CDF_hap            (1U<<_XEN_DOMCTL_CDF_hap)
>>>>   /* Should domain memory integrity be verifed by tboot during Sx? */
>>>> #define _XEN_DOMCTL_CDF_s3_integrity  2
>>>> #define XEN_DOMCTL_CDF_s3_integrity   (1U<<_XEN_DOMCTL_CDF_s3_integrity)
>>>>   /* Disable out-of-sync shadow page tables? */
>>>> #define _XEN_DOMCTL_CDF_oos_off       3
>>>> #define XEN_DOMCTL_CDF_oos_off        (1U<<_XEN_DOMCTL_CDF_oos_off)
>>>>   /* Is this a PVH guest (as opposed to an HVM or PV guest)? */
>>>> #define _XEN_DOMCTL_CDF_pvh_guest     4
>>>> #define XEN_DOMCTL_CDF_pvh_guest      (1U<<_XEN_DOMCTL_CDF_pvh_guest)
>>>>      uint32_t flags;
>>>
>>> Yes, this demonstrates my point.  Each of these is a single-bit boolean
>>> value that takes up a single bit -- either on or off.  But here you have
>>> three values -- NO_DRM, RELAXED, and STRICT, that take up two bits.  If
>>
>> Is this fine to you?
>>
>> #define _XEN_DOMCTL_DEV_NO_RDM          0
>> #define XEN_DOMCTL_DEV_NO_RDM           (1U<<_XEN_DOMCTL_DEV_NO_RDM)
>> #define _XEN_DOMCTL_DEV_RDM_RELAXED     1
>> #define XEN_DOMCTL_DEV_RDM_RELAXED      (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)
>> #define _XEN_DOMCTL_DEV_RDM_STRICT      2
>> #define XEN_DOMCTL_DEV_RDM_STRICT       (1U<<_XEN_DOMCTL_DEV_RDM_STRICT)
> 
> AIUI these aren't individual flags, but kind of an enumeration. I.e.
> you should keep the original definitions and add - as suggested by
> George - a mask (two bits wide right now).

Actually, further on in the discussion it turns out that NO_RDM was
based on a misunderstanding of what this patch series was doing.  So
there are really only two options needed.

I have suggested just using a single-bit flag,
XEN_DOMCTL_DEV_RDM_RELAXED.  If it's not set, it's strict.  Julien was
OK with that approach as well.
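
As a rough sketch of the single-bit approach (the helper name
rdm_flag_check() is hypothetical, and the flag value 1 is an assumption
for illustration only), the check could look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 set: relaxed. Bit 0 clear: strict. (Value assumed here.) */
#define XEN_DOMCTL_DEV_RDM_RELAXED  1

/* Hypothetical helper: validate the flag field and report the policy. */
static int rdm_flag_check(uint32_t flag, bool *relaxed)
{
    assert(relaxed);

    /* Any bit other than RDM_RELAXED is undefined and gets rejected. */
    if ( flag & ~(uint32_t)XEN_DOMCTL_DEV_RDM_RELAXED )
        return -EINVAL;

    *relaxed = !!(flag & XEN_DOMCTL_DEV_RDM_RELAXED);
    return 0;
}
```

A flag of 0 then means strict without needing a separate
XEN_DOMCTL_DEV_RDM_STRICT definition.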

 -George

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-06 10:34               ` Jan Beulich
  2015-07-06 10:56                 ` George Dunlap
@ 2015-07-06 10:56                 ` Chen, Tiejun
  2015-07-06 11:39                   ` Jan Beulich
  1 sibling, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-06 10:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, George Dunlap,
	Andrew Cooper, Tim Deegan, xen-devel, Stefano Stabellini,
	Suravee Suthikulpanit, Yang Zhang, Aravind Gopalakrishnan

>>> Yes, this demonstrates my point.  Each of these is a single-bit boolean
>>> value that takes up a single bit -- either on or off.  But here you have
>>> three values -- NO_DRM, RELAXED, and STRICT, that take up two bits.  If
>>
>> Is this fine to you?
>>
>> #define _XEN_DOMCTL_DEV_NO_RDM          0
>> #define XEN_DOMCTL_DEV_NO_RDM           (1U<<_XEN_DOMCTL_DEV_NO_RDM)
>> #define _XEN_DOMCTL_DEV_RDM_RELAXED     1
>> #define XEN_DOMCTL_DEV_RDM_RELAXED      (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)
>> #define _XEN_DOMCTL_DEV_RDM_STRICT      2
>> #define XEN_DOMCTL_DEV_RDM_STRICT       (1U<<_XEN_DOMCTL_DEV_RDM_STRICT)
>
> AIUI these aren't individual flags, but kind of an enumeration. I.e.
> you should keep the original definitions and add - as suggested by
> George - a mask (two bits wide right now).
>

Okay, but George also thought NO_RDM may be pointless, since we can
simply ignore this flag field for DT devices, and he also thought one
bit would be enough to cover the two cases, strict and relaxed. So maybe
the final version is:

#define XEN_DOMCTL_DEV_RDM_RELAXED	1
#define XEN_DOMCTL_DEV_RDM_FLAGS_MASK	(0x1)

Thanks
Tiejun
	

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-06 10:56                 ` Chen, Tiejun
@ 2015-07-06 11:39                   ` Jan Beulich
  0 siblings, 0 replies; 114+ messages in thread
From: Jan Beulich @ 2015-07-06 11:39 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, George Dunlap,
	Andrew Cooper, Tim Deegan, xen-devel, Stefano Stabellini,
	Suravee Suthikulpanit, Yang Zhang, Aravind Gopalakrishnan

>>> On 06.07.15 at 12:56, <tiejun.chen@intel.com> wrote:
>>>> Yes, this demonstrates my point.  Each of these is a single-bit boolean
>>>> value that takes up a single bit -- either on or off.  But here you have
>>>> three values -- NO_DRM, RELAXED, and STRICT, that take up two bits.  If
>>>
>>> Is this fine to you?
>>>
>>> #define _XEN_DOMCTL_DEV_NO_RDM          0
>>> #define XEN_DOMCTL_DEV_NO_RDM           (1U<<_XEN_DOMCTL_DEV_NO_RDM)
>>> #define _XEN_DOMCTL_DEV_RDM_RELAXED     1
>>> #define XEN_DOMCTL_DEV_RDM_RELAXED      (1U<<_XEN_DOMCTL_DEV_RDM_RELAXED)
>>> #define _XEN_DOMCTL_DEV_RDM_STRICT      2
>>> #define XEN_DOMCTL_DEV_RDM_STRICT       (1U<<_XEN_DOMCTL_DEV_RDM_STRICT)
>>
>> AIUI these aren't individual flags, but kind of an enumeration. I.e.
>> you should keep the original definitions and add - as suggested by
>> George - a mask (two bits wide right now).
>>
> 
> Okay but George also thought NO_RDM may be pointless since we can just 
> ignore this flag field simply for DT device, and he also thought one bit 
> may be fine enough to cover two cases, strict and relaxed. So maybe 
> finally, here is,
> 
> #define XEN_DOMCTL_DEV_RDM_RELAXED	1
> #define XEN_DOMCTL_DEV_RDM_FLAGS_MASK	(0x1)

Except that then you don't need a mask.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-01 13:29               ` George Dunlap
  2015-07-02  1:11                 ` Chen, Tiejun
@ 2015-07-06 13:34                 ` Chen, Tiejun
  2015-07-06 13:51                   ` Jan Beulich
  1 sibling, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-06 13:34 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

I think we should come back to this point.

> I was involved in the design discussion, and from the very beginning I
> probably saw your plan but misunderstood it.  I wouldn't be surprised if
> some others didn't quite understand what they were agreeing to.
>
> This way of doing things is different than the way we do it with most
> other options relating to pci devices (e.g., pci_permissive,
> pci_msitranslate, pci_sieze, &c).  All of those options use a "default"
> semantic: the domain-wide setting takes effect only if it's not set
> locally.  If the syntax looks the same but the semantics is different,
> many people will be confused.  If we're going to have the domain-wide
> policy override the per-device policy, then the naming should make that
> clear; for instance, "override=(strict|relaxed|none)", or
> "strict_override=(1|0)".

Jan,

What about this?

This involves our policy, so please take a look at this as well.

George,

Actually, we don't mean that the domain-wide policy always overrides the
per-device policy, or that the per-device policy always overrides the
domain-wide policy. We just give "strict" the highest priority when the
two conflict. As I said previously, I may not be able to answer this
fully correctly, but as far as I can recall, one reason is that
different devices can share one RMRR entry, so it's possible that two or
more per-device policies differ. So we need this particular rule, which
is not the same as before. So I still prefer to keep our original
implementation.

Jan, please correct me if I'm missing something or wrong.

Thanks
Tiejun

>
> I don't happen to think these "override" semantics are actually going to
> turn out to be that useful; I do think a "default" semantic would be
> useful.  But I'd be content if the name of the current setting were
> switched to "override" to make the semantics more clear.  We can always
> add in "default" at some later point if we really want.
>
>   -George
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-06 13:34                 ` Chen, Tiejun
@ 2015-07-06 13:51                   ` Jan Beulich
  2015-07-06 14:21                     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-07-06 13:51 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel

>>> On 06.07.15 at 15:34, <tiejun.chen@intel.com> wrote:
> I think we should go back here.
> 
>> I was involved in the design discussion, and from the very beginning I
>> probably saw your plan but misunderstood it.  I wouldn't be surprised if
>> some others didn't quite understand what they were agreeing to.
>>
>> This way of doing things is different than the way we do it with most
>> other options relating to pci devices (e.g., pci_permissive,
>> pci_msitranslate, pci_sieze, &c).  All of those options use a "default"
>> semantic: the domain-wide setting takes effect only if it's not set
>> locally.  If the syntax looks the same but the semantics is different,
>> many people will be confused.  If we're going to have the domain-wide
>> policy override the per-device policy, then the naming should make that
>> clear; for instance, "override=(strict|relaxed|none)", or
>> "strict_override=(1|0)".
> 
> Jan,
> 
> What about this?
> 
> This is involving our policy so please take a look at this as well.

I don't think the way things get expressed in the domain config
directly relates to what the policy is. How to best express things
in the config I'd really like to leave to the tools maintainers.

> George,
> 
> Actually we don't mean the domain-wide policy always override the 
> per-device policy, or the per-device policy always override the 
> per-device policy. Here we just take "strict" as the highest priority 
> when it conflicts in two cases. As I said previously myself may not 
> answer this very correctly but now I can recall or understand that one 
> reason is that different devices can share one RMRR entry, so its 
> possible that these two or more per-device policies are not same. So we 
> need this particular rule which is not same as before. So I still prefer 
> to keep our original implementation.
> 
> If I'm missing something or wrong, please Jan correct me.

I don't think I fully understand what you try to describe above;
instead I think the global vs per-device settings should very much
behave just like others (i.e. fallback to global if there is no per-
device setting). Furthermore, didn't we settle on not allowing
pass-through of devices sharing RMRRs unless specifically told
to by the admin (in which case part of what you write above
would seem irrelevant to me)?

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-06 13:51                   ` Jan Beulich
@ 2015-07-06 14:21                     ` Chen, Tiejun
  2015-07-06 14:29                       ` George Dunlap
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-06 14:21 UTC (permalink / raw)
  To: Jan Beulich, Ian Campbell, Wei Liu, Ian Jackson, Stefano Stabellini
  Cc: George Dunlap, xen-devel

>>> This way of doing things is different than the way we do it with most
>>> other options relating to pci devices (e.g., pci_permissive,
>>> pci_msitranslate, pci_sieze, &c).  All of those options use a "default"
>>> semantic: the domain-wide setting takes effect only if it's not set
>>> locally.  If the syntax looks the same but the semantics is different,
>>> many people will be confused.  If we're going to have the domain-wide
>>> policy override the per-device policy, then the naming should make that
>>> clear; for instance, "override=(strict|relaxed|none)", or
>>> "strict_override=(1|0)".
>>
>> Jan,
>>
>> What about this?
>>
>> This is involving our policy so please take a look at this as well.
>
> I don't think the way things get expressed in the domain config
> directly relates to what the policy is. How to best express things
> in the config I'd really like to leave to the tools maintainers.

Do you remember that the current definitions came out of our previous
discussion, going from force/try to strict/relaxed? You have been so
involved throughout that we'd better hear what you have to say at this
point.

>
>> George,
>>
>> Actually we don't mean the domain-wide policy always override the
>> per-device policy, or the per-device policy always override the
>> per-device policy. Here we just take "strict" as the highest priority
>> when it conflicts in two cases. As I said previously myself may not
>> answer this very correctly but now I can recall or understand that one
>> reason is that different devices can share one RMRR entry, so its
>> possible that these two or more per-device policies are not same. So we
>> need this particular rule which is not same as before. So I still prefer
>> to keep our original implementation.
>>
>> If I'm missing something or wrong, please Jan correct me.
>
> I don't think I fully understand what you try to describe above;
> instead I think the global vs per-device settings should very much
> behave just like others (i.e. fallback to global if there is no per-

If there is no explicit per-device setting in the .cfg, the per-device
policy still has its own default setting, right?

> device setting). Furthermore, didn't we settle on not allowing

Let's make this clear.

Our current implementation is something like what I described in the 
patch head description,

Default per-device RDM policy is 'strict', while default global RDM 
policy is 'relaxed'. When both policies are specified on a given region, 
'strict' is always preferred.
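
A minimal sketch of this "strict is always preferred" rule, for
illustration only: the enum and the function rdm_merge_strict_wins() are
hypothetical names, not the libxl implementation.

```c
#include <assert.h>

/* Hypothetical names used only for this illustration. */
enum rdm_policy { RDM_RELAXED = 0, RDM_STRICT = 1 };

/*
 * Combine the global and per-device policies for one region: the result
 * is strict as soon as either side asks for strict.
 */
static enum rdm_policy rdm_merge_strict_wins(enum rdm_policy global,
                                             enum rdm_policy per_device)
{
    return (global == RDM_STRICT || per_device == RDM_STRICT)
           ? RDM_STRICT : RDM_RELAXED;
}
```

Under the stated defaults (global relaxed, per-device strict) this
yields strict, so a region ends up relaxed only when both settings are
relaxed.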

Any concerns about this? Or should we still let the per-device policy
override the per-domain policy, as with the other options?


> pass-through of devices sharing RMRRs unless specifically told
> to by the admin (in which case part of what you write above

Yes, we can ignore this case in the current phase.

> would seem irrelevant to me)?
>

I just thought that, while we're debating the current policy, you might
have a good suggestion.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-06 14:21                     ` Chen, Tiejun
@ 2015-07-06 14:29                       ` George Dunlap
  2015-07-06 14:34                         ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: George Dunlap @ 2015-07-06 14:29 UTC (permalink / raw)
  To: Chen, Tiejun, Jan Beulich, Ian Campbell, Wei Liu, Ian Jackson,
	Stefano Stabellini
  Cc: xen-devel

On 07/06/2015 03:21 PM, Chen, Tiejun wrote:
>>>> This way of doing things is different than the way we do it with most
>>>> other options relating to pci devices (e.g., pci_permissive,
>>>> pci_msitranslate, pci_sieze, &c).  All of those options use a "default"
>>>> semantic: the domain-wide setting takes effect only if it's not set
>>>> locally.  If the syntax looks the same but the semantics is different,
>>>> many people will be confused.  If we're going to have the domain-wide
>>>> policy override the per-device policy, then the naming should make that
>>>> clear; for instance, "override=(strict|relaxed|none)", or
>>>> "strict_override=(1|0)".
>>>
>>> Jan,
>>>
>>> What about this?
>>>
>>> This is involving our policy so please take a look at this as well.
>>
>> I don't think the way things get expressed in the domain config
>> directly relates to what the policy is. How to best express things
>> in the config I'd really like to leave to the tools maintainers.
> 
> Did you remember current definitions are from our previous discussion?
> From froce/try to strict/relaxed ...  You're always getting involved so
> much so we'd better listen what you would say at this point.
> 
>>
>>> George,
>>>
>>> Actually we don't mean the domain-wide policy always override the
>>> per-device policy, or the per-device policy always override the
>>> per-device policy. Here we just take "strict" as the highest priority
>>> when it conflicts in two cases. As I said previously myself may not
>>> answer this very correctly but now I can recall or understand that one
>>> reason is that different devices can share one RMRR entry, so its
>>> possible that these two or more per-device policies are not same. So we
>>> need this particular rule which is not same as before. So I still prefer
>>> to keep our original implementation.
>>>
>>> If I'm missing something or wrong, please Jan correct me.
>>
>> I don't think I fully understand what you try to describe above;
>> instead I think the global vs per-device settings should very much
>> behave just like others (i.e. fallback to global if there is no per-
> 
> If there's no any explicit per-device setting from .cfg, per-device
> always has its own default setting, right?
> 
>> device setting). Furthermore, didn't we settle on not allowing
> 
> Let's make this clear.
> 
> Our current implementation is something like what I described in the
> patch head description,
> 
> Default per-device RDM policy is 'strict', while default global RDM
> policy is 'relaxed'. When both policies are specified on a given region,
> 'strict' is always preferred.
> 
> Any concern to this? Or still let per-device policy override per-domain
> policy like others?

It sounds like part of the problem here is a matter of domains.

Jan cares mostly about what happens in the hypervisor.  At the
hypervisor level, there are only the per-device configurations, and he is
keen that rmrrs be "strict" by default, unless there is an explicit flag
to relax it.  (I agree with this, FWIW.)

What we've been arguing about is the xl layer -- what settings should
xl/libxl give to the hypervisor, based on what's in the domain config?

It sounds like Jan doesn't care a great deal about it, and in any case
would defer to the tools maintainers, but that if asked for his advice
he would say that the configuration in xl.cfg should act like all the
other pci device configurations: that you have a domain-wide default
that can be overridden in the per-device setting.

I.e.:
---
rdm='reserve=strict'
pci=[ '02:0.0', '01:1.1,rdm_reserve=relaxed' ]
---
Would pass "strict" for the first device, and "relaxed" for the second.
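
The "default" semantics in this example can be sketched as follows. The
enum and rdm_resolve() are hypothetical names for illustration, and the
fallback to strict when nothing is configured is an assumption based on
the strict-by-default preference described above, not actual libxl
code.

```c
#include <assert.h>

/* Hypothetical names; RDM_UNSET means "not specified in the config". */
enum rdm_policy { RDM_UNSET = 0, RDM_STRICT, RDM_RELAXED };

/*
 * Resolve the policy passed to the hypervisor for one device: an
 * explicit per-device setting wins, otherwise the domain-wide default
 * applies, otherwise fall back to strict (assumed).
 */
static enum rdm_policy rdm_resolve(enum rdm_policy domain_default,
                                   enum rdm_policy per_device)
{
    if ( per_device != RDM_UNSET )
        return per_device;
    if ( domain_default != RDM_UNSET )
        return domain_default;
    return RDM_STRICT;
}
```

For the config above, rdm_resolve(RDM_STRICT, RDM_UNSET) gives strict
for '02:0.0' and rdm_resolve(RDM_STRICT, RDM_RELAXED) gives relaxed for
'01:1.1'.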

Do I understand you both properly, Jan / Tiejun?

 -George

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-06 14:29                       ` George Dunlap
@ 2015-07-06 14:34                         ` Jan Beulich
  2015-07-06 14:46                           ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-07-06 14:34 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, IanJackson, xen-devel,
	Tiejun Chen

>>> On 06.07.15 at 16:29, <george.dunlap@eu.citrix.com> wrote:
> It sounds like part of the problem here is a matter of domains.
> 
> Jan cares mostly about what happens in the hypervisor.  At the
> hypervisor level, there is only the per-device configurations, and he is
> keen that rmrrs be "strict" by default, unless there is an explicit flag
> to relax it.  (I agree with this, FWIW.)
> 
> What we've been arguing about is the xl layer -- what settings should
> xl/libxl give to the hypervisor, based on what's in the domain config?
> 
> It sounds like Jan doesn't care a great deal about it, and in any case
> would defer to the tools maintainers, but that if asked for his advice
> he would say that the configuration in xl.cfg should act like all the
> other pci device configurations: that you have a domain-wide default
> that can be overridden in the per-device setting.
> 
> I.e.:
> ---
> rdm='reserve=strict'
> pci=[ '02:0.0', '01:1.1,rdm_reserve=relaxed' ]
> ---
> Would pass "strict" for the first device, and "relaxed" for the second.
> 
> Do I understand you both properly, Jan / Tiejun?

Yes for me.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-06 14:34                         ` Jan Beulich
@ 2015-07-06 14:46                           ` Chen, Tiejun
  2015-07-06 17:16                             ` Wei Liu
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-06 14:46 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: IanJackson, xen-devel, Wei Liu, Ian Campbell, Stefano Stabellini

On 2015/7/6 22:34, Jan Beulich wrote:
>>>> On 06.07.15 at 16:29, <george.dunlap@eu.citrix.com> wrote:
>> It sounds like part of the problem here is a matter of domains.
>>
>> Jan cares mostly about what happens in the hypervisor.  At the
>> hypervisor level, there is only the per-device configurations, and he is
>> keen that rmrrs be "strict" by default, unless there is an explicit flag
>> to relax it.  (I agree with this, FWIW.)

I don't understand this point.

There is no default flag/policy at the hypervisor level, and the
hypervisor doesn't do anything to RMRRs by itself.

All actions only take place when xl/xc issue the requests we need,
according to the rdm setting in the .cfg.

>>
>> What we've been arguing about is the xl layer -- what settings should
>> xl/libxl give to the hypervisor, based on what's in the domain config?
>>
>> It sounds like Jan doesn't care a great deal about it, and in any case
>> would defer to the tools maintainers, but that if asked for his advice
>> he would say that the configuration in xl.cfg should act like all the
>> other pci device configurations: that you have a domain-wide default
>> that can be overridden in the per-device setting.
>>
>> I.e.:
>> ---
>> rdm='reserve=strict'
>> pci=[ '02:0.0', '01:1.1,rdm_reserve=relaxed' ]
>> ---
>> Would pass "strict" for the first device, and "relaxed" for the second.
>>
>> Do I understand you both properly, Jan / Tiejun?
>
> Yes for me.
>

It looks like everyone would like to go this way for RMRR, so I can
follow this approach. (Maybe the confusion above doesn't matter now?)

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-02  8:49     ` Chen, Tiejun
@ 2015-07-06 14:52       ` Chen, Tiejun
  2015-07-07  6:37         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-06 14:52 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Andrew Cooper, Tim Deegan,
	xen-devel, Aravind Gopalakrishnan, Suravee Suthikulpanit,
	Yang Zhang, Stefano Stabellini, Ian Campbell

Any comments on this?

Thanks
Tiejun

On 2015/7/2 16:49, Chen, Tiejun wrote:
>>> @@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn,
>>> struct pci_dev *pdev)
>>>                PCI_BUS(bdf) == pdev->bus &&
>>>                PCI_DEVFN2(bdf) == devfn )
>>>           {
>>> -            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
>>> +            /*
>>> +             * RMRR is always reserved on e820 so either of flag
>>> +             * is fine for hardware domain and here we'd like to
>>> +             * pass XEN_DOMCTL_DEV_RDM_RELAXED.
>>> +             */
>>> +            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
>>> +                                        XEN_DOMCTL_DEV_RDM_RELAXED);
>>
>> So two things.
>>
>> First, you assert that the value here won't matter, because the
>> hardware domain is guaranteed never to have a conflict.
>>
>> Which is likely to be true almost all the time; but the question is,
>> *if* something goes wrong, what should happen?
>>
>> For instance, suppose that someone accidentally introduces a bug in
>> Xen that messes up or ignores reading a portion of the e820 map under
>> certain circumstances.  What should happen?
>
> Yes, you can image all possible cases. But if this kind of bug can come
> true, I really very doubt if Xen can boot successfully. Because e820 is
> a fundamental key to run OS, so this case is very easy to panic Xen, right?
>
> Anyway, I agree we should concern all corner cases.
>
>>
>> If you set this to RELAXED, this clash will be silently ignored; which
>> means that devices that need RMRR will simply malfunction in weird
>> ways without any warning messages having been printed that might give
>
> No. We always post that messages regardless of relaxe or strict since
> this massage just depends on one condition of that conflict exist.
>
>> someone a hint about what is going on.
>>
>> If you set this to STRICT, then this clash will print an error
>> message, but as far as I can tell, the rest of the device assignment
>> will continue as normal.  (Please correct me if I've followed the code
>> wrong.)
>
> Not all cases are like this behavior but here is true.
>
>>
>> Since the device should be just as functional (or not functional)
>> either way, but in the STRICT case should actually print an error
>> message which someone might notice, it seems to me that STRICT is a
>> better option for the hardware domain.
>>
>
> Just see above.
>
>> Secondly, you assert in response to Kevin's question in v3 that this
>> path is only reachable when assigning to the hardware domain.  I think
>> you at least need to update the comment here to indicate that's what
>> you think; it's not at all obvious just from looking at the function
>
> What about this?
>
>                PCI_DEVFN2(bdf) == devfn )
>           {
>               /*
> -             * RMRR is always reserved on e820 so either of flag
> -             * is fine for hardware domain and here we'd like to
> -             * pass XEN_DOMCTL_DEV_RDM_RELAXED.
> +             * Here means we're add a device to the hardware domain
> +             * so actually RMRR is always reserved on e820 so either
> +             * of flag is fine for hardware domain and here we'd like
> +             * to pass XEN_DOMCTL_DEV_RDM_RELAXED.
>                */
>               ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
>                                           XEN_DOMCTL_DEV_RDM_RELAXED);
>
>
>> that this is true.  And if we do end up doing something besides
>> STRICT, we should check to make sure that pdev->domain really *is* the
>> hardware domain before acting like it is.
>>
>>>               if ( ret )
>>>                   dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping
>>> failed\n",
>>>                           pdev->domain->domain_id);
>>> @@ -1939,7 +1946,8 @@ static int intel_iommu_remove_device(u8 devfn,
>>> struct pci_dev *pdev)
>>>                PCI_DEVFN2(bdf) != devfn )
>>>               continue;
>>>
>>> -        rmrr_identity_mapping(pdev->domain, 0, rmrr);
>>> +        rmrr_identity_mapping(pdev->domain, 0, rmrr,
>>> +                              XEN_DOMCTL_DEV_RDM_RELAXED);
>>>       }
>>
>> Same here wrt STRICT.
>
> This is inside intel_iommu_remove_device(), so no flag actually has any
> effect on rmrr_identity_mapping() here. But I should add a comment like
> this,
>
> +        /*
> +         * No flag affects clearing these mappings, so here
> +         * it's always safe to set XEN_DOMCTL_DEV_RDM_RELAXED.
> +         */
>
>
>>
>> After those changes (a single RDM_RELAXED flag, passing STRICT in for
>> the hardware domain) then I think this patch is in good shape.
>>
>
> As I understand your concern, you seem to assume that in the "relaxed"
> case we don't print any message, right? But as I replied above, that
> is not correct, so what is your further consideration?
>
> Anyway, I'm fine with changing this. And since you suggested keeping one
> bit just to indicate XEN_DOMCTL_DEV_RDM_RELAXED, we no longer have an
> actual XEN_DOMCTL_DEV_RDM_STRICT, so I can simply reset all the
> associated flags to 0.
>
> Thanks
> Tiejun
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
>

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-02  6:50                 ` Chen, Tiejun
@ 2015-07-06 14:55                   ` Chen, Tiejun
  2015-07-07  6:36                     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-06 14:55 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Andrew Cooper,
	Julien Grall, Tim Deegan, xen-devel, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Yang Zhang, Stefano Stabellini,
	Ian Campbell

>> * for dt devices:
>>   - Ignore this flag entirely
>
> But we still pass a flag to assign_device(), like this,
>
> diff --git a/xen/drivers/passthrough/device_tree.c
> b/xen/drivers/passthrough/device_tree.c
> index 5d3842a..a182487 100644
> --- a/xen/drivers/passthrough/device_tree.c
> +++ b/xen/drivers/passthrough/device_tree.c
> @@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct
> dt_device_node *dev)
>               goto fail;
>       }
>
> -    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
> +    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
> +                                         XEN_DOMCTL_DEV_RDM_RELAXED);
>
>       if ( rc )
>           goto fail;
>
> Or rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0)?
>

Any comments on this?

Thanks
Tiejun
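
The single-bit encoding discussed in this subthread (one RELAXED bit, with 0 standing for strict, plus a range check on the flag) can be sketched as follows. This is only an illustration: the numeric value of XEN_DOMCTL_DEV_RDM_RELAXED and the helpers rdm_flag_check() and rdm_is_relaxed() are assumptions made for this sketch, not code from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed encoding: one bit means "relaxed", so 0 means "strict". */
#define XEN_DOMCTL_DEV_RDM_RELAXED 1u

/* Hypothetical helper: reject any flag bits that are not defined yet. */
static int rdm_flag_check(uint32_t flag)
{
    if ( flag & ~XEN_DOMCTL_DEV_RDM_RELAXED )
        return -EINVAL;

    return 0;
}

/* Hypothetical helper: nonzero when the relaxed policy was requested. */
static int rdm_is_relaxed(uint32_t flag)
{
    return (flag & XEN_DOMCTL_DEV_RDM_RELAXED) != 0;
}
```

With this encoding, a caller that does not care about the policy (such as the device tree path quoted above) could pass either 0 or the RELAXED bit, and out-of-range values stay available for future extensions.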

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-07-06 14:46                           ` Chen, Tiejun
@ 2015-07-06 17:16                             ` Wei Liu
  0 siblings, 0 replies; 114+ messages in thread
From: Wei Liu @ 2015-07-06 17:16 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Ian Jackson, xen-devel, Jan Beulich

On Mon, Jul 06, 2015 at 10:46:06PM +0800, Chen, Tiejun wrote:
> On 2015/7/6 22:34, Jan Beulich wrote:
> >>>>On 06.07.15 at 16:29, <george.dunlap@eu.citrix.com> wrote:
> >>It sounds like part of the problem here is a matter of domains.
> >>
> >>Jan cares mostly about what happens in the hypervisor.  At the
> >>hypervisor level, there is only the per-device configurations, and he is
> >>keen that rmrrs be "strict" by default, unless there is an explicit flag
> >>to relax it.  (I agree with this, FWIW.)
> 
> I can't understand this point.
> 
> There isn't any default flag/policy at the hypervisor level, and the
> hypervisor doesn't do anything to RMRR by itself.
> 
> All actions take place only when xl/xc issue the required requests
> according to the rdm setting in .cfg.
> 
> >>
> >>What we've been arguing about is the xl layer -- what settings should
> >>xl/libxl give to the hypervisor, based on what's in the domain config?
> >>
> >>It sounds like Jan doesn't care a great deal about it, and in any case
> >>would defer to the tools maintainers, but that if asked for his advice
> >>he would say that the configuration in xl.cfg should act like all the
> >>other pci device configurations: that you have a domain-wide default
> >>that can be overridden in the per-device setting.
> >>
> >>I.e.:
> >>---
> >>rdm='reserve=strict'
> >>pci=[ '02:0.0', '01:1.1,rdm_reserve=relaxed' ]
> >>---
> >>Would pass "strict" for the first device, and "relaxed" for the second.
> >>
> >>Do I understand you both properly, Jan / Tiejun?
> >
> >Yes for me.
> >
> 
> It looks like everyone would like to go this way in the case of RMRR, so I
> can follow this approach. (Maybe the confusion above doesn't matter now?)
> 

FWIW this works for me too. Do remember to update code comment / commit
message when you update your code.

Wei.

> Thanks
> Tiejun
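
The agreed configuration model (a domain-wide default that a per-device setting can override) can be sketched as below. The enum rdm_policy and the function effective_rdm_policy() are hypothetical names used only for illustration; they are not the libxl types from the series.

```c
#include <assert.h>

/* Hypothetical policy values; UNSET means "no per-device setting given". */
enum rdm_policy {
    RDM_POLICY_UNSET = 0,
    RDM_POLICY_STRICT,
    RDM_POLICY_RELAXED
};

/*
 * Pick the policy passed down for one device: the per-device setting
 * wins when present, otherwise the domain-wide default applies.
 */
static enum rdm_policy effective_rdm_policy(enum rdm_policy domain_default,
                                            enum rdm_policy per_device)
{
    return per_device != RDM_POLICY_UNSET ? per_device : domain_default;
}
```

For the quoted example, '02:0.0' has no per-device setting and so resolves to the domain default (strict), while '01:1.1,rdm_reserve=relaxed' resolves to relaxed.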

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-06 14:55                   ` Chen, Tiejun
@ 2015-07-07  6:36                     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-07  6:36 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	Julien Grall, Tim Deegan, xen-devel, Aravind Gopalakrishnan,
	Jan Beulich, Yang Zhang, Stefano Stabellini, Ian Campbell

Please just review the new revision directly.

Thanks
Tiejun

On 2015/7/6 22:55, Chen, Tiejun wrote:
>>> * for dt devices:
>>>   - Ignore this flag entirely
>>
>> But we still pass a flag to assign_device(), like this,
>>
>> diff --git a/xen/drivers/passthrough/device_tree.c
>> b/xen/drivers/passthrough/device_tree.c
>> index 5d3842a..a182487 100644
>> --- a/xen/drivers/passthrough/device_tree.c
>> +++ b/xen/drivers/passthrough/device_tree.c
>> @@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct
>> dt_device_node *dev)
>>               goto fail;
>>       }
>>
>> -    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
>> +    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
>> +                                         XEN_DOMCTL_DEV_RDM_RELAXED);
>>
>>       if ( rc )
>>           goto fail;
>>
>> Or rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0)?
>>
>
> Any comments on this?
>
> Thanks
> Tiejun
>

* Re: [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-06 14:52       ` Chen, Tiejun
@ 2015-07-07  6:37         ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-07-07  6:37 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan,
	xen-devel, Aravind Gopalakrishnan, Suravee Suthikulpanit,
	Yang Zhang, Stefano Stabellini

Please just review the new revision directly.

Thanks
Tiejun

On 2015/7/6 22:52, Chen, Tiejun wrote:
> Any comment on this?
>
> Thanks
> Tiejun
>
> On 2015/7/2 16:49, Chen, Tiejun wrote:
>>>> @@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn,
>>>> struct pci_dev *pdev)
>>>>                PCI_BUS(bdf) == pdev->bus &&
>>>>                PCI_DEVFN2(bdf) == devfn )
>>>>           {
>>>> -            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
>>>> +            /*
>>>> +             * RMRR is always reserved on e820 so either of flag
>>>> +             * is fine for hardware domain and here we'd like to
>>>> +             * pass XEN_DOMCTL_DEV_RDM_RELAXED.
>>>> +             */
>>>> +            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
>>>> +                                        XEN_DOMCTL_DEV_RDM_RELAXED);
>>>
>>> So two things.
>>>
>>> First, you assert that the value here won't matter, because the
>>> hardware domain is guaranteed never to have a conflict.
>>>
>>> Which is likely to be true almost all the time; but the question is,
>>> *if* something goes wrong, what should happen?
>>>
>>> For instance, suppose that someone accidentally introduces a bug in
>>> Xen that messes up or ignores reading a portion of the e820 map under
>>> certain circumstances.  What should happen?
>>
>> Yes, you can imagine all possible cases. But if this kind of bug could
>> occur, I really doubt that Xen could boot successfully. The e820 is
>> fundamental to running an OS, so such a bug would very likely panic Xen,
>> right?
>>
>> Anyway, I agree we should consider all corner cases.
>>
>>>
>>> If you set this to RELAXED, this clash will be silently ignored; which
>>> means that devices that need RMRR will simply malfunction in weird
>>> ways without any warning messages having been printed that might give
>>
>> No. We always print that message regardless of relaxed or strict, since
>> the message depends on only one condition: whether that conflict exists.
>>
>>> someone a hint about what is going on.
>>>
>>> If you set this to STRICT, then this clash will print an error
>>> message, but as far as I can tell, the rest of the device assignment
>>> will continue as normal.  (Please correct me if I've followed the code
>>> wrong.)
>>
>> Not all cases behave like this, but it is true here.
>>
>>>
>>> Since the device should be just as functional (or not functional)
>>> either way, but in the STRICT case should actually print an error
>>> message which someone might notice, it seems to me that STRICT is a
>>> better option for the hardware domain.
>>>
>>
>> Just see above.
>>
>>> Secondly, you assert in response to Kevin's question in v3 that this
>>> path is only reachable when assigning to the hardware domain.  I think
>>> you at least need to update the comment here to indicate that's what
>>> you think; it's not at all obvious just from looking at the function
>>
>> What about this?
>>
>>                PCI_DEVFN2(bdf) == devfn )
>>           {
>>               /*
>> -             * RMRR is always reserved on e820 so either of flag
>> -             * is fine for hardware domain and here we'd like to
>> -             * pass XEN_DOMCTL_DEV_RDM_RELAXED.
>> +             * This means we're adding a device to the hardware domain,
>> +             * where RMRR is always reserved in the e820, so either
>> +             * flag is fine for the hardware domain; here we'd like
>> +             * to pass XEN_DOMCTL_DEV_RDM_RELAXED.
>>                */
>>               ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
>>                                           XEN_DOMCTL_DEV_RDM_RELAXED);
>>
>>
>>> that this is true.  And if we do end up doing something besides
>>> STRICT, we should check to make sure that pdev->domain really *is* the
>>> hardware domain before acting like it is.
>>>
>>>>               if ( ret )
>>>>                   dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping
>>>> failed\n",
>>>>                           pdev->domain->domain_id);
>>>> @@ -1939,7 +1946,8 @@ static int intel_iommu_remove_device(u8 devfn,
>>>> struct pci_dev *pdev)
>>>>                PCI_DEVFN2(bdf) != devfn )
>>>>               continue;
>>>>
>>>> -        rmrr_identity_mapping(pdev->domain, 0, rmrr);
>>>> +        rmrr_identity_mapping(pdev->domain, 0, rmrr,
>>>> +                              XEN_DOMCTL_DEV_RDM_RELAXED);
>>>>       }
>>>
>>> Same here wrt STRICT.
>>
>> This is inside intel_iommu_remove_device(), so no flag actually has any
>> effect on rmrr_identity_mapping() here. But I should add a comment like
>> this,
>>
>> +        /*
>> +         * No flag affects clearing these mappings, so here
>> +         * it's always safe to set XEN_DOMCTL_DEV_RDM_RELAXED.
>> +         */
>>
>>
>>>
>>> After those changes (a single RDM_RELAXED flag, passing STRICT in for
>>> the hardware domain) then I think this patch is in good shape.
>>>
>>
>> As I understand your concern, you seem to assume that in the "relaxed"
>> case we don't print any message, right? But as I replied above, that
>> is not correct, so what is your further consideration?
>>
>> Anyway, I'm fine with changing this. And since you suggested keeping one
>> bit just to indicate XEN_DOMCTL_DEV_RDM_RELAXED, we no longer have an
>> actual XEN_DOMCTL_DEV_RDM_STRICT, so I can simply reset all the
>> associated flags to 0.
>>
>> Thanks
>> Tiejun
>>
>>
>
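
The behaviour argued over in this message can be sketched as below, under the assumption (taken from the replies quoted above) that a conflict message is printed whatever the policy, and that only the return value differs. handle_rmrr_conflict(), its has_conflict parameter, the message text, and the -EPERM error code are all hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Assumed encoding: one bit means "relaxed", so 0 means "strict". */
#define XEN_DOMCTL_DEV_RDM_RELAXED 1u

/*
 * Hypothetical helper. has_conflict stands in for whatever check detects
 * that an RMRR range overlaps an existing mapping.
 */
static int handle_rmrr_conflict(int has_conflict, unsigned int flag)
{
    if ( !has_conflict )
        return 0;

    /* The message depends only on the conflict, not on the policy. */
    fprintf(stderr, "RMRR range conflicts with an existing mapping\n");

    /* Relaxed: report the conflict and carry on. */
    if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
        return 0;

    /* Strict: report the conflict and fail (error code is an assumption). */
    return -EPERM;
}
```

A caller that honours the return value would abort the assignment in the strict case. The hardware-domain path in the quoted diff only logs a failing return value, which matches the observation above that assignment there continues either way.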

* Re: [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
  2015-06-30  9:38               ` Chen, Tiejun
@ 2015-07-07 11:36                 ` Ian Campbell
  0 siblings, 0 replies; 114+ messages in thread
From: Ian Campbell @ 2015-07-07 11:36 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Ian Jackson, xen-devel, Wei Liu, Stefano Stabellini

On Tue, 2015-06-30 at 17:38 +0800, Chen, Tiejun wrote:
> > [...]
> >> "none" is the default value and it means we don't check any reserved
> >> regions and then all rdm policies would be ignored.
> >
> >
> > I'm afraid I still don't understand what the difference between
> > "rdm=none" and simply not providing an rdm argument at all is.
> >
> 
> Currently they're the same case at this point.
> 
> As I said previously, this default option is used to communicate inside 
> xl, but it's still possible to introduce more options in the future. Or 
> consider that if one day we'd like to set "host" as the default option 
> internally, we would still need this explicit option to let the user 
> ignore rdm, right?

Yes, that makes sense, thanks.

>  So based on your question, I think at most we can remove this option's 
> description from the doc file right now. Any concerns about this?

That would be ok. It would also be sufficient IMHO to add a line to the
docs indicating what the default is if this option is not given.

Ian.

end of thread, other threads:[~2015-07-07 11:36 UTC | newest]

Thread overview: 114+ messages (download: mbox.gz / follow: Atom feed)
2015-06-23  9:57 [v4][PATCH 00/19] Fix RMRR Tiejun Chen
2015-06-23  9:57 ` [v4][PATCH 01/19] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
2015-06-23  9:57 ` [v4][PATCH 02/19] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
2015-06-25  9:59   ` Tim Deegan
2015-07-01 15:43   ` George Dunlap
2015-06-23  9:57 ` [v4][PATCH 03/19] xen/vtd: create RMRR mapping Tiejun Chen
2015-06-23 10:12   ` Jan Beulich
2015-06-24  1:11     ` Chen, Tiejun
2015-06-24  6:48       ` Jan Beulich
2015-06-24  7:26         ` Chen, Tiejun
2015-06-24  7:33           ` Jan Beulich
2015-06-30 10:40             ` George Dunlap
2015-06-30 11:19               ` Chen, Tiejun
2015-06-23  9:57 ` [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
2015-06-30 11:08   ` George Dunlap
2015-06-30 11:24     ` Chen, Tiejun
2015-06-30 14:20       ` George Dunlap
2015-07-01  1:11         ` Chen, Tiejun
2015-07-01 10:02           ` George Dunlap
2015-07-01 10:47             ` Chen, Tiejun
2015-07-01 14:39               ` George Dunlap
2015-07-01 15:06                 ` Julien Grall
2015-07-02  6:50                 ` Chen, Tiejun
2015-07-06 14:55                   ` Chen, Tiejun
2015-07-07  6:36                     ` Chen, Tiejun
2015-07-06 10:34               ` Jan Beulich
2015-07-06 10:56                 ` George Dunlap
2015-07-06 10:56                 ` Chen, Tiejun
2015-07-06 11:39                   ` Jan Beulich
2015-07-01 16:30   ` George Dunlap
2015-07-02  8:49     ` Chen, Tiejun
2015-07-06 14:52       ` Chen, Tiejun
2015-07-07  6:37         ` Chen, Tiejun
2015-06-23  9:57 ` [v4][PATCH 05/19] xen: enable XENMEM_memory_map in hvm Tiejun Chen
2015-07-01 16:32   ` George Dunlap
2015-06-23  9:57 ` [v4][PATCH 06/19] hvmloader: get guest memory map into memory_map[] Tiejun Chen
2015-06-23  9:57 ` [v4][PATCH 07/19] hvmloader/pci: skip reserved ranges Tiejun Chen
2015-06-23  9:57 ` [v4][PATCH 08/19] hvmloader/e820: construct guest e820 table Tiejun Chen
2015-06-23  9:57 ` [v4][PATCH 09/19] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
2015-06-25 10:44   ` Wei Liu
2015-06-23  9:57 ` [v4][PATCH 10/19] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
2015-06-25 10:54   ` Wei Liu
2015-06-23  9:57 ` [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy Tiejun Chen
2015-06-25 11:38   ` Wei Liu
2015-06-25 12:13   ` Ian Campbell
2015-06-26  8:38     ` Chen, Tiejun
2015-06-26  8:57       ` Ian Campbell
2015-06-26  9:36         ` Chen, Tiejun
2015-06-26 12:06           ` Wei Liu
2015-06-29  1:01             ` Chen, Tiejun
2015-06-30  3:08           ` Chen, Tiejun
2015-06-30  8:30             ` Ian Campbell
2015-06-30  9:38               ` Chen, Tiejun
2015-07-07 11:36                 ` Ian Campbell
2015-06-25 12:31   ` Ian Jackson
2015-06-30  3:07     ` Chen, Tiejun
2015-06-30 15:54   ` George Dunlap
2015-07-01  1:16     ` Chen, Tiejun
2015-07-01 10:07       ` George Dunlap
2015-07-01 10:26         ` Chen, Tiejun
2015-07-01 10:57           ` George Dunlap
2015-07-01 11:16             ` Chen, Tiejun
2015-07-01 13:29               ` George Dunlap
2015-07-02  1:11                 ` Chen, Tiejun
2015-07-02  4:47                   ` Chen, Tiejun
2015-07-02  9:22                   ` George Dunlap
2015-07-02 10:01                     ` Chen, Tiejun
2015-07-02 10:28                       ` George Dunlap
2015-07-02 11:32                         ` Chen, Tiejun
2015-07-06 13:34                 ` Chen, Tiejun
2015-07-06 13:51                   ` Jan Beulich
2015-07-06 14:21                     ` Chen, Tiejun
2015-07-06 14:29                       ` George Dunlap
2015-07-06 14:34                         ` Jan Beulich
2015-07-06 14:46                           ` Chen, Tiejun
2015-07-06 17:16                             ` Wei Liu
2015-06-23  9:57 ` [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy Tiejun Chen
2015-06-25 11:37   ` Wei Liu
2015-06-25 12:15   ` Ian Campbell
2015-06-26  8:53     ` Chen, Tiejun
2015-06-26  9:01       ` Ian Campbell
2015-06-26  9:28         ` Chen, Tiejun
2015-06-25 12:33   ` Ian Jackson
2015-06-30  2:14     ` Chen, Tiejun
2015-06-30 15:56   ` George Dunlap
2015-07-01  1:23     ` Chen, Tiejun
2015-07-01 10:22       ` George Dunlap
2015-07-01 10:56         ` Chen, Tiejun
2015-06-30 16:11   ` George Dunlap
2015-07-01  1:30     ` Chen, Tiejun
2015-07-01 10:31       ` George Dunlap
2015-07-02  9:27         ` Chen, Tiejun
2015-06-23  9:57 ` [v4][PATCH 13/19] tools/libxc: check to set args.mmio_size before call xc_hvm_build Tiejun Chen
2015-06-25 11:08   ` Wei Liu
2015-06-26  0:56     ` Chen, Tiejun
2015-06-26 12:07       ` Wei Liu
2015-06-23  9:57 ` [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
2015-06-25 11:23   ` Wei Liu
2015-06-26  5:45     ` Chen, Tiejun
2015-06-26 12:13       ` Wei Liu
2015-06-29  6:36         ` Chen, Tiejun
2015-06-23  9:57 ` [v4][PATCH 15/19] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
2015-06-25 11:27   ` Wei Liu
2015-06-26  6:54     ` Chen, Tiejun
2015-06-23  9:57 ` [v4][PATCH 16/19] tools/libxl: extend XENMEM_set_memory_map Tiejun Chen
2015-06-25 11:33   ` Wei Liu
2015-06-26  7:13     ` Chen, Tiejun
2015-06-26 12:14       ` Wei Liu
2015-06-23  9:57 ` [v4][PATCH 17/19] xen/vtd: enable USB device assignment Tiejun Chen
2015-06-23  9:57 ` [v4][PATCH 18/19] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
2015-06-23  9:57 ` [v4][PATCH 19/19] tools: parse to enable new rdm policy parameters Tiejun Chen
2015-06-25 11:35   ` Wei Liu
2015-06-30 16:30   ` George Dunlap
2015-07-01  1:31     ` Chen, Tiejun
