xen-devel.lists.xenproject.org archive mirror
* [v10][PATCH 00/16] Fix RMRR
@ 2015-07-20  6:16 Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
                   ` (16 more replies)
  0 siblings, 17 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel

v10:

* Patch #6: hvmloader/pci: Try to avoid placing BARs in RMRRs
  This is based on George's draft patch, which implements an acceptable
  solution for the current cycle. Here I just implemented check_overlap_all()
  and some cleanups.

* Patch #7: hvmloader/e820: construct guest e820 table
  Instead of correcting the e820 table, I now correct memory_map.map[]
  and then copy it into the e820 directly. This should ensure that
  hvm_info, memory_map.map[] and the e820 stay consistent.

v9:

* Patch #3: xen/passthrough: extend hypercall to support rdm reservation policy
  Correct one check condition of XEN_DOMCTL_DEV_RDM_RELAXED

* Patch #5: hvmloader: get guest memory map into memory_map[]
  Correct the patch description:
  [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END]
    -> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END);
  Merge two if{} blocks into one;

* Patch #6: hvmloader/pci: disable all pci devices conflicting with rdm
  A small improvement to the implementation but, again, this solution is still
  under discussion. I would personally prefer the v7 approach if possible.

* Patch #7: hvmloader/e820: construct guest e820 table
  Refine the chunk of code that checks/modifies highmem

* Patch #15: xen/vtd: prevent from assign the device with shared rmrr
  Correct one indentation issue

v8:

* Patch #3: xen/passthrough: extend hypercall to support rdm reservation policy
  Always pass "0" (strict) when adding or moving a device in the hardware
  domain, and improve some associated code comments.

* Patch #5: hvmloader: get guest memory map into memory_map[]
  The checked range should actually start from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  Correct this and sync the patch description.

* Patch #6: hvmloader/pci: disable all pci devices conflicting
  We have a big change to this patch:

  Based on the current discussion it is hard to reshape the original mmio
  allocation mechanism, and we don't have a good and simple alternative in the
  short term. So instead of adding more complexity to intervene in that
  process, we still check for any conflicts and disable all affected devices.

  I know this is still contentious, but I'd like to discuss it based on this
  revision. Thanks for your time.

* Patch #7: hvmloader/e820: construct guest e820 table
  Define low_mem_end as uint32_t;
  Correct the two wrong loop bounds, memory_map.nr_map -> nr,
  when revising the low/high memory e820 entries;
  Improve the code comments and the patch description;
  Add a check for the case where highmem is populated by hvmloader itself

* Patch #11: tools/libxl: detect and avoid conflicts with RDM
  Introduce pfn_to_paddr(x) -> ((uint64_t)x << XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out the existing code.

* Patch #13: libxl: construct e820 map with RDM information for HVM guest
  Make the core construction function arch-specific to make sure
  we don't break ARM at this point.

* Patch #15:  xen/vtd: prevent from assign the device with shared rmrr
  Merge two if{} blocks into one;
  Print the RMRR range info when refusing to assign a device from a shared group

* Some minor code style changes

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

* Just sync with the fallout of the parameter renaming above.

Note I also marked patch #10 as Acked by Wei Liu, Ian Jackson and Ian
Campbell. (If I'm wrong, just let me know at this point.) And
as we discussed, I'll make further improvements as a next step after
this round of review.

v6:

* Inside patch #01, add a comment to the nr_entries field inside
  xen_reserved_device_memory_map. Note this comes from Jan.

* Inside patch #10, we rename some terms to make our policy naming reasonable:
  "type" -> "strategy"
  "none" -> "ignore"
  and based on our discussion, we won't expose "ignore" at the xl level and just
  keep it as the default, and then sync the docs and the patch description

* Inside patch #10, we fix some code style issues and in particular refine
  libxl__xc_device_get_rdm()

* Inside patch #16, sync with the renames introduced by patch #10.

v5:

* Fold our original patches #2 and #3 into this new one, and
  introduce a new helper, clear_identity_p2m_entry, which wraps
  guest_physmap_remove_page(). We use this to clear our
  identity mappings.

* Keep just one bit, XEN_DOMCTL_DEV_RDM_RELAXED, as our policy flag, so
  now "0" means "strict" and "1" means "relaxed", and also make DT devices
  simply ignore the flag field. Then correct all associated code
  comments.

* Make sure the per-device policy always overrides the global policy,
  and clean up the associated comments and the patch description.

* Improve some descriptions in the docs.

* Make all rdm variables specific to .hvm

* Inside patch #6, rename the is_64bar field inside struct bars to flag,
  and extend it to also indicate whether the BAR is already allocated.

* Inside patch #11, rename xc_device_get_rdm() to libxl__xc_device_get_rdm(),
  replace malloc() with libxl__malloc(), and clean up the fallout.
  libxl__xc_device_get_rdm() should return a proper libxl error code, ERROR_FAIL;
  the allocated RDM entries are instead returned via an out parameter.

* The original patch #13 was sent out separately since it is not actually
  related to RMRR.

v4:

* Change one condition inside patch #2, "xen/x86/p2m: introduce
  set_identity_p2m_entry",

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )

  to make sure we catch exactly the cases we require.

* Inside patch #3, "xen/vtd: create RMRR mapping",
  Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap the RMRR mapping correctly. And drop
  iommu_map_page() since ept_set_entry() can actually do this
  internally.

* Inside patch #4, "xen/passthrough: extend hypercall to support rdm
  reservation policy", add code comments describing why we force a fixed
  policy flag in some cases, like adding a device to the hardware domain or
  removing a device from a user domain. And fix one check condition

  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

  Additionally, range-check the flag passed to make future
  extensions possible (and to avoid ambiguity about what out-of-range values
  would mean).

* Inside patch #6, "hvmloader: get guest memory map into memory_map[]", we
  move some codes related to e820 to that specific file, e820.c, and consolidate
  "printf()+BUG()" and "BUG_ON()", and also avoid another fixed width type for
  the parameter of get_mem_mapping_layout()

* Inside patch #7, "hvmloader/pci: skip reserved ranges"
  We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure the MMIO region fits all pci devices, as before

  #2.2 Accommodate reserved memory regions that are not aligned

  If I'm missing something, let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either populate more RAM or expand highmem further. But only
  64bit-bars can work with highmem, and as you mentioned we should also avoid
  expanding highmem where possible. So my implementation is to allocate
  32bit-bars and 64bit-bars in order.

  1>. The first allocation round handles only the 32bit-bars

  If we can finish allocating all 32bit-bars, we just go on to allocate the
  64bit-bars with all remaining resources, including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2>.

  2>. The second allocation round handles the remaining 32bit-bars

  We should be able to finish allocating all 32bit-bars in theory, then go to
  the third allocation round 3>.

  3>. The third allocation round handles the 64bit-bars

  We first try to allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate the 64bit-bars. This
  process should be the same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate reserved memory regions that are not aligned:

  We should skip all reserved device memory, but we also need to check whether
  other, smaller bars can still be allocated when an mmio hole exists between
  resource->base and the reserved device memory. If such a hole exists, we
  simply move on and try to allocate the next bar, since all bars are sorted in
  descending order of size. If not, we need to move resource->base up to
  reserved_end and reallocate this bar (a rough sketch of this step follows
  below).
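
  To make the above concrete, here is a rough, illustrative sketch of the
  per-BAR placement step described in #3.1/#3.2. It is not the hvmloader code
  from this series; struct rdm_region, struct resource, INVALID_BASE and
  place_bar() are all placeholder names, and rdm[] is assumed to be sorted by
  start address.

  struct rdm_region { uint64_t start, end; };   /* placeholder type */
  struct resource   { uint64_t base, max; };    /* placeholder type */
  #define INVALID_BASE (~0ULL)                  /* "leave hole for a smaller BAR" */

  /* Bounds checking against r->max is omitted in this sketch. */
  static uint64_t place_bar(struct resource *r, uint64_t bar_sz,
                            const struct rdm_region *rdm, unsigned int nr_rdm)
  {
      uint64_t base = (r->base + bar_sz - 1) & ~(bar_sz - 1);
      unsigned int i;

      for ( i = 0; i < nr_rdm; i++ )
      {
          if ( base + bar_sz <= rdm[i].start || base >= rdm[i].end )
              continue;                 /* no conflict with this region */
          if ( r->base < rdm[i].start )
              /* Hole too small: leave it for the next (smaller) BAR. */
              return INVALID_BASE;
          r->base = rdm[i].end;         /* jump past the reserved region */
          base = (r->base + bar_sz - 1) & ~(bar_sz - 1);
      }
      r->base = base + bar_sz;
      return base;
  }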

* Inside of patch #8, "hvmloader/e820: construct guest e820 table", we need to
  adjust highmem if lowmem changes, e.g. when hvmloader has to populate more
  RAM to allocate bars.

* Inside of patch #11, "tools: introduce some new parameters to set rdm policy",
  we don't define init_val for libxl_rdm_reserve_type since it is just zero,
  and split the xl/libxlu changes out into a final patch.

* Inside of patch #12, "passes rdm reservation policy", fix one typo,
  s/unkwon/unknown. And in the command description, we should use "[]" to
  indicate it's optional for the extended xl command, pci-attach.

* Patch #13 is split out from the current patch #14 since it is specific to xc.

* Inside of patch #14, "tools/libxl: detect and avoid conflicts with RDM",
  just unconditionally set *nr_entries to 0. Additionally, the changes that
  provide a parameter to set our predefined boundary dynamically are split
  out into a separate patch later

* Inside of patch #16, "tools/libxl: extend XENMEM_set_memory_map", we use
  goto-style error handling, and instead of NOGC we should use
  libxl__malloc(gc, XXX) to allocate the local e820.

Overall, we refined several patch descriptions and code comments.

v3:

* Reorder all patches as Wei suggested
* Rebase on the latest tree
* Address some of Wei's comments on the tools side
* Two changes for the runtime cycle of
   patch #2, xen/x86/p2m: introduce set_identity_p2m_entry, on the hypervisor side

  a>. Introduce a paging_mode_translate() check
  Otherwise, we'll see this error when booting Xen/Dom0:

(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
....
(XEN) Xen call trace:
(XEN)    [<ffff82d0801f53db>] p2m_pt_get_entry+0x29/0x558
(XEN)    [<ffff82d0801f0b5c>] set_identity_p2m_entry+0xfc/0x1f0
(XEN)    [<ffff82d08014ebc8>] rmrr_identity_mapping+0x154/0x1ce
(XEN)    [<ffff82d0802abb46>] intel_iommu_hwdom_init+0x76/0x158
(XEN)    [<ffff82d0802ab169>] iommu_hwdom_init+0x179/0x188
(XEN)    [<ffff82d0802cc608>] construct_dom0+0x2fed/0x35d8
(XEN)    [<ffff82d0802bdaa0>] __start_xen+0x22d8/0x2381
(XEN)    [<ffff82d080100067>] __high_start+0x53/0x55
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702

Note I didn't copy all the info since I think the above is enough.

  b>. We actually still need to use "mfn_x(mfn) == INVALID_MFN" to confirm
  we're getting an invalid mfn.

* Add patch #16 to handle devices which share the same RMRR.

v2:

* Instead of that fixed predefined rdm memory boundary, we'd like to
  introduce a parameter, "rdm_mem_boundary", to set this threshold value.

* Remove that existing USB hack.

* Make sure the MMIO regions all fit in the available resource window

* Rename our policy, "force/try" -> "strict/relaxed"

* Wei and Jan gave me many comments to refine the code
  * Code style
  * Better and more reasonable code implementation
  * Correct or improve code comments.

* A few small changes to work well with ARM.

Open:

* We should fail to assign a device which shares an RMRR with
another device. We can only do group assignment when an RMRR is shared
among devices.

We need more time to figure out a good policy/approach because something
is not yet clear to me.

As you know, all devices are owned by Dom0 before we create any
DomU, right? Do we allow Dom0 to still own one device of a group while
assigning another device in the same group?

Any comments on the policy are really appreciated.


v1:

RMRR is an acronym for Reserved Memory Region Reporting, expected to
be used for legacy usages (such as USB, UMA Graphics, etc.) requiring
reserved memory. Special treatment is required in system software to
setup those reserved regions in IOMMU translation structures, otherwise
passing through a device with RMRR reported may not work correctly.

This patch set tries to enhance the existing Xen RMRR implementation to fix
various reported and theoretical problems. The most noteworthy changes are
setting up identity mappings in the p2m layer and handling possible conflicts
between reported regions and gfn space. The initial proposal can be found at:
    http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00524.html
and after a long discussion a summarized agreement is here:
    http://lists.xen.org/archives/html/xen-devel/2015-01/msg01580.html

Below is a key summary of this patch set according to the agreed proposal:

1. Use the RDM (Reserved Device Memory) name in user space as a general
description instead of using the ACPI RMRR name directly.

2. Introduce configuration parameters to allow the user to control both
per-device and global RDM resources, along with the desired policy upon a
detected conflict.

3. Introduce a new hypercall to query global and per-device RDM resources.

4. Extend libxl to be the central place to manage RDM resources and handle
potential conflicts between reserved regions and gfn space. One simplification
goal is to keep the existing lowmem / mmio / highmem layout which is
passed around various function blocks. So a reasonable assumption
is made that conflicts falling into the areas below are not re-arranged,
since doing so would result in a more scattered layout:
    a) in the highmem region (>4G)
    b) in the lowmem region, and below a predefined boundary (default 2G)
  a) is a new assumption not discussed before. From the VT-d spec this is
possible, but it has not been observed in the real world. So we can make this
reasonable assumption until there's a real usage of it.

5. Extend XENMEM_set_memory_map to be usable for HVM guests, and then have
libxl use that hypercall to carry RDM information to hvmloader. There
is one difference from the original discussion. Previously we discussed
introducing a new E820 type specifically for RDM entries. After more thought
we think it's OK to just tag them as E820_reserved (an illustrative sketch
follows after this list). Actually hvmloader doesn't need to know whether the
reserved entries come from RDM or from other purposes.

6. Then the hvmloader change is generic with respect to the XENMEM_memory_map
change. Given a predefined memory layout, hvmloader should avoid
allocating any reserved entries for other usages (opregion, mmio, etc.); a
rough overlap check is also sketched after this list.

7. Extend the existing device passthrough hypercall to carry the conflict
handling policy.

8. Set up identity mappings in the p2m layer for the RMRRs reported for the
given device. Conflicts are handled according to the policy specified in the
hypercall.
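
Purely as illustration of point 5, not code from this series: tagging an RDM
region as an ordinary reserved entry in the guest e820. The struct layout and
E820_RESERVED value mirror hvmloader's e820.h; add_rdm_entry() itself is a
made-up helper name.

    struct e820entry {          /* mirrors hvmloader's e820.h layout */
        uint64_t addr;
        uint64_t size;
        uint32_t type;
    };
    #define E820_RESERVED 2

    /* Append one RDM region to the guest e820 as a plain reserved entry. */
    static unsigned int add_rdm_entry(struct e820entry *e820, unsigned int nr,
                                      uint64_t rdm_start, uint64_t rdm_size)
    {
        e820[nr].addr = rdm_start;
        e820[nr].size = rdm_size;
        e820[nr].type = E820_RESERVED;  /* no dedicated RDM type needed */
        return nr + 1;
    }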
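
And for point 6, again illustration only: the kind of overlap test hvmloader
needs before placing anything (opregion, mmio allocations, etc.) at a candidate
address, reusing the struct e820entry sketched above. overlaps_reserved() is a
made-up name, not a function from this series.

    /* Does [start, start+size) hit any reserved entry of the given map? */
    static int overlaps_reserved(const struct e820entry *map, unsigned int nr,
                                 uint64_t start, uint64_t size)
    {
        unsigned int i;

        for ( i = 0; i < nr; i++ )
            if ( map[i].type == E820_RESERVED &&
                 start < map[i].addr + map[i].size &&
                 start + size > map[i].addr )
                return 1;
        return 0;
    }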

The current patch set contains the core enhancements and calls for comments.
There are still several tasks not implemented yet. We'll include them
in the final version after the RFC is agreed on:

- remove the existing USB hack
- detect and fail assigning a device which has a shared RMRR with another device
- add a config parameter to configure that memory boundary flexibly
- in the case of hotplug we also need to figure out a way to resolve the policy
  conflict between the per-pci policy and the global policy, but first we think
  we'd better collect some good/correct ideas before the next step in the RFC.

So I'm sending this as an RFC to collect any comments you may have.

----------------------------------------------------------------
Jan Beulich (1):
      xen: introduce XENMEM_reserved_device_memory_map

Tiejun Chen (15):
      xen/vtd: create RMRR mapping
      xen/passthrough: extend hypercall to support rdm reservation policy
      xen: enable XENMEM_memory_map in hvm
      hvmloader: get guest memory map into memory_map[]
      hvmloader/pci: skip reserved ranges
      hvmloader/e820: construct guest e820 table
      tools/libxc: Expose new hypercall xc_reserved_device_memory_map
      tools: extend xc_assign_device() to support rdm reservation policy
      tools: introduce some new parameters to set rdm policy
      tools/libxl: detect and avoid conflicts with RDM
      tools: introduce a new parameter to set a predefined rdm boundary
      libxl: construct e820 map with RDM information for HVM guest
      xen/vtd: enable USB device assignment
      xen/vtd: prevent from assign the device with shared rmrr
      tools: parse to enable new rdm policy parameters

 docs/man/xl.cfg.pod.5                       | 103 ++++++++
 docs/misc/vtd.txt                           |  24 ++
 tools/firmware/hvmloader/e820.c             | 140 +++++++++-
 tools/firmware/hvmloader/e820.h             |   7 +
 tools/firmware/hvmloader/hvmloader.c        |   2 +
 tools/firmware/hvmloader/pci.c              |  62 +++++
 tools/firmware/hvmloader/util.c             |  26 ++
 tools/firmware/hvmloader/util.h             |  12 +
 tools/libxc/include/xenctrl.h               |  11 +-
 tools/libxc/xc_domain.c                     |  45 +++-
 tools/libxl/libxl.h                         |   6 +
 tools/libxl/libxl_arch.h                    |   7 +
 tools/libxl/libxl_arm.c                     |   8 +
 tools/libxl/libxl_create.c                  |  13 +-
 tools/libxl/libxl_dm.c                      | 273 ++++++++++++++++++++
 tools/libxl/libxl_dom.c                     |  16 +-
 tools/libxl/libxl_internal.h                |  13 +-
 tools/libxl/libxl_pci.c                     |  12 +-
 tools/libxl/libxl_types.idl                 |  26 ++
 tools/libxl/libxl_x86.c                     |  83 ++++++
 tools/libxl/libxlu_pci.c                    |  92 ++++++-
 tools/libxl/libxlutil.h                     |   4 +
 tools/libxl/xl_cmdimpl.c                    |  16 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c         |  16 +-
 tools/python/xen/lowlevel/xc/xc.c           |  30 ++-
 xen/arch/x86/hvm/hvm.c                      |   2 -
 xen/arch/x86/mm.c                           |   6 -
 xen/arch/x86/mm/p2m.c                       |  43 ++-
 xen/common/compat/memory.c                  |  66 +++++
 xen/common/memory.c                         |  64 +++++
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   3 +-
 xen/drivers/passthrough/arm/smmu.c          |   2 +-
 xen/drivers/passthrough/device_tree.c       |   3 +-
 xen/drivers/passthrough/iommu.c             |  10 +
 xen/drivers/passthrough/pci.c               |  15 +-
 xen/drivers/passthrough/vtd/dmar.c          |  32 +++
 xen/drivers/passthrough/vtd/dmar.h          |   1 -
 xen/drivers/passthrough/vtd/extern.h        |   1 +
 xen/drivers/passthrough/vtd/iommu.c         |  82 ++++--
 xen/drivers/passthrough/vtd/utils.c         |   7 -
 xen/include/asm-x86/p2m.h                   |  13 +-
 xen/include/public/domctl.h                 |   3 +
 xen/include/public/memory.h                 |  37 ++-
 xen/include/xen/iommu.h                     |  12 +-
 xen/include/xen/pci.h                       |   2 +
 xen/include/xlat.lst                        |   3 +-
 46 files changed, 1367 insertions(+), 87 deletions(-)

Thanks
Tiejun


* [v10][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Jan Beulich

From: Jan Beulich <jbeulich@suse.com>

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich <jbeulich@suse.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v7 ~ v10:

* Nothing is changed.

v6:

* Add a comment to the nr_entries field inside xen_reserved_device_memory_map

v5 ~ v4:

* Nothing is changed.

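Not part of this patch: a sketch of the intended usage pattern for the new
subop, as implied by the common-code handler below. memory_op() and alloc_buf()
are placeholder names for whatever hypercall wrapper and allocator the caller
has; only the structure fields and PCI_DEV_RDM_ALL come from this patch.

    struct xen_reserved_device_memory_map rdmmap = {
        .flag = PCI_DEV_RDM_ALL,   /* all RMRRs, not just one SBDF */
        .nr_entries = 0,           /* first call: learn the required count */
    };
    xen_reserved_device_memory_t *buf;
    int rc = memory_op(XENMEM_reserved_device_memory_map, &rdmmap);

    if ( rc && rdmmap.nr_entries )
    {
        /* Too small: nr_entries now holds the required number of entries. */
        buf = alloc_buf(rdmmap.nr_entries * sizeof(*buf));
        set_xen_guest_handle(rdmmap.buffer, buf);
        rc = memory_op(XENMEM_reserved_device_memory_map, &rdmmap);
        /* On success, rdmmap.nr_entries entries of buf are filled in. */
    }
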
 xen/common/compat/memory.c           | 66 ++++++++++++++++++++++++++++++++++++
 xen/common/memory.c                  | 64 ++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/iommu.c      | 10 ++++++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +++++++++++++++++
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h          | 37 +++++++++++++++++++-
 xen/include/xen/iommu.h              | 10 ++++++
 xen/include/xen/pci.h                |  2 ++
 xen/include/xlat.lst                 |  3 +-
 10 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct compat_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+    struct compat_reserved_device_memory rdm = {
+        .start_pfn = start, .nr_pages = nr
+    };
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+                return -ERANGE;
+
+            if ( __copy_to_compat_offset(grdm->map.buffer,
+                                         grdm->used_entries,
+                                         &rdm,
+                                         1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
     int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
             break;
         }
 
+#ifdef HAS_PASSTHROUGH
+        case XENMEM_reserved_device_memory_map:
+        {
+            struct get_reserved_device_memory grdm;
+
+            if ( copy_from_guest(&grdm.map, compat, 1) ||
+                 !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+                return -EFAULT;
+
+            grdm.used_entries = 0;
+            rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                                  &grdm);
+
+            if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+                rc = -ENOBUFS;
+
+            grdm.map.nr_entries = grdm.used_entries;
+            if ( grdm.map.nr_entries )
+            {
+                if ( __copy_to_guest(compat, &grdm.map, 1) )
+                    rc = -EFAULT;
+            }
+
+            return rc;
+        }
+#endif
+
         default:
             return compat_arch_memory_op(cmd, compat);
         }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
     return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct xen_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            struct xen_reserved_device_memory rdm = {
+                .start_pfn = start, .nr_pages = nr
+            };
+
+            if ( __copy_to_guest_offset(grdm->map.buffer,
+                                        grdm->used_entries,
+                                        &rdm,
+                                        1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -1162,6 +1199,33 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+#ifdef HAS_PASSTHROUGH
+    case XENMEM_reserved_device_memory_map:
+    {
+        struct get_reserved_device_memory grdm;
+
+        if ( copy_from_guest(&grdm.map, arg, 1) ||
+             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+            return -EFAULT;
+
+        grdm.used_entries = 0;
+        rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                              &grdm);
+
+        if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+            rc = -ENOBUFS;
+
+        grdm.map.nr_entries = grdm.used_entries;
+        if ( grdm.map.nr_entries )
+        {
+            if ( __copy_to_guest(arg, &grdm.map, 1) )
+                rc = -EFAULT;
+        }
+
+        break;
+    }
+#endif
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 06cb38f..0b2ef52 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -375,6 +375,16 @@ void iommu_crash_shutdown(void)
     iommu_enabled = iommu_intremap = 0;
 }
 
+int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+
+    if ( !iommu_enabled || !ops->get_reserved_device_memory )
+        return 0;
+
+    return ops->get_reserved_device_memory(func, ctxt);
+}
+
 bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
 {
     const struct hvm_iommu *hd = domain_hvm_iommu(d);
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..a730de5 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -893,3 +893,35 @@ int platform_supports_x2apic(void)
     unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
     return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
 }
+
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
+    int rc = 0;
+    unsigned int i;
+    u16 bdf;
+
+    for_each_rmrr_device ( rmrr, bdf, i )
+    {
+        if ( rmrr != rmrr_cur )
+        {
+            rc = func(PFN_DOWN(rmrr->base_address),
+                      PFN_UP(rmrr->end_address) -
+                        PFN_DOWN(rmrr->base_address),
+                      PCI_SBDF(rmrr->segment, bdf),
+                      ctxt);
+
+            if ( unlikely(rc < 0) )
+                return rc;
+
+            if ( !rc )
+                continue;
+
+            /* Just go next. */
+            if ( rc == 1 )
+                rmrr_cur = rmrr;
+        }
+    }
+
+    return 0;
+}
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index 5524dba..f9ee9b0 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -75,6 +75,7 @@ int domain_context_mapping_one(struct domain *domain, struct iommu *iommu,
                                u8 bus, u8 devfn, const struct pci_dev *);
 int domain_context_unmap_one(struct domain *domain, struct iommu *iommu,
                              u8 bus, u8 devfn);
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
 
 unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg);
 void io_apic_write_remap_rte(unsigned int apic,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 48820ea..44ed23d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2491,6 +2491,7 @@ const struct iommu_ops intel_iommu_ops = {
     .crash_shutdown = vtd_crash_shutdown,
     .iotlb_flush = intel_iommu_iotlb_flush,
     .iotlb_flush_all = intel_iommu_iotlb_flush_all,
+    .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
     .dump_p2m_table = vtd_dump_p2m_table,
 };
 
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 832559a..ac7d3da 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -573,7 +573,42 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 27 */
+/*
+ * With some legacy devices, certain guest-physical addresses cannot safely
+ * be used for other purposes, e.g. to map guest RAM.  This hypercall
+ * enumerates those regions so the toolstack can avoid using them.
+ */
+#define XENMEM_reserved_device_memory_map   27
+struct xen_reserved_device_memory {
+    xen_pfn_t start_pfn;
+    xen_ulong_t nr_pages;
+};
+typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
+
+struct xen_reserved_device_memory_map {
+    /* IN */
+    /* Currently just one bit to indicate checkng all Reserved Device Memory. */
+#define PCI_DEV_RDM_ALL   0x1
+    uint32_t        flag;
+    /* IN */
+    uint16_t        seg;
+    uint8_t         bus;
+    uint8_t         devfn;
+    /*
+     * IN/OUT
+     *
+     * Gets set to the required number of entries when too low,
+     * signaled by error code -ERANGE.
+     */
+    unsigned int    nr_entries;
+    /* OUT */
+    XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
+};
+typedef struct xen_reserved_device_memory_map xen_reserved_device_memory_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_map_t);
+
+/* Next available subop number is 28 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b30bf41..e2f584d 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -126,6 +126,14 @@ int iommu_do_dt_domctl(struct xen_domctl *, struct domain *,
 
 struct page_info;
 
+/*
+ * Any non-zero value returned from callbacks of this type will cause the
+ * function the callback was handed to terminate its iteration. Assigning
+ * meaning of these non-zero values is left to the top level caller /
+ * callback pair.
+ */
+typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
+
 struct iommu_ops {
     int (*init)(struct domain *d);
     void (*hwdom_init)(struct domain *d);
@@ -157,12 +165,14 @@ struct iommu_ops {
     void (*crash_shutdown)(void);
     void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
     void (*iotlb_flush_all)(struct domain *d);
+    int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
     void (*dump_p2m_table)(struct domain *d);
 };
 
 void iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
+int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
 void iommu_share_p2m_table(struct domain *d);
 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..d176e8b 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
+#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))
 
 struct pci_dev_info {
     bool_t is_extfn;
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9c9fd9a..dd23559 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -61,9 +61,10 @@
 !	memory_exchange			memory.h
 !	memory_map			memory.h
 !	memory_reservation		memory.h
-?	mem_access_op		memory.h
+?	mem_access_op			memory.h
 !	pod_target			memory.h
 !	remove_from_physmap		memory.h
+!	reserved_device_memory_map	memory.h
 ?	vmemrange			memory.h
 !	vnuma_topology_info		memory.h
 ?	physdev_eoi			physdev.h
-- 
1.9.1


* [v10][PATCH 02/16] xen/vtd: create RMRR mapping
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Andrew Cooper, Tim Deegan, Jan Beulich,
	Yang Zhang

RMRR reserved regions must be set up in the pfn space with an identity
mapping to the reported mfn. However, the existing code has problems setting
up the correct mapping when VT-d shares the EPT page table, leading to
problems when assigning devices (e.g. GPUs) with RMRRs reported. So instead,
this patch aims to set up the identity mapping in the p2m layer, regardless
of whether EPT is shared or not. And we still keep creating the VT-d table.

We also need to introduce a pair of helpers to create/clear this
sort of identity mapping, as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If the space
is already occupied by the desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define a macro wrapping guest_physmap_remove_page(), which now
returns a value as necessary.

CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v6 ~ v10:

* Nothing is changed.

v5:

* Fold our original patches #2 and #3 into this new one

* Introduce a new helper, clear_identity_p2m_entry, which wraps
  guest_physmap_remove_page(). We use this to clear our
  identity mappings.

v4:

* Change the original condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )

  to make sure we catch those invalid mfn mappings as expected.

* To have

  if ( !paging_mode_translate(p2m->domain) )
    return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap the RMRR mapping correctly.

* Drop iommu_map_page() since ept_set_entry() can actually do this
  internally.

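For illustration only (the real caller is rmrr_identity_mapping() in the VT-d
changes below): how the new pair of helpers is meant to be used across an
RMRR's pfn range. d, base_pfn and end_pfn are assumed to be in scope as in
that function.

    unsigned long pfn;
    int ret = 0;

    /* Establish the 1:1 mapping for the RMRR's [base_pfn, end_pfn) range. */
    for ( pfn = base_pfn; pfn < end_pfn; pfn++ )
    {
        int err = set_identity_p2m_entry(d, pfn, p2m_access_rw);

        if ( err )
            return err;        /* gfn already occupied by something else */
    }

    /* ... and on teardown: */
    for ( pfn = base_pfn; pfn < end_pfn; pfn++ )
        if ( clear_identity_p2m_entry(d, pfn, 0) )   /* page_order 0 */
            ret = -ENXIO;
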
 xen/arch/x86/mm/p2m.c               | 40 +++++++++++++++++++++++++++++++++++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h           | 13 +++++++++---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
                          p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                           unsigned long mfn, unsigned int page_order)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
     gfn_lock(p2m, gfn, page_order);
-    p2m_remove_page(p2m, gfn, mfn, page_order);
+    rc = p2m_remove_page(p2m, gfn, mfn, page_order);
     gfn_unlock(p2m, gfn, page_order);
+    return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
     return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma)
+{
+    p2m_type_t p2mt;
+    p2m_access_t a;
+    mfn_t mfn;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int ret;
+
+    if ( !paging_mode_translate(p2m->domain) )
+        return 0;
+
+    gfn_lock(p2m, gfn, 0);
+
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+    if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+                            p2m_mmio_direct, p2ma);
+    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+        ret = 0;
+    else
+    {
+        ret = -EBUSY;
+        printk(XENLOG_G_WARNING
+               "Cannot setup identity map d%d:%lx,"
+               " gfn already mapped to %lx.\n",
+               d->domain_id, gfn, mfn_x(mfn));
+    }
+
+    gfn_unlock(p2m, gfn, 0);
+    return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
             while ( base_pfn < end_pfn )
             {
-                if ( intel_iommu_unmap_page(d, base_pfn) )
+                if ( clear_identity_p2m_entry(d, base_pfn, 0) )
                     ret = -ENXIO;
                 base_pfn++;
             }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-                                       IOMMUF_readable|IOMMUF_writable);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
 
         if ( err )
             return err;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index b49c09b..190a286 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -503,9 +503,9 @@ static inline int guest_physmap_add_page(struct domain *d,
 }
 
 /* Remove a page from a domain's p2m table */
-void guest_physmap_remove_page(struct domain *d,
-                               unsigned long gfn,
-                               unsigned long mfn, unsigned int page_order);
+int guest_physmap_remove_page(struct domain *d,
+                              unsigned long gfn,
+                              unsigned long mfn, unsigned int page_order);
 
 /* Set a p2m range as populate-on-demand */
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
@@ -543,6 +543,13 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
                        p2m_access_t access);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
+/* Set identity addresses in the p2m table (for pass-through) */
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma);
+
+#define clear_identity_p2m_entry(d, gfn, page_order) \
+                        guest_physmap_remove_page(d, gfn, gfn, page_order)
+
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
                     unsigned long gpfn, domid_t foreign_domid);
-- 
1.9.1


* [v10][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Andrew Cooper, Tim Deegan,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Yang Zhang,
	Stefano Stabellini, Ian Campbell

This patch extends the existing hypercall to support an rdm reservation policy.
We return an error or just print a warning message, depending on whether
the policy is "strict" or "relaxed", when reserving RDM regions in pfn space.
Note that in some special cases, e.g. adding a device to the hardware domain
or removing a device from a user domain, 'relaxed' is good enough since these
operations are always safe for the hardware domain.

CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v10:

* Nothing is changed.

v9:

* Correct one check condition of XEN_DOMCTL_DEV_RDM_RELAXED

v8:

* Always pass "0" (strict) when adding or moving a device in the hardware
  domain, and improve some associated code comments.

v6 ~ v7:

* Nothing is changed.

v5:

* Keep just one bit, XEN_DOMCTL_DEV_RDM_RELAXED, as our flag, so
  "0" means "strict" and "1" means "relaxed".

* Make DT devices ignore the flag field

* Improve the code comments

v4:

* Add code comments describing why we force a fixed policy flag in some
  cases, like adding a device to the hardware domain or removing a device
  from a user domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one judging condition
  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Range-check the flag passed, to make future extensions possible
  (and to avoid ambiguity about what out-of-range values would mean).

 xen/arch/x86/mm/p2m.c                       |  7 ++++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c          |  2 +-
 xen/drivers/passthrough/device_tree.c       |  3 ++-
 xen/drivers/passthrough/pci.c               | 15 ++++++++----
 xen/drivers/passthrough/vtd/iommu.c         | 37 ++++++++++++++++++++++-------
 xen/include/asm-x86/p2m.h                   |  2 +-
 xen/include/public/domctl.h                 |  3 +++
 xen/include/xen/iommu.h                     |  2 +-
 9 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 99a26ca..47785dc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma)
+                           p2m_access_t p2ma, unsigned int flag)
 {
     p2m_type_t p2mt;
     p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
         ret = 0;
     else
     {
-        ret = -EBUSY;
+        if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
+            ret = 0;
+        else
+            ret = -EBUSY;
         printk(XENLOG_G_WARNING
                "Cannot setup identity map d%d:%lx,"
                " gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-                                   struct pci_dev *pdev)
+                                   struct pci_dev *pdev,
+                                   u32 flag)
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
     int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-			       struct device *dev)
+			       struct device *dev, u32 flag)
 {
 	struct iommu_domain *domain;
 	struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 5d3842a..7ff79f8 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
             goto fail;
     }
 
-    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+    /* The flag field doesn't matter to DT device. */
+    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0);
 
     if ( rc )
         goto fail;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e30be43..c7bbf6e 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1335,7 +1335,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
     return pdev ? 0 : -EBUSY;
 }
 
-static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
+static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
     struct pci_dev *pdev;
@@ -1371,7 +1371,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 
     pdev->fault.count = 0;
 
-    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev))) )
+    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag)) )
         goto done;
 
     for ( ; pdev->phantom_stride; rc = 0 )
@@ -1379,7 +1379,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
         devfn += pdev->phantom_stride;
         if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
             break;
-        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev));
+        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
         if ( rc )
             printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed (%d)\n",
                    d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
@@ -1496,6 +1496,7 @@ int iommu_do_pci_domctl(
 {
     u16 seg;
     u8 bus, devfn;
+    u32 flag;
     int ret = 0;
     uint32_t machine_sbdf;
 
@@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
         seg = machine_sbdf >> 16;
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
+        flag = domctl->u.assign_device.flag;
+        if ( flag & ~XEN_DOMCTL_DEV_RDM_RELAXED )
+        {
+            ret = -EINVAL;
+            break;
+        }
 
         ret = device_assigned(seg, bus, devfn) ?:
-              assign_device(d, seg, bus, devfn);
+              assign_device(d, seg, bus, devfn, flag);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(__HYPERVISOR_domctl,
                                                 "h", u_domctl);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 8415958..b5d658e 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1807,7 +1807,8 @@ static void iommu_set_pgd(struct domain *d)
 }
 
 static int rmrr_identity_mapping(struct domain *d, bool_t map,
-                                 const struct acpi_rmrr_unit *rmrr)
+                                 const struct acpi_rmrr_unit *rmrr,
+                                 u32 flag)
 {
     unsigned long base_pfn = rmrr->base_address >> PAGE_SHIFT_4K;
     unsigned long end_pfn = PAGE_ALIGN_4K(rmrr->end_address) >> PAGE_SHIFT_4K;
@@ -1855,7 +1856,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
 
         if ( err )
             return err;
@@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
              PCI_BUS(bdf) == pdev->bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
+            /*
+             * iommu_add_device() is only called for the hardware
+             * domain (see xen/drivers/passthrough/pci.c:pci_add_device()).
+             * Since RMRRs are always reserved in the e820 map for the hardware
+             * domain, there shouldn't be a conflict.
+             */
+            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr, 0);
             if ( ret )
                 dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
                         pdev->domain->domain_id);
@@ -1939,7 +1946,11 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
              PCI_DEVFN2(bdf) != devfn )
             continue;
 
-        rmrr_identity_mapping(pdev->domain, 0, rmrr);
+        /*
+         * Any flag is nothing to clear these mappings but here
+         * its always safe and strict to set 0.
+         */
+        rmrr_identity_mapping(pdev->domain, 0, rmrr, 0);
     }
 
     return domain_context_unmap(pdev->domain, devfn, pdev);
@@ -2098,7 +2109,13 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
     spin_lock(&pcidevs_lock);
     for_each_rmrr_device ( rmrr, bdf, i )
     {
-        ret = rmrr_identity_mapping(d, 1, rmrr);
+        /*
+         * Here means we're add a device to the hardware domain.
+         * Since RMRRs are always reserved in the e820 map for the hardware
+         * domain, there shouldn't be a conflict. So its always safe and
+         * strict to set 0.
+         */
+        ret = rmrr_identity_mapping(d, 1, rmrr, 0);
         if ( ret )
             dprintk(XENLOG_ERR VTDPREFIX,
                      "IOMMU: mapping reserved region failed\n");
@@ -2241,7 +2258,11 @@ static int reassign_device_ownership(
                  PCI_BUS(bdf) == pdev->bus &&
                  PCI_DEVFN2(bdf) == devfn )
             {
-                ret = rmrr_identity_mapping(source, 0, rmrr);
+                /*
+                 * Any RMRR flag is always ignored when remove a device,
+                 * but its always safe and strict to set 0.
+                 */
+                ret = rmrr_identity_mapping(source, 0, rmrr, 0);
                 if ( ret != -ENOENT )
                     return ret;
             }
@@ -2265,7 +2286,7 @@ static int reassign_device_ownership(
 }
 
 static int intel_iommu_assign_device(
-    struct domain *d, u8 devfn, struct pci_dev *pdev)
+    struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag)
 {
     struct acpi_rmrr_unit *rmrr;
     int ret = 0, i;
@@ -2294,7 +2315,7 @@ static int intel_iommu_assign_device(
              PCI_BUS(bdf) == bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(d, 1, rmrr);
+            ret = rmrr_identity_mapping(d, 1, rmrr, flag);
             if ( ret )
             {
                 reassign_device_ownership(d, hardware_domain, devfn, pdev);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 190a286..68da0a9 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -545,7 +545,7 @@ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
 /* Set identity addresses in the p2m table (for pass-through) */
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma);
+                           p2m_access_t p2ma, unsigned int flag);
 
 #define clear_identity_p2m_entry(d, gfn, page_order) \
                         guest_physmap_remove_page(d, gfn, gfn, page_order)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index bc45ea5..bca25c9 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -478,6 +478,9 @@ struct xen_domctl_assign_device {
             XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
         } dt;
     } u;
+    /* IN */
+#define XEN_DOMCTL_DEV_RDM_RELAXED      1
+    uint32_t  flag;   /* flag of assigned device */
 };
 typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index e2f584d..02b2b02 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -140,7 +140,7 @@ struct iommu_ops {
     int (*add_device)(u8 devfn, device_t *dev);
     int (*enable_device)(device_t *dev);
     int (*remove_device)(u8 devfn, device_t *dev);
-    int (*assign_device)(struct domain *, u8 devfn, device_t *dev);
+    int (*assign_device)(struct domain *, u8 devfn, device_t *dev, u32 flag);
     int (*reassign_device)(struct domain *s, struct domain *t,
                            u8 devfn, device_t *dev);
 #ifdef HAS_PCI
-- 
1.9.1


* [v10][PATCH 04/16] xen: enable XENMEM_memory_map in hvm
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (2 preceding siblings ...)
  2015-07-20  6:16 ` [v10][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

This patch enables XENMEM_memory_map for HVM guests, so hvmloader can
use it to set up the e820 mappings.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
---
v5 ~ v10:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c      | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }
 
-        if ( is_hvm_domain(d) )
-        {
-            rcu_unlock_domain(d);
-            return -EPERM;
-        }
-
         e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
         if ( e820 == NULL )
         {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 05/16] hvmloader: get guest memory map into memory_map[]
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (3 preceding siblings ...)
  2015-07-20  6:16 ` [v10][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs Tiejun Chen
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

Now we obtain the guest memory layout by calling XENMEM_memory_map and
save it into the global variable memory_map[]. It should include the
lowmem range, the RDM ranges and the highmem range. Note the RDM and
highmem ranges may not exist in some cases.

We also need to check whether any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END).
This range is used for allocations inside hvmloader itself, so we simply
make hvmloader fail on such a conflict, since this is another rare
possibility in the real world.
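
As a rough illustration of the half-open check (check_overlap() is the helper
added by this patch; the addresses below are made up purely for the example):

    /* A reserved region [0xad800000, 0xb0000000) does not conflict with a
     * range starting exactly at 0xb0000000, because the end address is
     * treated as exclusive. */
    bool conflict = check_overlap(0xb0000000, 0x01000000,  /* [0xb0000000, 0xb1000000) */
                                  0xad800000, 0x02800000); /* [0xad800000, 0xb0000000) */
    /* conflict == false; lower the start by one page and it becomes true. */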

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v10:

* Nothing is changed.

v9:

* Correct [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END]
    -> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END) in
  the patch head description;
  Merge two if{} as one if{};

v8:

* Actually we should check this range started from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  So correct this and sync the patch head description.

v5 ~ v7:

* Nothing is changed.

v4:

* Move some codes related to e820 to that specific file, e820.c.

* Consolidate "printf()+BUG()" and "BUG_ON()"

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c      | 32 ++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/e820.h      |  7 +++++++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c      | 26 ++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h      | 12 ++++++++++++
 5 files changed, 79 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..7a414ab 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,38 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+    unsigned int nr_entries = E820MAX, i;
+    int rc;
+    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START;
+    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+    if ( rc || !nr_entries )
+    {
+        printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+        BUG();
+    }
+
+    memory_map.nr_map = nr_entries;
+
+    for ( i = 0; i < nr_entries; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED &&
+             check_overlap(alloc_addr, alloc_size,
+                           memory_map.map[i].addr, memory_map.map[i].size) )
+        {
+            printf("Fail to setup memory map due to conflict");
+            printf(" on dynamic reserved memory range.\n");
+            BUG();
+        }
+    }
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
     uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
     uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX	128
+
+struct e820map {
+    unsigned int nr_map;
+    struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
     init_hypercalls();
 
+    memory_map_setup();
+
     xenbus_setup();
 
     bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size)
+{
+    return (start + size > reserved_start) &&
+            (start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
     asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
     *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+    int rc;
+    struct xen_memory_map memmap = {
+        .nr_entries = *max_entries
+    };
+
+    set_xen_guest_handle(memmap.buffer, entries);
+
+    rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+    *max_entries = memmap.nr_entries;
+
+    return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
     static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index f99c0f19..1100a3b 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10 @@
 #include <stdarg.h>
 #include <stdint.h>
 #include <stddef.h>
+#include <stdbool.h>
 #include <xen/xen.h>
 #include <xen/hvm/hvm_info_table.h>
+#include "e820.h"
 
 #define __STR(...) #__VA_ARGS__
 #define STR(...) __STR(__VA_ARGS__)
@@ -222,6 +224,9 @@ int hvm_param_set(uint32_t index, uint64_t value);
 /* Setup PCI bus */
 void pci_setup(void);
 
+/* Setup memory map  */
+void memory_map_setup(void);
+
 /* Prepare the 32bit BIOS */
 uint32_t rombios_highbios_setup(void);
 
@@ -249,6 +254,13 @@ void perform_tests(void);
 
 extern char _start[], _end[];
 
+int get_mem_mapping_layout(struct e820entry entries[],
+                           unsigned int *max_entries);
+
+extern struct e820map memory_map;
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size);
+
 #endif /* __HVMLOADER_UTIL_H__ */
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (4 preceding siblings ...)
  2015-07-20  6:16 ` [v10][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20 11:00   ` George Dunlap
  2015-07-20 11:30   ` Jan Beulich
  2015-07-20  6:16 ` [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
                   ` (10 subsequent siblings)
  16 siblings, 2 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel

Try to avoid placing PCI BARs over RMRRs:

- If mmio_hole_size is not specified, and the existing MMIO range has
  RMRRs in it, and there is space to expand the hole in lowmem without
  moving more memory, then make the MMIO hole as large as possible.

- When placing BARs, find the next RMRR higher than the current base
  in the lowmem MMIO hole.  If the BAR overlaps it, skip ahead of it and
  find the next one.

This certainly won't work in all cases, but it should work in a
significant number of cases.  Additionally, users should be able to
work around problems by setting mmio_hole_size larger in the guest
config.
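
As a concrete example with made-up numbers: suppose the lowmem MMIO hole
contains an RMRR at [0xad800000, 0xb0000000) and a 16MB BAR's size-aligned
candidate base is 0xad000000. Since [0xad000000, 0xae000000) overlaps the
RMRR, the allocator skips past the RMRR, re-aligns, and places the BAR at
0xb0000000 instead, then looks up the next RMRR above that new base.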

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v10:

* This is from George's draft patch, which implements an acceptable solution in
  the current cycle. Here I just implemented check_overlap_all() and some cleanups.

v9:

* A small improvement to the code implementation but again, this solution is
  still under discussion.

v8:

* Based on the current discussion it is hard to reshape the original MMIO
  allocation mechanism, and we don't have a good and simple alternative in the
  short term. So instead of further complicating that process, we just check
  for any conflicts and disable all associated devices.

v6 ~ v7:

* Nothing is changed.

v5:

* Rename the is_64bar field inside struct bars to flag, and extend it to
  also indicate whether this BAR has already been allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure the MMIO region fits all PCI devices as before

  #2.2 Accommodate reserved memory regions that are not aligned

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need either to populate more RAM or to expand highmem further. But note
  that only 64bit BARs can go into highmem, and as you mentioned we should also
  avoid expanding highmem where possible. So my implementation allocates
  32bit BARs and 64bit BARs in order.

  1>. The first allocation round: 32bit BARs only

  If we can allocate all 32bit BARs here, we just go on to allocate 64bit BARs
  from all remaining resources, including low PCI memory.

  If not, we calculate how much RAM must be populated to fit the remaining
  32bit BARs, populate sufficient RAM as exp_mem_resource, and go to the
  second allocation round 2>.

  2>. The second allocation round: the remaining 32bit BARs

  In theory we can now finish allocating all 32bit BARs, and then go to the
  third allocation round 3>.

  3>. The third allocation round: 64bit BARs

  We first try to allocate from the remaining low memory resource. If that
  isn't enough, we expand highmem to allocate the 64bit BARs. This process
  is the same as the original one.

  #3.2 Address #2.2

  I'm trying to accommodate reserved memory regions that are not aligned:

  We should skip all reserved device memory, but we also need to check whether
  other, smaller BARs can be allocated in an MMIO hole between resource->base
  and the reserved device memory. If such a hole exists between the base and
  the reserved device memory, we simply move on and try to allocate the next
  BAR, since all BARs are handled in descending order of size. If not, we move
  resource->base to reserved_end and reallocate this BAR.

 tools/firmware/hvmloader/pci.c | 62 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..f229a91 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,43 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+/* Check if any conflicts with all reserved device memory. */
+static bool check_overlap_all(uint64_t start, uint64_t size)
+{
+    unsigned int i;
+
+    for ( i = 0; i < memory_map.nr_map; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED &&
+             check_overlap(start, size,
+                           memory_map.map[i].addr,
+                           memory_map.map[i].size) )
+            return true;
+    }
+
+    return false;
+}
+
+/* Find the lowest RMRR higher than base. */
+static int find_next_rmrr(uint32_t base)
+{
+    unsigned int i;
+    int next_rmrr = -1;
+    uint64_t min_base = (1ull << 32);
+
+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED &&
+             memory_map.map[i].addr > base &&
+             memory_map.map[i].addr < min_base )
+        {
+            next_rmrr = i;
+            min_base = memory_map.map[i].addr;
+        }
+    }
+    return next_rmrr;
+}
+
 void pci_setup(void)
 {
     uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -46,6 +83,7 @@ void pci_setup(void)
     uint32_t vga_devfn = 256;
     uint16_t class, vendor_id, device_id;
     unsigned int bar, pin, link, isa_irq;
+    int next_rmrr;
 
     /* Resources assignable to PCI devices via BARs. */
     struct resource {
@@ -299,6 +337,15 @@ void pci_setup(void)
                     || (((pci_mem_start << 1) >> PAGE_SHIFT)
                         >= hvm_info->low_mem_pgend)) )
             pci_mem_start <<= 1;
+
+        /*
+         * Try to accommodate RMRRs in our MMIO region on a best-effort basis.
+         * If we have RMRRs in the range, then make pci_mem_start just after
+         * hvm_info->low_mem_pgend.
+         */
+        if ( pci_mem_start > (hvm_info->low_mem_pgend << PAGE_SHIFT) &&
+             check_overlap_all(pci_mem_start, pci_mem_end-pci_mem_start) )
+            pci_mem_start = ((hvm_info->low_mem_pgend + 1) << PAGE_SHIFT);
     }
 
     if ( mmio_total > (pci_mem_end - pci_mem_start) )
@@ -352,6 +399,8 @@ void pci_setup(void)
     io_resource.base = 0xc000;
     io_resource.max = 0x10000;
 
+    next_rmrr = find_next_rmrr(pci_mem_start);
+
     /* Assign iomem and ioport resources in descending order of size. */
     for ( i = 0; i < nr_bars; i++ )
     {
@@ -407,6 +456,19 @@ void pci_setup(void)
         }
 
         base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+
+        /* If we're using mem_resource, check for RMRR conflicts. */
+        while ( resource == &mem_resource &&
+                next_rmrr > 0 &&
+                check_overlap(base, bar_sz,
+                              memory_map.map[next_rmrr].addr,
+                              memory_map.map[next_rmrr].size) )
+        {
+            base = memory_map.map[next_rmrr].addr + memory_map.map[next_rmrr].size;
+            base = (base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+            next_rmrr = find_next_rmrr(base);
+        }
+
         bar_data |= (uint32_t)base;
         bar_data_upper = (uint32_t)(base >> 32);
         base += bar_sz;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (5 preceding siblings ...)
  2015-07-20  6:16 ` [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20 11:56   ` Jan Beulich
  2015-07-20 13:00   ` George Dunlap
  2015-07-20  6:16 ` [v10][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
                   ` (9 subsequent siblings)
  16 siblings, 2 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Wei Liu

Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
  hypervisor map
* Add in the hypervisor supplied regions
* Adjust the lowmem and highmem regions if we've had to relocate
  memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.
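
For illustration only (all addresses are made up), the sorted result for a
guest with one RDM below 4G and some RAM relocated to highmem might look like:

    0000000000000000 - 000000000009ffff  RAM       (low RAM below the VGA hole)
    00000000000f0000 - 00000000000fffff  RESERVED  (BIOS image)
    0000000000100000 - 00000000ad7fffff  RAM       (lowmem, ends at low_mem_pgend)
    00000000ad800000 - 00000000afffffff  RESERVED  (RDM from the hypervisor map)
    00000000fc000000 - 00000000feffffff  RESERVED  (special pages etc.)
    0000000100000000 - 000000013fffffff  RAM       (highmem, incl. relocated RAM)

i.e. one lowmem RAM entry, the reserved entries, and at most one highmem RAM
entry, all sorted by address.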

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v10:

* Instead of correcting e820, I'd like to correct memory_map.map[]
  and then copy them into e820 directly. I think this can make sure
  hvm_info, memory_map.map[] and e820 are on the same page.

v9:

* Refine that chunk of codes to check/modify highmem

v8:

* define low_mem_end as uint32_t

* Correct those two wrong loops, memory_map.nr_map -> nr
  when we're trying to revise low/high memory e820 entries.

* Improve code comments and the patch head description

* Add one check if highmem is just populated by hvmloader itself

v5 ~ v7:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.
 
 tools/firmware/hvmloader/e820.c | 108 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 95 insertions(+), 13 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 7a414ab..ca794ad 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,11 @@ int build_e820_table(struct e820entry *e820,
                      unsigned int lowmem_reserved_base,
                      unsigned int bios_image_base)
 {
-    unsigned int nr = 0;
+    unsigned int nr = 0, i, j;
+    uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+    uint32_t add_high_mem = 0;
+    uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT;
+    uint64_t map_start, map_size, map_end;
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
@@ -149,13 +153,6 @@ int build_e820_table(struct e820entry *e820,
     e820[nr].type = E820_RESERVED;
     nr++;
 
-    /* Low RAM goes here. Reserve space for special pages. */
-    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-    e820[nr].addr = 0x100000;
-    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-    e820[nr].type = E820_RAM;
-    nr++;
-
     /*
      * Explicitly reserve space for special pages.
      * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +188,101 @@ int build_e820_table(struct e820entry *e820,
         nr++;
     }
 
+    /* Low RAM goes here. Reserve space for special pages. */
+    BUG_ON(low_mem_end < (2u << 20));
 
-    if ( hvm_info->high_mem_pgend )
+    /*
+     * Construct E820 table according to recorded memory map.
+     *
+     * The memory map created by toolstack may include,
+     *
+     * #1. Low memory region
+     *
+     * Low RAM starts at least from 1M to make sure all standard regions
+     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+     * have enough space.
+     *
+     * #2. Reserved regions if they exist
+     *
+     * #3. High memory region if it exists
+     *
+     * Note we have just one low memory entry and one high memory entry if
+     * it exists.
+     *
+     * But we may have relocated RAM to allocate sufficient MMIO previously,
+     * so low_mem_pgend would have been changed there. memory_map[] records
+     * the original low/high memory, so if low_mem_end is less than the
+     * original we need to revise the low/high memory ranges first.
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
     {
-        e820[nr].addr = ((uint64_t)1 << 32);
-        e820[nr].size =
-            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-        e820[nr].type = E820_RAM;
+        map_start = memory_map.map[i].addr;
+        map_size = memory_map.map[i].size;
+        map_end = map_start + map_size;
+
+        /* If we need to adjust lowmem. */
+        if ( memory_map.map[i].type == E820_RAM &&
+             low_mem_end > map_start && low_mem_end < map_end )
+        {
+            add_high_mem = map_end - low_mem_end;
+            memory_map.map[i].size = low_mem_end - map_start;
+            break;
+        }
+    }
+
+    /* If we need to adjust highmem. */
+    if ( add_high_mem )
+    {
+        /* Modify the existing highmem region if it exists. */
+        for ( i = 0; i < memory_map.nr_map; i++ )
+        {
+            map_start = memory_map.map[i].addr;
+            map_size = memory_map.map[i].size;
+            map_end = map_start + map_size;
+
+            if ( memory_map.map[i].type == E820_RAM &&
+                 map_start == ((uint64_t)1 << 32))
+            {
+                memory_map.map[i].size += add_high_mem;
+                break;
+            }
+        }
+
+        /* If there was no highmem region, just create one. */
+        if ( i == memory_map.nr_map )
+        {
+            memory_map.map[i].addr = ((uint64_t)1 << 32);
+            memory_map.map[i].size = add_high_mem;
+            memory_map.map[i].type = E820_RAM;
+        }
+
+        /* A sanity check if high memory is broken. */
+        BUG_ON( high_mem_end !=
+                memory_map.map[i].addr + memory_map.map[i].size);
+    }
+
+    /* Now fulfill e820. */
+    for ( i = 0; i < memory_map.nr_map; i++ )
+    {
+        e820[nr] = memory_map.map[i];
         nr++;
     }
 
+    /* Finally we need to sort all e820 entries. */
+    for ( j = 0; j < nr-1; j++ )
+    {
+        for ( i = j+1; i < nr; i++ )
+        {
+            if ( e820[j].addr > e820[i].addr )
+            {
+                struct e820entry tmp = e820[j];
+
+                e820[j] = e820[i];
+                e820[i] = tmp;
+            }
+        }
+    }
+
     return nr;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (6 preceding siblings ...)
  2015-07-20  6:16 ` [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

We introduce a libxc wrapper, xc_reserved_device_memory_map, for the
new hypercall. This helps us get RDM entry info according to different
parameters: if flag == PCI_DEV_RDM_ALL, all entries are exposed;
otherwise we just expose the RDM entries specific to a given SBDF.
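
A minimal caller sketch (not part of this patch; it assumes <xenctrl.h>,
<errno.h>, <stdlib.h> and an open xc_interface handle, and mirrors the
two-call pattern used later in this series, with PCI_DEV_RDM_ALL being the
"report everything" flag of the new interface):

    static int get_all_rdm(xc_interface *xch,
                           struct xen_reserved_device_memory **out,
                           uint32_t *nr)
    {
        int r;

        /* First call with a NULL buffer: returns 0 if there is no entry,
         * otherwise fails with errno == ENOBUFS and reports the count. */
        *nr = 0;
        r = xc_reserved_device_memory_map(xch, PCI_DEV_RDM_ALL, 0, 0, 0,
                                          NULL, nr);
        if ( r == 0 )
            return 0;
        if ( errno != ENOBUFS )
            return -1;

        *out = calloc(*nr, sizeof(**out));
        if ( !*out )
            return -1;

        /* Second call actually fills the buffer with all RDM entries. */
        return xc_reserved_device_memory_map(xch, PCI_DEV_RDM_ALL, 0, 0, 0,
                                             *out, nr);
    }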

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v4 ~ v10:

* Nothing is changed.

 tools/libxc/include/xenctrl.h |  8 ++++++++
 tools/libxc/xc_domain.c       | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..9160623 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..0951291 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
     return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries)
+{
+    int rc;
+    struct xen_reserved_device_memory_map xrdmmap = {
+        .flag = flag,
+        .seg = seg,
+        .bus = bus,
+        .devfn = devfn,
+        .nr_entries = *max_entries
+    };
+    DECLARE_HYPERCALL_BOUNCE(entries,
+                             sizeof(struct xen_reserved_device_memory) *
+                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, entries) )
+        return -1;
+
+    set_xen_guest_handle(xrdmmap.buffer, entries);
+
+    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+                      &xrdmmap, sizeof(xrdmmap));
+
+    xc_hypercall_bounce_post(xch, entries);
+
+    *max_entries = xrdmmap.nr_entries;
+
+    return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (7 preceding siblings ...)
  2015-07-20  6:16 ` [v10][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, David Scott, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch passes the RDM reservation policy to xc_assign_device() so the
policy is checked when assigning devices to a VM.

Note this also brings some fallout to the python usage of xc_assign_device().
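
For callers the only visible change is the extra flag argument. A hedged
sketch (the SBDF encoding follows the existing callers; the flag value comes
from xen/domctl.h):

    /* Assign 0000:03:00.0, tolerating RDM conflicts ("relaxed"). */
    uint32_t sbdf = (0 << 16) | (0x03 << 8) | (0x00 << 3) | 0x0;

    if ( xc_assign_device(xch, domid, sbdf, XEN_DOMCTL_DEV_RDM_RELAXED) )
        perror("xc_assign_device");

    /* Passing 0 instead requests the default "strict" policy. */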

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: David Scott <dave.scott@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v6 ~ v10:

* Nothing is changed.

v5:

* Fix the flag field to "0" for DT devices

v4:

* In the patch description, add an explanation of why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h       |  3 ++-
 tools/libxc/xc_domain.c             |  9 ++++++++-
 tools/libxl/libxl_pci.c             |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 ++++++++++++----
 tools/python/xen/lowlevel/xc/xc.c   | 30 ++++++++++++++++++++----------
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
-                     uint32_t machine_sbdf);
+                     uint32_t machine_sbdf,
+                     uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
     xc_interface *xch,
     uint32_t domid,
-    uint32_t machine_sbdf)
+    uint32_t machine_sbdf,
+    uint32_t flag)
 {
     DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
     domctl.domain = domid;
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
     domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+    domctl.u.assign_device.flag = flag;
 
     return do_domctl(xch, &domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
     domctl.u.assign_device.u.dt.size = size;
+    /*
+     * DT doesn't own any RDM, so DT has nothing to do with any flag;
+     * just fix it as 0 here.
+     */
+    domctl.u.assign_device.flag = 0;
     set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
     rc = do_domctl(xch, &domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
     FILE *f;
     unsigned long long start, end, flags, size;
     int irq, i, rc, hvm = 0;
+    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
     if (type == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
-        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
             return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
 	CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+    XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+                                            value rflag)
 {
-	CAMLparam3(xch, domid, desc);
+	CAMLparam4(xch, domid, desc, rflag);
 	int ret;
 	int domain, bus, dev, func;
-	uint32_t sbdf;
+	uint32_t sbdf, flag;
 
 	domain = Int_val(Field(desc, 0));
 	bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
 	func = Int_val(Field(desc, 3));
 	sbdf = encode_sbdf(domain, bus, dev, func);
 
-	ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+	ret = Int_val(Field(rflag, 0));
+	flag = domain_assign_device_rdm_flag_table[ret];
+
+	ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
 	if (ret < 0)
 		failwith_xc(_H(xch));
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index c77e15b..a4928c6 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -592,7 +592,8 @@ static int token_value(char *token)
     return strtol(token, NULL, 16);
 }
 
-static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
+static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
+                    int *flag)
 {
     char *token;
 
@@ -607,8 +608,17 @@ static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
     *dev  = token_value(token);
     token = strchr(token, ',') + 1;
     *func  = token_value(token);
-    token = strchr(token, ',');
-    *str = token ? token + 1 : NULL;
+    token = strchr(token, ',');
+    if ( token ) {
+        *flag = token_value(token + 1);
+        *str = token + 2;
+    }
+    else
+    {
+        /* 0 means we take "strict" as our default policy. */
+        *flag = 0;
+        *str = NULL;
+    }
 
     return 1;
 }
@@ -620,14 +630,14 @@ static PyObject *pyxc_test_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
@@ -653,21 +663,21 @@ static PyObject *pyxc_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
         sbdf |= (dev & 0x1f) << 3;
         sbdf |= (func & 0x7);
 
-        if ( xc_assign_device(self->xc_handle, dom, sbdf) != 0 )
+        if ( xc_assign_device(self->xc_handle, dom, sbdf, flag) != 0 )
         {
             if (errno == ENOSYS)
                 sbdf = -1;
@@ -686,14 +696,14 @@ static PyObject *pyxc_deassign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 10/16] tools: introduce some new parameters to set rdm policy
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (8 preceding siblings ...)
  2015-07-20  6:16 ` [v10][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20  6:16 ` [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch introduces user configurable parameters to specify RDM
resources and the corresponding policies:

Global RDM parameter:
    rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The global RDM parameter "strategy" allows the user to specify reserved
regions explicitly. Currently, 'host' includes all reserved regions reported
on this platform, which is useful for handling the hotplug scenario. In the
future this parameter may be further extended to allow specifying random
regions, e.g. even those belonging to another platform, as a preparation for
live migration with passthrough devices. By default this isn't set, so we
don't check all RDMs; instead, we just check the RDM specific to a given
device if you're assigning that kind of device. Note this option is not
recommended unless you can make sure such a conflict does exist.

The 'strict'/'relaxed' policy decides how to handle a conflict when reserving
RDM regions in pfn space. If a conflict exists, 'strict' means an immediate
error so the VM can't keep running, while 'relaxed' allows moving forward with
a warning message.

The default per-device RDM policy is the same as the default global RDM
policy, i.e. 'relaxed'. And the per-device policy overrides the global policy,
as with other per-device options.
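
For instance (the device addresses here are only illustrative), the two levels
can be combined so that one device is stricter than the global default:

    rdm = "strategy=host,policy=relaxed"
    pci = [ '01:00.0,rdm_policy=strict', '03:00.0' ]

With this, a conflict involving the RDM of 01:00.0 fails domain creation,
while conflicts with other reserved regions only produce a warning.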

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v9 ~ v10:

* Nothing is changed.

v8:

* One minimal code style change

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

v6:

* Some rename to make our policy reasonable
  "type" -> "strategy"
  "none" -> "ignore"
* Don't expose "ignore" in xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device policy always overrides the global policy,
  and clean up some associated comments and the patch description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since it's just zero
* Split those changes to xl/libxlu out as a final patch

 docs/man/xl.cfg.pod.5        | 81 ++++++++++++++++++++++++++++++++++++++++++++
 docs/misc/vtd.txt            | 24 +++++++++++++
 tools/libxl/libxl_create.c   |  7 ++++
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c      |  9 +++++
 tools/libxl/libxl_types.idl  | 18 ++++++++++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..6c55a8b 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item B<rdm="RDM_RESERVATION_STRING">
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B<RDM_RESERVATION_STRING> has the form C<KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<strategy="STRING">
+
+Currently there is only one valid type:
+
+"host" means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using "host" includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure any conflict does exist.
+
+For example, you're trying to set "memory = 2800" to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad800000 end_address afffffff
+
+In this conflict case,
+
+#1. If B<strategy> is set to "host", for example,
+
+rdm = "strategy=host,policy=strict" or rdm = "strategy=host,policy=relaxed"
+
+It means all conflicts will be handled according to the policy
+introduced by B<policy> as described below.
+
+#2. If B<strategy> is not set at all, but
+
+pci = [ 'sbdf_A, rdm_policy=xxxxx' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by B<rdm_policy="STRING"> as described inside pci options.
+
+=item B<policy="STRING">
+
+Specifies how to deal with conflicts when reserving reserved device
+memory in guest address space.
+
+When such a conflict cannot be solved,
+
+"strict" means the VM can't be created, or the associated device can't be
+attached in the case of hotplug.
+
+"relaxed" allows the VM to be created but may cause the VM to crash if the
+pass-through device accesses RDM. For example, the Windows IGD GFX driver
+always accesses RDM regions, so this leads to a VM crash.
+
+Note this may be overridden by rdm_policy option in PCI device configuration.
+
+=back
+
+=back
+
 =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
 
 Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
@@ -717,6 +790,14 @@ dom0 without confirmation.  Please use with care.
 D0-D3hot power management states for the PCI device. False (0) by
 default.
 
+=item B<rdm_policy="STRING">
+
+(HVM/x86 only) This is the same as the policy option inside the rdm option,
+but specific to a given device. Therefore the default is "relaxed", the same
+as for the policy option.
+
+Note this would override global B<rdm> option.
+
 =back
 
 =back
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..88b2102 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,30 @@ in the config file:
 To override for a specific device:
 	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
 
+RDM, 'reserved device memory', for PCI Device Passthrough
+---------------------------------------------------------
+
+There are some devices the BIOS controls, e.g. USB devices used to perform
+PS2 emulation. The regions of memory used by these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence the BIOS uses RMRRs to specify these regions along
+with the devices that need to access them. The OS is expected to set up
+identity mappings for these regions so that these devices can access them.
+
+While creating a VM we should reserve them in advance and avoid any conflicts.
+So we introduce user configurable parameters to specify RDM resources and
+the corresponding policies:
+
+To enable this globally, add "rdm" in the config file:
+
+    rdm = "strategy=host, policy=relaxed"   (default policy is "relaxed")
+
+Or just for a specific device:
+
+    pci = [ '01:00.0,rdm_policy=relaxed', '03:00.0,rdm_policy=strict' ]
+
+For all the options available to RDM, see xl.cfg(5).
+
 
 Caveat on Conventional PCI Device Passthrough
 ---------------------------------------------
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f366a09..f75d4f1 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -105,6 +105,12 @@ static int sched_params_valid(libxl__gc *gc,
     return 1;
 }
 
+void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
+{
+    if (b_info->u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+        b_info->u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+}
+
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info)
 {
@@ -384,6 +390,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 
         libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
 
+        libxl__rdm_setdefault(gc, b_info);
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d52589e..d397143 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1154,6 +1154,8 @@ _hidden int libxl__device_vtpm_setdefault(libxl__gc *gc, libxl_device_vtpm *vtpm
 _hidden int libxl__device_vfb_setdefault(libxl__gc *gc, libxl_device_vfb *vfb);
 _hidden int libxl__device_vkb_setdefault(libxl__gc *gc, libxl_device_vkb *vkb);
 _hidden int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci);
+_hidden void libxl__rdm_setdefault(libxl__gc *gc,
+                                   libxl_domain_build_info *b_info);
 
 _hidden const char *libxl__device_nic_devname(libxl__gc *gc,
                                               uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 632c15e..1ebdce7 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -988,6 +988,12 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
+        if (pcidev->rdm_policy == LIBXL_RDM_RESERVE_POLICY_STRICT) {
+            flag &= ~XEN_DOMCTL_DEV_RDM_RELAXED;
+        } else if (pcidev->rdm_policy != LIBXL_RDM_RESERVE_POLICY_RELAXED) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm check flag.");
+            return ERROR_FAIL;
+        }
         rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
@@ -1040,6 +1046,9 @@ static int libxl__device_pci_reset(libxl__gc *gc, unsigned int domain, unsigned
 
 int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
 {
+    /* We'd like to force reserve rdm specific to a device by default.*/
+    if (pci->rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+        pci->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
     return 0;
 }
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e1632fa..47dd83a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -76,6 +76,17 @@ libxl_domain_type = Enumeration("domain_type", [
     (2, "PV"),
     ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
 
+libxl_rdm_reserve_strategy = Enumeration("rdm_reserve_strategy", [
+    (0, "ignore"),
+    (1, "host"),
+    ])
+
+libxl_rdm_reserve_policy = Enumeration("rdm_reserve_policy", [
+    (-1, "invalid"),
+    (0, "strict"),
+    (1, "relaxed"),
+    ], init_val = "LIBXL_RDM_RESERVE_POLICY_INVALID")
+
 libxl_channel_connection = Enumeration("channel_connection", [
     (0, "UNKNOWN"),
     (1, "PTY"),
@@ -369,6 +380,11 @@ libxl_vnode_info = Struct("vnode_info", [
     ("vcpus", libxl_bitmap), # vcpus in this node
     ])
 
+libxl_rdm_reserve = Struct("rdm_reserve", [
+    ("strategy",    libxl_rdm_reserve_strategy),
+    ("policy",      libxl_rdm_reserve_policy),
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -467,6 +483,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        # See libxl_ms_vm_genid_generate()
                                        ("ms_vm_genid",      libxl_ms_vm_genid),
                                        ("serial_list",      libxl_string_list),
+                                       ("rdm", libxl_rdm_reserve),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
@@ -542,6 +559,7 @@ libxl_device_pci = Struct("device_pci", [
     ("power_mgmt", bool),
     ("permissive", bool),
     ("seize", bool),
+    ("rdm_policy",      libxl_rdm_reserve_policy),
     ])
 
 libxl_device_dtdev = Struct("device_dtdev", [
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (9 preceding siblings ...)
  2015-07-20  6:16 ` [v10][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20 13:32   ` Ian Jackson
  2015-07-20  6:16 ` [v10][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

While building a VM, the HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields used by hvmloader to
construct the guest e820 table, low_mem_pgend and high_mem_pgend, so we
should check them to fix any conflict with RDM.

RMRRs can theoretically reside in address space beyond 4G, but we never see
this in the real world. So in order to avoid breaking the highmem layout we
don't solve highmem conflicts. Note this means a highmem RMRR could still be
supported if there is no conflict.

But in the case of lowmem, RMRRs may be scattered over the whole RAM space.
Multiple RMRR entries would worsen this and lead to a complicated memory
layout, which is hard to describe to hvmloader by extending hvm_info_table{}.
So here we're trying to find a simple solution that avoids breaking the
existing layout. When a conflict occurs,

    #1. Above a predefined boundary (2G)
        - move lowmem_end below the reserved region to solve the conflict;

    #2. Below a predefined boundary (2G)
        - check the strict/relaxed policy.
        The "strict" policy causes libxl to fail. Note when both policies
        are specified for a given region, 'strict' is always preferred.
        The "relaxed" policy issues a warning message and also marks this
        entry INVALID to indicate we shouldn't expose it to hvmloader.

Note later we need to provide a parameter to set that predefined boundary
dynamically.
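
Roughly, the per-conflict decision can be sketched like this (simplified;
rdm_mem_boundary and the policy values follow the libxl code below, while
lowmem_end stands for the builder's current end of lowmem; error handling
and the highmem case are omitted):

    /* For an RDM entry [rdm_start, rdm_start + rdm_size) overlapping lowmem: */
    if ( rdm_start > rdm_mem_boundary )
    {
        /* #1: high enough -- just pull lowmem_end down below the RDM. */
        lowmem_end = rdm_start;
    }
    else if ( d_config->rdms[i].policy == LIBXL_RDM_RESERVE_POLICY_STRICT )
    {
        /* #2, strict: refuse to build the domain. */
        return ERROR_FAIL;
    }
    else
    {
        /* #2, relaxed: warn and mark the entry invalid so hvmloader
         * never sees it. */
        d_config->rdms[i].policy = LIBXL_RDM_RESERVE_POLICY_INVALID;
    }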

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
v9 ~ v10:

* Nothing is changed.

v8:

* Introduce pfn_to_paddr(x) -> ((uint64_t)x << XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out current codes.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* fix some code styles
* Refine libxl__xc_device_get_rdm()

v5:

* A small change to make sure the per-device policy always overrides the
  global policy, and correct its associated code comments.
* Fix one typo in the patch head description
* Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace
  malloc() with libxl__malloc(), and finally cleanup this fallout.
* libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out parameter.

v4:

* Consistent to use term "RDM".
* Unconditionally set *nr_entries to 0
* Move all the stuff needed to provide a parameter to set our predefined
  boundary dynamically into a separate patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c       | 273 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom.c      |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 307 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f75d4f1..c8a32d5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -459,7 +459,7 @@ int libxl__domain_build(libxl__gc *gc,
 
     switch (info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        ret = libxl__build_hvm(gc, domid, info, state);
+        ret = libxl__build_hvm(gc, domid, d_config, state);
         if (ret)
             goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 317a8eb..692258b 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,279 @@ const char *libxl__domain_device_model(libxl__gc *gc,
     return dm;
 }
 
+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+                         uint32_t flag,
+                         uint16_t seg,
+                         uint8_t bus,
+                         uint8_t devfn,
+                         unsigned int *nr_entries,
+                         struct xen_reserved_device_memory **xrdm)
+{
+    int rc = 0, r;
+
+    /*
+     * We really can't presume how many entries we can get in advance.
+     */
+    *nr_entries = 0;
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      NULL, nr_entries);
+    assert(r <= 0);
+    /* "0" means we have no any rdm entry. */
+    if (!r) goto out;
+
+    if (errno != ENOBUFS) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    *xrdm = libxl__malloc(gc,
+                          *nr_entries * sizeof(xen_reserved_device_memory_t));
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      *xrdm, nr_entries);
+    if (r)
+        rc = ERROR_FAIL;
+
+ out:
+    if (rc) {
+        *nr_entries = 0;
+        *xrdm = NULL;
+        LOG(ERROR, "Could not get reserved device memory maps.\n");
+    }
+    return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+                         uint64_t rdm_start, uint64_t rdm_size)
+{
+    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
+}
+
+#define pfn_to_paddr(x) ((uint64_t)x << XC_PAGE_SHIFT)
+static void
+set_rdm_entries(libxl__gc *gc, libxl_domain_config *d_config,
+                uint64_t rdm_start, uint64_t rdm_size, int rdm_policy,
+                unsigned int nr_entries)
+{
+    assert(nr_entries);
+
+    d_config->num_rdms = nr_entries;
+    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                            d_config->num_rdms * sizeof(libxl_device_rdm));
+
+    d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
+    d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
+    d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to user preferred policy.
+ *
+ * RDM can reside in address space beyond 4G theoretically, but we never
+ * see this in real world. So in order to avoid breaking highmem layout
+ * we don't solve highmem conflict. Note this means highmem rmrr could still
+ * be supported if no conflict.
+ *
+ * But in the case of lowmem, RDM regions may be scattered over the whole RAM
+ * space. Multiple RDM entries would worsen this and lead to a complicated
+ * memory layout, which is hard to describe to hvmloader by extending
+ * hvm_info_table{}. So here we're trying to figure out a simple solution to
+ * avoid breaking existing layout. So when a conflict occurs,
+ *
+ * #1. Above a predefined boundary (default 2G)
+ * - Move lowmem_end below reserved region to solve conflict;
+ *
+ * #2. Below a predefined boundary (default 2G)
+ * - Check strict/relaxed policy.
+ * "strict" policy leads to fail libxl.
+ * "relaxed" policy issue a warning message and also mask this entry
+ * INVALID to indicate we shouldn't expose this entry to hvmloader.
+ * Note when both policies are specified on a given region, the per-device
+ * policy should override the global policy.
+ */
+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                       libxl_domain_config *d_config,
+                                       uint64_t rdm_mem_boundary,
+                                       struct xc_hvm_build_args *args)
+{
+    int i, j, conflict, rc;
+    struct xen_reserved_device_memory *xrdm = NULL;
+    uint32_t strategy = d_config->b_info.u.hvm.rdm.strategy;
+    uint16_t seg;
+    uint8_t bus, devfn;
+    uint64_t rdm_start, rdm_size;
+    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);
+
+    /* Might not expose rdm. */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE &&
+        !d_config->num_pcidevs)
+        return 0;
+
+    /* Query all RDM entries in this platform */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
+        unsigned int nr_entries;
+
+        /* Collect all rdm info if any exists. */
+        rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+                                      0, 0, 0, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        if (!nr_entries)
+            return 0;
+
+        assert(xrdm);
+
+        for (i = 0; i < nr_entries; i++)
+            set_rdm_entries(gc, d_config,
+                            pfn_to_paddr(xrdm[i].start_pfn),
+                            pfn_to_paddr(xrdm[i].nr_pages),
+                            d_config->b_info.u.hvm.rdm.policy,
+                            i+1);
+    } else {
+        d_config->num_rdms = 0;
+    }
+
+    /* Query RDM entries per-device */
+    for (i = 0; i < d_config->num_pcidevs; i++) {
+        unsigned int nr_entries;
+        bool new = true;
+
+        seg = d_config->pcidevs[i].domain;
+        bus = d_config->pcidevs[i].bus;
+        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
+        nr_entries = 0;
+        rc = libxl__xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
+                                      seg, bus, devfn, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        /* No RDM is associated with this device. */
+        if (!nr_entries)
+            continue;
+
+        assert(xrdm);
+
+        /*
+         * Need to check whether this entry is already saved in the array.
+         * This could happen in two cases:
+         *
+         *   - the user configured the "host" strategy to get all RDMs on
+         *   this platform, which were already queried before this point
+         *   - or two assigned devices share one RDM entry
+         *
+         * Different policies may be configured on the same RDM in the
+         * above two cases, but we don't allow assigning such a group of
+         * devices right now, so this can't actually happen in our case.
+         */
+        for (j = 0; j < d_config->num_rdms; j++) {
+            if (d_config->rdms[j].start == pfn_to_paddr(xrdm[0].start_pfn))
+            {
+                /*
+                 * So the per-device policy always overrides the global
+                 * policy in this case.
+                 */
+                d_config->rdms[j].policy = d_config->pcidevs[i].rdm_policy;
+                new = false;
+                break;
+            }
+        }
+
+        if (new) {
+            d_config->num_rdms++;
+            set_rdm_entries(gc, d_config,
+                            pfn_to_paddr(xrdm[0].start_pfn),
+                            pfn_to_paddr(xrdm[0].nr_pages),
+                            d_config->pcidevs[i].rdm_policy,
+                            d_config->num_rdms);
+        }
+    }
+
+    /*
+     * Next step is to check and avoid potential conflict between RDM entries
+     * and guest RAM. To avoid intrusive impact to existing memory layout
+     * {lowmem, mmio, highmem} which is passed around various function blocks,
+     * below conflicts are not handled which are rare and handling them would
+     * lead to a more scattered layout:
+     *  - RDM  in highmem area (>4G)
+     *  - RDM lower than a defined memory boundary (e.g. 2G)
+     * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem
+     * end below reserved region to solve conflict.
+     *
+     * If a conflict is detected on a given RDM entry, an error will be
+     * returned if the 'strict' policy is specified. If the 'relaxed'
+     * policy is specified instead, the conflict is treated just as a
+     * warning, but we mark this RDM entry as INVALID to indicate that
+     * it shouldn't be exposed to hvmloader.
+     *
+     * First we check the case of rdm < 4G, because we may need to
+     * expand highmem_end.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        conflict = overlaps_rdm(0, args->lowmem_end, rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        /* Just check whether the RDM is above our memory boundary. */
+        if (rdm_start > rdm_mem_boundary) {
+            /*
+             * We will move lowmem_end downwards, so we have to expand
+             * highmem_end accordingly.
+             */
+            highmem_end += (args->lowmem_end - rdm_start);
+            /* Now move lowmem_end downwards. */
+            args->lowmem_end = rdm_start;
+        }
+    }
+
+    /* Sync highmem_end. */
+    args->highmem_end = highmem_end;
+
+    /*
+     * Finally we can apply the same policy check to lowmem (< 2G) and
+     * to the highmem adjusted above.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        /* Does this entry conflict with lowmem? */
+        conflict = overlaps_rdm(0, args->lowmem_end,
+                                rdm_start, rdm_size);
+        /* Does this entry conflict with highmem? */
+        conflict |= overlaps_rdm((1ULL<<32),
+                                 args->highmem_end - (1ULL<<32),
+                                 rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        if (d_config->rdms[i].policy == LIBXL_RDM_RESERVE_POLICY_STRICT) {
+            LOG(ERROR, "RDM conflict at 0x%lx.\n", d_config->rdms[i].start);
+            goto out;
+        } else {
+            LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
+                      d_config->rdms[i].start);
+
+            /*
+             * Then mark this entry INVALID to indicate that it shouldn't
+             * be exposed to hvmloader.
+             */
+            d_config->rdms[i].policy = LIBXL_RDM_RESERVE_POLICY_INVALID;
+        }
+    }
+
+    return 0;
+
+ out:
+    return ERROR_FAIL;
+}
+
 const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
 {
     const libxl_vnc_info *vnc = NULL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index bdc0465..80fa17d 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -914,13 +914,20 @@ out:
 }
 
 int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     struct xc_hvm_build_args args = {};
     int ret, rc = ERROR_FAIL;
     uint64_t mmio_start, lowmem_end, highmem_end;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    /*
+     * Currently this is fixed at 2G so that we have a well-defined
+     * way to handle our rdm policy. A later patch will provide a
+     * parameter to set this dynamically.
+     */
+    uint64_t rdm_mem_boundary = 0x80000000;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -958,6 +965,14 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.highmem_end = highmem_end;
     args.mmio_start = mmio_start;
 
+    rc = libxl__domain_device_construct_rdm(gc, d_config,
+                                            rdm_mem_boundary,
+                                            &args);
+    if (rc) {
+        LOG(ERROR, "checking reserved device memory failed");
+        goto out;
+    }
+
     if (info->num_vnuma_nodes != 0) {
         int i;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d397143..b4d8419 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1057,7 +1057,7 @@ _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid,
 _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info, libxl__domain_build_state *state);
 _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state);
 
 _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
@@ -1565,6 +1565,15 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
         int nr_channels, libxl_device_channel *channels);
 
 /*
+ * This function fixes up reserved device memory conflicts
+ * according to the user's configuration.
+ */
+_hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                   libxl_domain_config *d_config,
+                                   uint64_t rdm_mem_boundary,
+                                   struct xc_hvm_build_args *args);
+
+/*
  * This function will cause the whole libxl process to hang
  * if the device model does not respond.  It is deprecated.
  *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 47dd83a..a3ad8d1 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -562,6 +562,12 @@ libxl_device_pci = Struct("device_pci", [
     ("rdm_policy",      libxl_rdm_reserve_policy),
     ])
 
+libxl_device_rdm = Struct("device_rdm", [
+    ("start", uint64),
+    ("size", uint64),
+    ("policy", libxl_rdm_reserve_policy),
+    ])
+
 libxl_device_dtdev = Struct("device_dtdev", [
     ("path", string),
     ])
@@ -592,6 +598,7 @@ libxl_domain_config = Struct("domain_config", [
     ("disks", Array(libxl_device_disk, "num_disks")),
     ("nics", Array(libxl_device_nic, "num_nics")),
     ("pcidevs", Array(libxl_device_pci, "num_pcidevs")),
+    ("rdms", Array(libxl_device_rdm, "num_rdms")),
     ("dtdevs", Array(libxl_device_dtdev, "num_dtdevs")),
     ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
     ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (10 preceding siblings ...)
  2015-07-20  6:16 ` [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-07-20  6:16 ` Tiejun Chen
  2015-07-20  6:17 ` [v10][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Previously we always fixed the predefined boundary at 2G when handling
conflicts between memory and rdm, but now this predefined boundary
can be changed with the "rdm_mem_boundary" parameter in the .cfg file.
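
For illustration only, a guest config that raises the boundary to 3G
might contain a line like the following (the value is just an example;
the option is in megabytes, as documented below):

    rdm_mem_boundary = 3072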

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v8 ~ v10:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable "rdm_mem_boundary_memkb" specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5       | 22 ++++++++++++++++++++++
 tools/libxl/libxl.h         |  6 ++++++
 tools/libxl/libxl_create.c  |  4 ++++
 tools/libxl/libxl_dom.c     |  8 +-------
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  3 +++
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 6c55a8b..23068ec 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -867,6 +867,28 @@ More information about Xen gfx_passthru feature is available
 on the XenVGAPassthrough L<http://wiki.xen.org/wiki/XenVGAPassthrough>
 wiki page.
 
+=item B<rdm_mem_boundary=MBYTES>
+
+Number of megabytes used as the boundary for checking rdm conflicts.
+
+When RDMs conflict with RAM, they may be scattered over the whole RAM
+space, and multiple RDM entries would make the memory layout even more
+complicated. So we settle for a simple solution that avoids breaking
+the existing layout. When a conflict occurs,
+
+    #1. Above a predefined boundary
+        - move lowmem_end below the reserved region to resolve the conflict;
+
+    #2. Below a predefined boundary
+        - Check the strict/relaxed policy.
+        The "strict" policy makes libxl fail. Note when both policies
+        are specified on a given region, 'strict' is always preferred.
+        The "relaxed" policy issues a warning message and also marks
+        this entry INVALID to indicate that it shouldn't be exposed to
+        hvmloader.
+
+The default is 2G.
+
 =item B<dtdev=[ "DTDEV_PATH", "DTDEV_PATH", ... ]>
 
 Specifies the host device tree nodes to passthrough to this guest. Each
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..6f157c9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
     uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c8a32d5..3de86a6 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
 {
     if (b_info->u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
         b_info->u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+
+    if (b_info->u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+        b_info->u.hvm.rdm_mem_boundary_memkb =
+                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 80fa17d..e41d54a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     int ret, rc = ERROR_FAIL;
     uint64_t mmio_start, lowmem_end, highmem_end;
     libxl_domain_build_info *const info = &d_config->b_info;
-    /*
-     * Currently this is fixed at 2G so that we have a well-defined
-     * way to handle our rdm policy. A later patch will provide a
-     * parameter to set this dynamically.
-     */
-    uint64_t rdm_mem_boundary = 0x80000000;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.mmio_start = mmio_start;
 
     rc = libxl__domain_device_construct_rdm(gc, d_config,
-                                            rdm_mem_boundary,
+                                            info->u.hvm.rdm_mem_boundary_memkb*1024,
                                             &args);
     if (rc) {
         LOG(ERROR, "checking reserved device memory failed");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a3ad8d1..4eb4f8a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -484,6 +484,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("ms_vm_genid",      libxl_ms_vm_genid),
                                        ("serial_list",      libxl_string_list),
                                        ("rdm", libxl_rdm_reserve),
+                                       ("rdm_mem_boundary_memkb", MemKB),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c858068..dfb50d6 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1519,6 +1519,9 @@ static void parse_config_data(const char *config_source,
                     exit(1);
             }
         }
+
+        if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
+            b_info->u.hvm.rdm_mem_boundary_memkb = l * 1024;
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (11 preceding siblings ...)
  2015-07-20  6:16 ` [v10][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
@ 2015-07-20  6:17 ` Tiejun Chen
  2015-07-20 13:34   ` Ian Jackson
  2015-07-20  6:17 ` [v10][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

Here we construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist; hvmloader will need this info
later.

Note this guest e820 table is the same as before if the
platform has no RDM or if RDM is disabled (the default).

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v8 ~ v10:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable "rdm_mem_boundary_memkb" specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 tools/libxl/libxl_arch.h |  7 ++++
 tools/libxl/libxl_arm.c  |  8 +++++
 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_x86.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 103 insertions(+)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d04871c..939178a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -49,4 +49,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
 _hidden
 int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
 
+/* arch specific to construct memory mapping function */
+_hidden
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args);
+
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index f09c860..1526467 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -926,6 +926,14 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
     return xc_domain_bind_pt_spi_irq(CTX->xch, domid, irq, irq);
 }
 
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args)
+{
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e41d54a..a8c6aa9 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }
 
+    if (libxl__arch_domain_construct_memmap(gc, d_config, domid, &args)) {
+        LOG(ERROR, "setting domain memory map failed");
+        goto out;
+    }
+
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..66b3d7f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: the regions below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions do not overlap since we have already checked
+ * and adjusted them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+                                        libxl_domain_config *d_config,
+                                        uint32_t domid,
+                                        struct xc_hvm_build_args *args)
+{
+    int rc = 0;
+    unsigned int nr = 0, i;
+    /* We always own at least one lowmem entry. */
+    unsigned int e820_entries = 1;
+    struct e820entry *e820 = NULL;
+    uint64_t highmem_size =
+                    args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+    /* Add all rdm entries. */
+    for (i = 0; i < d_config->num_rdms; i++)
+        if (d_config->rdms[i].policy != LIBXL_RDM_RESERVE_POLICY_INVALID)
+            e820_entries++;
+
+
+    /* If we should have a highmem range. */
+    if (highmem_size)
+        e820_entries++;
+
+    if (e820_entries >= E820MAX) {
+        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+        rc = ERROR_INVAL;
+        goto out;
+    }
+
+    e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+    /* Low memory */
+    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].type = E820_RAM;
+    nr++;
+
+    /* RDM mapping */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        if (d_config->rdms[i].policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
+            continue;
+
+        e820[nr].addr = d_config->rdms[i].start;
+        e820[nr].size = d_config->rdms[i].size;
+        e820[nr].type = E820_RESERVED;
+        nr++;
+    }
+
+    /* High memory */
+    if (highmem_size) {
+        e820[nr].addr = ((uint64_t)1 << 32);
+        e820[nr].size = highmem_size;
+        e820[nr].type = E820_RAM;
+    }
+
+    if (xc_domain_set_memory_map(CTX->xch, domid, e820, e820_entries) != 0) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+out:
+    return rc;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-basic-offset: 4
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 14/16] xen/vtd: enable USB device assignment
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (12 preceding siblings ...)
  2015-07-20  6:17 ` [v10][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
@ 2015-07-20  6:17 ` Tiejun Chen
  2015-07-20  6:17 ` [v10][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian

A USB RMRR may conflict with the guest BIOS region. In such a case,
identity mapping setup was simply skipped in the previous implementation.
Now we can handle this scenario cleanly with the new policy mechanism,
so the previous hack code can be removed.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v5 ~ v10:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
 xen/drivers/passthrough/vtd/utils.c |  7 -------
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {                                                \
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index b5d658e..c8b0455 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2243,11 +2243,9 @@ static int reassign_device_ownership(
     /*
      * If the device belongs to the hardware domain, and it has RMRR, don't
      * remove it from the hardware domain, because BIOS may use RMRR at
-     * booting time. Also account for the special casing of USB below (in
-     * intel_iommu_assign_device()).
+     * booting time.
      */
-    if ( !is_hardware_domain(source) &&
-         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+    if ( !is_hardware_domain(source) )
     {
         const struct acpi_rmrr_unit *rmrr;
         u16 bdf;
@@ -2300,13 +2298,8 @@ static int intel_iommu_assign_device(
     if ( ret )
         return ret;
 
-    /* FIXME: Because USB RMRR conflicts with guest bios region,
-     * ignore USB RMRR temporarily.
-     */
     seg = pdev->seg;
     bus = pdev->bus;
-    if ( is_usb_device(seg, bus, pdev->devfn) )
-        return 0;
 
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include "extern.h"
 #include <asm/io_apic.h>
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-    u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                                PCI_CLASS_DEVICE);
-    return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (13 preceding siblings ...)
  2015-07-20  6:17 ` [v10][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
@ 2015-07-20  6:17 ` Tiejun Chen
  2015-07-20  6:17 ` [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
  2015-07-20 10:37 ` [v10][PATCH 00/16] Fix RMRR George Dunlap
  16 siblings, 0 replies; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian

Currently we simply refuse to assign a device whose RMRR is
shared with other devices, since a shared RMRR is a rare case
according to our previous experience. Later we can group the
devices which share an rmrr, and then allow all devices within
a group to be assigned to the same domain.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v10:

* Nothing is changed.

v9:

* Correct one indentation issue

v8:

* Merge two if{} as one if{}

* Add to print RMRR range info when stop assign a group device

v5 ~ v7:

* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c8b0455..770e484 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2294,13 +2294,37 @@ static int intel_iommu_assign_device(
     if ( list_empty(&acpi_drhd_units) )
         return -ENODEV;
 
+    seg = pdev->seg;
+    bus = pdev->bus;
+    /*
+     * In rare cases one given rmrr is shared by multiple devices, but
+     * this would obviously put the security of the system at risk, so
+     * we prevent this sort of device assignment.
+     *
+     * TODO: in the future we can introduce a group device assignment
+     * interface to make sure devices sharing an RMRR are assigned to
+     * the same domain together.
+     */
+    for_each_rmrr_device( rmrr, bdf, i )
+    {
+        if ( rmrr->segment == seg &&
+             PCI_BUS(bdf) == bus &&
+             PCI_DEVFN2(bdf) == devfn &&
+             rmrr->scope.devices_cnt > 1 )
+        {
+            printk(XENLOG_G_ERR VTDPREFIX
+                   " cannot assign %04x:%02x:%02x.%u"
+                   " with shared RMRR at %"PRIx64" for Dom%d.\n",
+                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                   rmrr->base_address, d->domain_id);
+            return -EPERM;
+        }
+    }
+
     ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
     if ( ret )
         return ret;
 
-    seg = pdev->seg;
-    bus = pdev->bus;
-
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
     {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (14 preceding siblings ...)
  2015-07-20  6:17 ` [v10][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
@ 2015-07-20  6:17 ` Tiejun Chen
  2015-07-20 13:43   ` Ian Jackson
  2015-07-20 10:37 ` [v10][PATCH 00/16] Fix RMRR George Dunlap
  16 siblings, 1 reply; 74+ messages in thread
From: Tiejun Chen @ 2015-07-20  6:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Stefano Stabellini

This patch adds parsing of the user configurable parameters that specify
RDM resources and the corresponding policies defined previously,

Global RDM parameter:
    rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The default per-device RDM policy is the same as the default global RDM
policy, namely 'relaxed'. And, as with other per-device options, the
per-device policy overrides the global policy.
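
As a purely illustrative example (the BDF below is made up), a guest
config combining both forms might contain:

    rdm = "strategy=host,policy=relaxed"
    pci = [ '02:00.0,rdm_policy=strict' ]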

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
v9 ~ v10:

* Nothing is changed.

v8:

* Clean some codes style issues.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Just sync those renames introduced by patch #10.

v5:

* Need a rebase after we make all rdm variables specific to .hvm.
* Like other pci options, the per-device policy always follows
  the global policy by default.

v4:

* Separated from the current patch #11 to parse/enable our rdm policy
  parameters, since that makes a lot of sense and these bits are specific
  to xl/libxlu.

 tools/libxl/libxlu_pci.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++-
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 13 +++++++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..026413b 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE      9
+#define STATE_RDM_STRATEGY      10
+#define STATE_RESERVE_POLICY    11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str)
 {
     unsigned state = STATE_DOMAIN;
@@ -143,7 +146,18 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str
                     pcidev->permissive = atoi(tok);
                 }else if ( !strcmp(optkey, "seize") ) {
                     pcidev->seize = atoi(tok);
-                }else{
+                } else if (!strcmp(optkey, "rdm_policy")) {
+                    if (!strcmp(tok, "strict")) {
+                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+                    } else if (!strcmp(tok, "relaxed")) {
+                        pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+                    } else {
+                        XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
+                                          " policy: 'strict' or 'relaxed'.",
+                                     tok);
+                        goto parse_error;
+                    }
+                } else {
                     XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
                 }
                 tok = ptr + 1;
@@ -167,6 +181,82 @@ parse_error:
     return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+    unsigned state = STATE_TYPE;
+    char *buf2, *tok, *ptr, *end;
+
+    if (NULL == (buf2 = ptr = strdup(str)))
+        return ERROR_NOMEM;
+
+    for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+        switch(state) {
+        case STATE_TYPE:
+            if (*ptr == '=') {
+                state = STATE_RDM_STRATEGY;
+                *ptr = '\0';
+                if (strcmp(tok, "strategy")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RDM_STRATEGY:
+            if (*ptr == '\0' || *ptr == ',') {
+                state = STATE_RESERVE_POLICY;
+                *ptr = '\0';
+                if (!strcmp(tok, "host")) {
+                    rdm->strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM strategy option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RESERVE_POLICY:
+            if (*ptr == '=') {
+                state = STATE_OPTIONS_V;
+                *ptr = '\0';
+                if (strcmp(tok, "policy")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_OPTIONS_V:
+            if (*ptr == ',' || *ptr == '\0') {
+                state = STATE_TERMINAL;
+                *ptr = '\0';
+                if (!strcmp(tok, "strict")) {
+                    rdm->policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+                } else if (!strcmp(tok, "relaxed")) {
+                    rdm->policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property policy value: %s",
+                                 tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+        default:
+            break;
+        }
+    }
+
+    free(buf2);
+
+    if (tok != ptr || state != STATE_TERMINAL)
+        goto parse_error;
+
+    return 0;
+
+parse_error:
+    return ERROR_INVAL;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 989605a..e81b644 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -106,6 +106,10 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs, const char *const *specs,
  */
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str);
 
+/*
+ * RDM parsing
+ */
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str);
 
 /*
  * Vif rate parsing.
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index dfb50d6..38d6c53 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1923,6 +1923,14 @@ skip_vfb:
         xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
     }
 
+    if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
+        libxl_rdm_reserve rdm;
+        if (!xlu_rdm_parse(config, &rdm, buf)) {
+            b_info->u.hvm.rdm.strategy = rdm.strategy;
+            b_info->u.hvm.rdm.policy = rdm.policy;
+        }
+    }
+
     if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
         d_config->num_pcidevs = 0;
         d_config->pcidevs = NULL;
@@ -1937,6 +1945,11 @@ skip_vfb:
             pcidev->power_mgmt = pci_power_mgmt;
             pcidev->permissive = pci_permissive;
             pcidev->seize = pci_seize;
+            /*
+             * Like other pci options, the per-device policy always follows
+             * the global policy by default.
+             */
+            pcidev->rdm_policy = b_info->u.hvm.rdm.policy;
             if (!xlu_pci_parse_bdf(config, pcidev, buf))
                 d_config->num_pcidevs++;
         }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 00/16] Fix RMRR
  2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (15 preceding siblings ...)
  2015-07-20  6:17 ` [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
@ 2015-07-20 10:37 ` George Dunlap
  2015-07-20 12:39   ` Chen, Tiejun
  16 siblings, 1 reply; 74+ messages in thread
From: George Dunlap @ 2015-07-20 10:37 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: xen-devel

On Mon, Jul 20, 2015 at 7:16 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> v10:
>
> * Patch #6: hvmloader/pci: Try to avoid placing BARs in RMRRs
>   This is from George' draft patch which implements an acceptable
>   solution in current cycle. Here I just implemented check_overlap_all() and
>   some cleanups.
>
> * Patch #7: hvmloader/e820: construct guest e820 table
>   Instead of correcting e820, I'd like to correct memory_map.map[]
>   and then copy them into e820 directly. I think this can make sure
>   hvm_info, memory_map.map[] and e820 are on the same page.

One brief request -- if you do end up sending this series again, can
you please rebase to staging?  This is the second time I've had to
manually fix up some patches so that they apply.

Thanks.

 -George

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-20  6:16 ` [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs Tiejun Chen
@ 2015-07-20 11:00   ` George Dunlap
  2015-07-20 11:30   ` Jan Beulich
  1 sibling, 0 replies; 74+ messages in thread
From: George Dunlap @ 2015-07-20 11:00 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: xen-devel

On Mon, Jul 20, 2015 at 7:16 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> Try to avoid placing PCI BARs over RMRRs:
>
> - If mmio_hole_size is not specified, and the existing MMIO range has
>   RMRRs in it, and there is space to expand the hole in lowmem without
>   moving more memory, then make the MMIO hole as large as possible.
>
> - When placing RMRRs, find the next RMRR higher than the current base
>   in the lowmem mmio hole.  If it overlaps, skip ahead of it and find
>   the next one.
>
> This certainly won't work in all cases, but it should work in a
> significant number of cases.  Additionally, users should be able to
> work around problems by setting mmio_hole_size larger in the guest
> config.
>
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v10:
>
> * This is from George' draft patch which implements an acceptable solution in
>   current cycle. Here I just implemented check_overlap_all() and some cleanups.

Changes:

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-20  6:16 ` [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs Tiejun Chen
  2015-07-20 11:00   ` George Dunlap
@ 2015-07-20 11:30   ` Jan Beulich
  2015-07-20 12:52     ` George Dunlap
  1 sibling, 1 reply; 74+ messages in thread
From: Jan Beulich @ 2015-07-20 11:30 UTC (permalink / raw)
  To: George Dunlap, Tiejun Chen; +Cc: xen-devel

>>> On 20.07.15 at 08:16, <tiejun.chen@intel.com> wrote:
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -38,6 +38,43 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
>  enum virtual_vga virtual_vga = VGA_none;
>  unsigned long igd_opregion_pgbase = 0;
>  
> +/* Check if any conflicts with all reserved device memory. */

/* Check if the specified range conflicts with any reserved device memory. */

(and the "any" could perhaps be left out)

> +static bool check_overlap_all(uint64_t start, uint64_t size)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < memory_map.nr_map; i++ )
> +    {
> +        if ( memory_map.map[i].type == E820_RESERVED &&
> +             check_overlap(start, size,
> +                           memory_map.map[i].addr,
> +                           memory_map.map[i].size) )
> +            return true;
> +    }
> +
> +    return false;
> +}
> +
> +/* Find the lowest RMRR higher than base. */
> +static int find_next_rmrr(uint32_t base)
> +{
> +    unsigned int i;
> +    int next_rmrr = -1;
> +    uint64_t min_base = (1ull << 32);
> +
> +    for ( i = 0; i < memory_map.nr_map ; i++ )
> +    {
> +        if ( memory_map.map[i].type == E820_RESERVED &&
> +             memory_map.map[i].addr > base &&
> +             memory_map.map[i].addr < min_base )
> +        {
> +            next_rmrr = i;
> +            min_base = memory_map.map[i].addr;
> +        }
> +    }
> +    return next_rmrr;
> +}

Considering _both_ callers, I think the function should actually return
the lowest RMRR higher than or equal to base. Or wait - we actually
need to find the lowest RMRR the _end_ of which is higher than base.

Also - blank line before final return statement please.

> @@ -299,6 +337,15 @@ void pci_setup(void)
>                      || (((pci_mem_start << 1) >> PAGE_SHIFT)
>                          >= hvm_info->low_mem_pgend)) )
>              pci_mem_start <<= 1;
> +
> +        /*
> +         * Try to accomodate RMRRs in our MMIO region on a best-effort basis.
> +         * If we have RMRRs in the range, then make pci_mem_start just after
> +         * hvm_info->low_mem_pgend.
> +         */
> +        if ( pci_mem_start > (hvm_info->low_mem_pgend << PAGE_SHIFT) &&
> +             check_overlap_all(pci_mem_start, pci_mem_end-pci_mem_start) )
> +            pci_mem_start = ((hvm_info->low_mem_pgend + 1) << PAGE_SHIFT);

Since it looks like low_mem_pgend is exclusive, is the "+ 1" here
really needed (other than to put an arbitrary one page gap
between RAM and MMIO)?

> @@ -407,6 +456,19 @@ void pci_setup(void)
>          }
>  
>          base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
> +
> +        /* If we're using mem_resource, check for RMRR conflicts. */
> +        while ( resource == &mem_resource &&
> +                next_rmrr > 0 &&

>= I think.

> +                check_overlap(base, bar_sz,
> +                              memory_map.map[next_rmrr].addr,
> +                              memory_map.map[next_rmrr].size) )
> +        {
> +            base = memory_map.map[next_rmrr].addr + memory_map.map[next_rmrr].size;
> +            base = (base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);

Pointless cast.

And as George said elsewhere - please make sure the whole series
applies cleanly to staging; generally committers don't do fixups while
applying.

Jan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-20  6:16 ` [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-07-20 11:56   ` Jan Beulich
  2015-07-20 14:35     ` Chen, Tiejun
  2015-07-20 13:00   ` George Dunlap
  1 sibling, 1 reply; 74+ messages in thread
From: Jan Beulich @ 2015-07-20 11:56 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 20.07.15 at 08:16, <tiejun.chen@intel.com> wrote:
> +        /* If there was no highmem region, just create one. */
> +        if ( i == memory_map.nr_map )
> +        {
> +            memory_map.map[i].addr = ((uint64_t)1 << 32);
> +            memory_map.map[i].size = add_high_mem;
> +            memory_map.map[i].type = E820_RAM;

Don't you need to increment memory_map.nr_map here?
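
(That is, presumably a

    memory_map.nr_map++;

once the new entry has been filled in.)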

> +        }
> +
> +        /* A sanity check if high memory is broken. */
> +        BUG_ON( high_mem_end !=
> +                memory_map.map[i].addr + memory_map.map[i].size);
> +    }
> +
> +    /* Now fulfill e820. */

s/fulfill/fill/.

> +    /* Finally we need to sort all e820 entries. */
> +    for ( j = 0; j < nr-1; j++ )
> +    {
> +        for ( i = j+1; i < nr; i++ )

Blanks around binary operators please.

Jan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 00/16] Fix RMRR
  2015-07-20 10:37 ` [v10][PATCH 00/16] Fix RMRR George Dunlap
@ 2015-07-20 12:39   ` Chen, Tiejun
  0 siblings, 0 replies; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-20 12:39 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

> One brief request -- if you do end up sending this series again, can
> you please rebase to staging?  This is the second time I've had to
> manually fix up some patches so that they apply.
>

Sorry for this inconvenience.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-20 11:30   ` Jan Beulich
@ 2015-07-20 12:52     ` George Dunlap
  2015-07-20 14:06       ` Chen, Tiejun
  0 siblings, 1 reply; 74+ messages in thread
From: George Dunlap @ 2015-07-20 12:52 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap, Tiejun Chen; +Cc: xen-devel

On 07/20/2015 12:30 PM, Jan Beulich wrote:
>>>> On 20.07.15 at 08:16, <tiejun.chen@intel.com> wrote:
>> --- a/tools/firmware/hvmloader/pci.c
>> +++ b/tools/firmware/hvmloader/pci.c
>> @@ -38,6 +38,43 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
>>  enum virtual_vga virtual_vga = VGA_none;
>>  unsigned long igd_opregion_pgbase = 0;
>>  
>> +/* Check if any conflicts with all reserved device memory. */
> 
> /* Check if the specified range conflicts with any reserved device memory. */
> 
> (and the "any" could perhaps be left out)
> 
>> +static bool check_overlap_all(uint64_t start, uint64_t size)
>> +{
>> +    unsigned int i;
>> +
>> +    for ( i = 0; i < memory_map.nr_map; i++ )
>> +    {
>> +        if ( memory_map.map[i].type == E820_RESERVED &&
>> +             check_overlap(start, size,
>> +                           memory_map.map[i].addr,
>> +                           memory_map.map[i].size) )
>> +            return true;
>> +    }
>> +
>> +    return false;
>> +}
>> +
>> +/* Find the lowest RMRR higher than base. */
>> +static int find_next_rmrr(uint32_t base)
>> +{
>> +    unsigned int i;
>> +    int next_rmrr = -1;
>> +    uint64_t min_base = (1ull << 32);
>> +
>> +    for ( i = 0; i < memory_map.nr_map ; i++ )
>> +    {
>> +        if ( memory_map.map[i].type == E820_RESERVED &&
>> +             memory_map.map[i].addr > base &&
>> +             memory_map.map[i].addr < min_base )
>> +        {
>> +            next_rmrr = i;
>> +            min_base = memory_map.map[i].addr;
>> +        }
>> +    }
>> +    return next_rmrr;
>> +}
> 
> Considering _both_ callers, I think the function should actually return
> the lowest RMRR higher than or equal to base. 

You mean instead of strictly greater than the base.

> Or wait - we actually
> need to find the lowest RMRR the _end_ of which is higher than base.

Yes, you're right: there's always a risk that pci_mem_start will *start*
in the middle of a range.  Looking for the next *end* is more robust.
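
For reference, a rough sketch of how the reworked helper might look
(assuming it keeps the same memory_map-based structure; the caller's
"next_rmrr > 0" check would become ">= 0" as noted below):

    /* Find the lowest RMRR whose end is above base. */
    static int find_next_rmrr(uint32_t base)
    {
        unsigned int i;
        int next_rmrr = -1;
        uint64_t end, min_base = (1ull << 32);

        for ( i = 0; i < memory_map.nr_map; i++ )
        {
            end = memory_map.map[i].addr + memory_map.map[i].size;

            if ( memory_map.map[i].type == E820_RESERVED &&
                 end > base &&
                 memory_map.map[i].addr < min_base )
            {
                next_rmrr = i;
                min_base = memory_map.map[i].addr;
            }
        }

        return next_rmrr;
    }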

>> @@ -407,6 +456,19 @@ void pci_setup(void)
>>          }
>>  
>>          base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
>> +
>> +        /* If we're using mem_resource, check for RMRR conflicts. */
>> +        while ( resource == &mem_resource &&
>> +                next_rmrr > 0 &&
> 
>> = I think.

Yes, > is a typo; I certainly meant to type >=.

> 
>> +                check_overlap(base, bar_sz,
>> +                              memory_map.map[next_rmrr].addr,
>> +                              memory_map.map[next_rmrr].size) )
>> +        {
>> +            base = memory_map.map[next_rmrr].addr + memory_map.map[next_rmrr].size;
>> +            base = (base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
> 
> Pointless cast.

This was copy-and-pasted from the line above.  I had assumed that bar_sz
was uint32_t (perhaps it was at some point), but I guess now it's
pointless there too. :-)

 -George

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-20  6:16 ` [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
  2015-07-20 11:56   ` Jan Beulich
@ 2015-07-20 13:00   ` George Dunlap
  2015-07-20 13:23     ` Chen, Tiejun
  1 sibling, 1 reply; 74+ messages in thread
From: George Dunlap @ 2015-07-20 13:00 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On Mon, Jul 20, 2015 at 7:16 AM, Tiejun Chen <tiejun.chen@intel.com> wrote:
> Now use the hypervisor-supplied memory map to build our final e820 table:
> * Add regions for BIOS ranges and other special mappings not in the
>   hypervisor map
> * Add in the hypervisor supplied regions
> * Adjust the lowmem and highmem regions if we've had to relocate
>   memory (adding a highmem region if necessary)
> * Sort all the ranges so that they appear in memory order.
>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> v10:
>
> * Instead of correcting e820, I'd like to correct memory_map.map[]
>   and then copy them into e820 directly. I think this can make sure
>   hvm_info, memory_map.map[] and e820 are on the same page.

Actually, now that you mention it -- this should probably happen
instead when we update hvm_info->{low,high}_mem_pgend.

I'm happy to leave this where it is for now, so with Jan's comments
addressed (in particular, incrementing nr_map):

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

But if we have time it might be better to pull the loop in pci_setup()
which moves the memory out into a function, and have that function
modify the lowmem and highmem map[] entries at the same time we modify
low_mem_pgend and high_mem_pgend.  Alternately, you could put that on
your list of clean-ups to hvmloader for 4.7.

Thanks,
 -George

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-20 13:00   ` George Dunlap
@ 2015-07-20 13:23     ` Chen, Tiejun
  2015-07-20 13:50       ` George Dunlap
  0 siblings, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-20 13:23 UTC (permalink / raw)
  To: George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

>> v10:
>>
>> * Instead of correcting e820, I'd like to correct memory_map.map[]
>>    and then copy them into e820 directly. I think this can make sure
>>    hvm_info, memory_map.map[] and e820 are on the same page.
>
> Actually, now that you mention it -- this should probably happen
> instead when we update hvm_info->{low,high}_mem_pgend.

I also considered this point previously, but right now we only update
hvm_info->low/high_mem_pgend inside pci_setup(), and we can't guarantee
that will remain the sole place in the future. memory_map.map[], on the
other hand, is always copied into the e820 when we build the e820 table.

So I think we'd better do this update once, at the last minute.

Thanks
Tiejun

>
> I'm happy to leave this where it is for now, so with Jan's comments
> addressed (in particular, incrementing nr_map):
>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
>
> But if we have time it might be better to pull the loop in pci_setup()
> which moves the memory out into a function, and have that function
> modify the lowmem and highmem map[] entries at the same time we modify
> low_mem_pgend and high_mem_pgend.  Alternately, you could put that on
> your list of clean-ups to hvmloader for 4.7.
>
> Thanks,
>   -George
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-20  6:16 ` [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-07-20 13:32   ` Ian Jackson
  2015-07-20 14:40     ` Chen, Tiejun
                       ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Ian Jackson @ 2015-07-20 13:32 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Tiejun Chen writes ("[v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> While building a VM, HVM domain builder provides struct hvm_info_table{}
> to help hvmloader. Currently it includes two fields to construct guest
> e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
> check them to fix any conflict with RDM.
...

Thanks.  I think this patch is nearly there.

> +static int
> +libxl__xc_device_get_rdm(libxl__gc *gc,
...
> +    *xrdm = libxl__malloc(gc,
> +                          *nr_entries * sizeof(xen_reserved_device_memory_t));

Sorry for not spotting this before, but this should use GCNEW_ARRAY.

> +}
> +
> +#define pfn_to_paddr(x) ((uint64_t)x << XC_PAGE_SHIFT)
> +static void

Missing blank line after the #define - although I think this #define
could profitably be moved to libxl_internal.h.  If you do keep it here
please move it further up the file, to at least before any function
or struct definition.


Also the #define is missing safety ( ) around x.
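
That is, presumably something like:

    #define pfn_to_paddr(x) ((uint64_t)(x) << XC_PAGE_SHIFT)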

> +set_rdm_entries(libxl__gc *gc, libxl_domain_config *d_config,
> +                uint64_t rdm_start, uint64_t rdm_size, int rdm_policy,
> +                unsigned int nr_entries)
> +{
> +    assert(nr_entries);
> +
> +    d_config->num_rdms = nr_entries;
> +    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
> +                            d_config->num_rdms * sizeof(libxl_device_rdm));
> +
> +    d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
> +    d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
> +    d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
> +}

Thanks for breaking this out.  I think though that the division of
labour between this function and the call site is confusing.

Can I suggest a function

   void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
                 uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)

which assumes that d_config->num_rdms is set correctly, and increments
it ?

(Please put the increment at the end so that the assignments are to
->rdms[d_config->num_rdms], or perhaps make a convenience alias.)
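
For concreteness, a rough sketch of such a helper -- reusing the
libxl__realloc pattern from the current set_rdm_entries(), so the exact
shape is only a suggestion -- might be:

    static void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
                              uint64_t rdm_start, uint64_t rdm_size,
                              int rdm_policy)
    {
        /* Grow the array by one; num_rdms is assumed to be correct. */
        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
                      (d_config->num_rdms + 1) * sizeof(libxl_device_rdm));

        d_config->rdms[d_config->num_rdms].start = rdm_start;
        d_config->rdms[d_config->num_rdms].size = rdm_size;
        d_config->rdms[d_config->num_rdms].policy = rdm_policy;

        /* Increment at the end, as suggested. */
        d_config->num_rdms++;
    }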


> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
> +                                       libxl_domain_config *d_config,
> +                                       uint64_t rdm_mem_boundary,
> +                                       struct xc_hvm_build_args *args)
> +{
...
> +    /* Query all RDM entries in this platform */
> +    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
...
> +    } else {
> +        d_config->num_rdms = 0;
> +    }

Does this not override the domain configuration's num_rdms ?  I don't
think that is correct.

If the domain configuration has rdms and num_rdms already set, then
the strategy should presumably be ignored.  (Passing the same domain
configuration struct to libxl_domain_create_new, after destroying the
domain, ought to work, even after the first call has modified it.)


Can you please also wrap at least the already-word-wrapped comments to
70 (or maybe 75) columns ?  What you have looks like this when I
quote it for review:

> +     * Next step is to check and avoid potential conflict between RDM entri\
es
> +     * and guest RAM. To avoid intrusive impact to existing memory layout
> +     * {lowmem, mmio, highmem} which is passed around various function bloc\
ks,
> +     * below conflicts are not handled which are rare and handling them wou\
ld


Thanks,
Ian.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
  2015-07-20  6:17 ` [v10][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
@ 2015-07-20 13:34   ` Ian Jackson
  0 siblings, 0 replies; 74+ messages in thread
From: Ian Jackson @ 2015-07-20 13:34 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Tiejun Chen writes ("[v10][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest"):
> Here we'll construct a basic guest e820 table via
> XENMEM_set_memory_map. This table includes lowmem, highmem
> and RDMs if they exist, and hvmloader would need this info
> later.
> 
> Note this guest e820 table would be same as before if the
> platform has no any RDM or we disable RDM (by default).
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-20  6:17 ` [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
@ 2015-07-20 13:43   ` Ian Jackson
  2015-07-20 13:53     ` Chen, Tiejun
  0 siblings, 1 reply; 74+ messages in thread
From: Ian Jackson @ 2015-07-20 13:43 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Tiejun Chen writes ("[v10][PATCH 16/16] tools: parse to enable new rdm policy parameters"):
> This patch parses to enable user configurable parameters to specify
> RDM resource and according policies which are defined previously,
> 
> Global RDM parameter:
>     rdm = "strategy=host,policy=strict/relaxed"
> Per-device RDM parameter:
>     pci = [ 'sbdf, rdm_policy=strict/relaxed' ]
> 
> Default per-device RDM policy is same as default global RDM policy as being
> 'relaxed'. And the per-device policy would override the global policy like
> others.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>

For clarity:

I am not acking this patch, primarily because I am not happy with the
code in xlu_rdm_parse which is (a) the result of repeated
clone-and-hack and (b) consists of ad-hoc string pointer fiddling.

If I had been able to review this patch earlier in the release cycle I
would be explicitly nacking this patch.  It is true that maybe someone
will have some time to clean this up later; but in practice it often
turns out that they don't - which is why we usually try not to accept
patches on the basis of promises to do further cleanup.

However, I am late to this party.  I first made this complaint in
response to v7 on the 9th of July.  Under the circumstances I am going
to stand aside and neither ack nor nack this patch.

The rules then say that the patch may be committed given that it has
Wei's ack as a tools maintainer.

Ian.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-20 13:23     ` Chen, Tiejun
@ 2015-07-20 13:50       ` George Dunlap
  2015-07-20 13:57         ` Chen, Tiejun
  0 siblings, 1 reply; 74+ messages in thread
From: George Dunlap @ 2015-07-20 13:50 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

On Mon, Jul 20, 2015 at 2:23 PM, Chen, Tiejun <tiejun.chen@intel.com> wrote:
>>> v10:
>>>
>>> * Instead of correcting e820, I'd like to correct memory_map.map[]
>>>    and then copy them into e820 directly. I think this can make sure
>>>    hvm_info, memory_map.map[] and e820 are on the same page.
>>
>>
>> Actually, now that you mention it -- this should probably happen
>> instead when we update hvm_info->{low,high}_mem_pgend.
>
>
> I also considered this point previously but I thought just right now we only
> update hvm_info->low/high_mem_pgend inside pci_setup(). But you can't
> guarantee this would be a sole place in the future. Instead,
> memory_map.map[] would always be copied into e820 when we build e820 table.

We can guarantee it, if nothing else by making sure that no new
changes are added which change the one but not the other.

But perhaps better would be to put a check in build_e820_map() to
BUG() if hvm_info and memory_map are out of sync.

On the other hand, looking at this now, I think that long-term we
should probably move to have *all* information about the memory layout
passed to hvmloader via the memory map, rather than via hvm_info.
That way we can also get rid of the "magic" knowledge that hvmloader
has about the memory layout (e.g., the VGA hole).

 -George
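
For reference, the consistency check George mentions could look roughly
like this (a sketch against the memory_map/hvm_info structures used by
this series; not from any posted patch):

/* Sketch: verify that hvm_info and the toolstack-provided memory_map agree
 * on the end of lowmem and highmem before we build the final e820. */
static void check_memory_map_consistency(void)
{
    unsigned int i;
    uint64_t low_end = 0, high_end = 0;

    for ( i = 0; i < memory_map.nr_map; i++ )
    {
        const struct e820entry *e = &memory_map.map[i];

        if ( e->type != E820_RAM )
            continue;
        if ( e->addr < (1ull << 32) )
            low_end = e->addr + e->size;
        else
            high_end = e->addr + e->size;
    }

    BUG_ON(low_end != ((uint64_t)hvm_info->low_mem_pgend << PAGE_SHIFT));
    BUG_ON(hvm_info->high_mem_pgend &&
           high_end != ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT));
}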

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters
  2015-07-20 13:43   ` Ian Jackson
@ 2015-07-20 13:53     ` Chen, Tiejun
  0 siblings, 0 replies; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-20 13:53 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

> For clarity:
>
> I am not acking this patch, primarily because I am not happy with the
> code in xlu_rdm_parse which is (a) the result of repeated
> clone-and-hack and (b) consists of ad-hoc string pointer fiddling.

Yes, I know you mentioned this previously, but I also remember our last
exchange went something like this:

">>> Really I would prefer that this parsing was done with a miniature flex
 >>> parser, rather than ad-hoc pointer arithmetic and use of strtok.
 >>
 >> Sorry, could you show this explicitly?
 >
 > Something like what was done for disk devices.  See libxlu_disk_l.l
 > for an example.  In this case your code would be a lot less
 > complicated than what you see there.
 >
 > After the codefreeze I would probably have some time to write it for

Sounds yourself would do this so currently I just keep the original, right?

Thanks
Tiejun

 > you.  (I think that would be valuable because libxlu_disk_l.l is a
 > very complicated example, and I want be able to point future
 > submitters at something simpler.)
 >
 > Ian."

Then I didn't receive any further response, so I took that as you making
this promise.

Thanks
Tiejun

>
> If I had been able to review this patch earlier in the release cycle I
> would be explictly nacking this patch.  It is true that maybe someone
> will have some time to clean this up later; but in practice it often
> turns out that they don't - which is why we usually try not to accept
> patches on the basis of promises to do further cleanup.
>
> However, I am late to this party.  I first made this complaint in
> response to v7 on the 9th of July.  Under the circumstances I am going
> to stand aside and neither ack nor nack this patch.
>
> The rules then say that the patch may be committed given that it has
> Wei's ack as a tools maintainer.
>
> Ian.
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-20 13:50       ` George Dunlap
@ 2015-07-20 13:57         ` Chen, Tiejun
  0 siblings, 0 replies; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-20 13:57 UTC (permalink / raw)
  To: George Dunlap
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich, Wei Liu

>>> Actually, now that you mention it -- this should probably happen
>>> instead when we update hvm_info->{low,high}_mem_pgend.
>>
>>
>> I also considered this point previously but I thought just right now we only
>> update hvm_info->low/high_mem_pgend inside pci_setup(). But you can't
>> guarantee this would be a sole place in the future. Instead,
>> memory_map.map[] would always be copied into e820 when we build e820 table.
>
> We can guarantee it, if nothing else by making sure that no new
> changes are added which change the one but not the other.

This means you have to sync them again whenever you change hvm_info, so I
think this would cost a bit of extra effort.

Thanks
Tiejun

>
> But perhaps better would be to put a check in build_e820_map() to
> BUG() if hvm_info and memory_map are out of sync.
>
> On the other hand, looking at this now, I think that long-term we
> should probably move to have *all* information about the memory layout
> passed to hvmloader via the memory map, rather than via hvm_info.
> That way we can also get rid of the "magic" knowledge that hvmloader
> has about the memory layout (e.g., the VGA hole).
>
>   -George
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-20 12:52     ` George Dunlap
@ 2015-07-20 14:06       ` Chen, Tiejun
  2015-07-20 14:10         ` George Dunlap
  0 siblings, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-20 14:06 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich, George Dunlap; +Cc: xen-devel

>>> +/* Find the lowest RMRR higher than base. */
>>> +static int find_next_rmrr(uint32_t base)
>>> +{
>>> +    unsigned int i;
>>> +    int next_rmrr = -1;
>>> +    uint64_t min_base = (1ull << 32);
>>> +
>>> +    for ( i = 0; i < memory_map.nr_map ; i++ )
>>> +    {
>>> +        if ( memory_map.map[i].type == E820_RESERVED &&
>>> +             memory_map.map[i].addr > base &&
>>> +             memory_map.map[i].addr < min_base )
>>> +        {
>>> +            next_rmrr = i;
>>> +            min_base = memory_map.map[i].addr;
>>> +        }
>>> +    }
>>> +    return next_rmrr;
>>> +}
>>
>> Considering _both_ callers, I think the function should actually return
>> the lowest RMRR higher than or equal to base.
>
> You mean instead of strictly greater than the base.
>
>> Or wait - we actually
>> need to find the lowest RMRR the _end_ of which is higher than base.
>
> Yes, you're right: there's always a risk that pci_mem_start will *start*
> in the middle of a range.  Looking for the next *end* is more robust.
>

Sorry, this is not entirely clear to me. Are you saying something like this?

     for ( i = 0; i < memory_map.nr_map ; i++ )
     {
         end = memory_map.map[i].addr + memory_map.map[i].size;

         if ( memory_map.map[i].type == E820_RESERVED &&
              end > base &&
              memory_map.map[i].addr < min_base )
         {
             next_rmrr = i;
             min_base = memory_map.map[i].addr;
         }
     }


Thanks
Tiejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-20 14:06       ` Chen, Tiejun
@ 2015-07-20 14:10         ` George Dunlap
  2015-07-20 14:16           ` Jan Beulich
  0 siblings, 1 reply; 74+ messages in thread
From: George Dunlap @ 2015-07-20 14:10 UTC (permalink / raw)
  To: Chen, Tiejun, Jan Beulich, George Dunlap; +Cc: xen-devel

On 07/20/2015 03:06 PM, Chen, Tiejun wrote:
>>>> +/* Find the lowest RMRR higher than base. */
>>>> +static int find_next_rmrr(uint32_t base)
>>>> +{
>>>> +    unsigned int i;
>>>> +    int next_rmrr = -1;
>>>> +    uint64_t min_base = (1ull << 32);
>>>> +
>>>> +    for ( i = 0; i < memory_map.nr_map ; i++ )
>>>> +    {
>>>> +        if ( memory_map.map[i].type == E820_RESERVED &&
>>>> +             memory_map.map[i].addr > base &&
>>>> +             memory_map.map[i].addr < min_base )
>>>> +        {
>>>> +            next_rmrr = i;
>>>> +            min_base = memory_map.map[i].addr;
>>>> +        }
>>>> +    }
>>>> +    return next_rmrr;
>>>> +}
>>>
>>> Considering _both_ callers, I think the function should actually return
>>> the lowest RMRR higher than or equal to base.
>>
>> You mean instead of strictly greater than the base.
>>
>>> Or wait - we actually
>>> need to find the lowest RMRR the _end_ of which is higher than base.
>>
>> Yes, you're right: there's always a risk that pci_mem_start will *start*
>> in the middle of a range.  Looking for the next *end* is more robust.
>>
> 
> Sorry this is not very clear to me so are you saying something like this?
> 
>     for ( i = 0; i < memory_map.nr_map ; i++ )
>     {
>         end = memory_map.map[i].addr + memory_map.map[i].size;
> 
>         if ( memory_map.map[i].type == E820_RESERVED &&
>              end > base &&
>              memory_map.map[i].addr < min_base )
>         {
>             next_rmrr = i;
>             min_base = memory_map.map[i].addr;
>         }
>     }

I think replace min_base with min_end:

         if ( memory_map.map[i].type == E820_RESERVED &&
              end > base &&
              end < min_end )
         {
             next_rmrr = i;
             min_end = end;
         }

Hmm... although I suppose that doesn't catch the possibility of a memory
range crossing the 4G boundary.

 -George

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-20 14:10         ` George Dunlap
@ 2015-07-20 14:16           ` Jan Beulich
  2015-07-20 14:32             ` Chen, Tiejun
  0 siblings, 1 reply; 74+ messages in thread
From: Jan Beulich @ 2015-07-20 14:16 UTC (permalink / raw)
  To: George Dunlap; +Cc: Tiejun Chen, George Dunlap, xen-devel

>>> On 20.07.15 at 16:10, <george.dunlap@citrix.com> wrote:
> Hmm... although I suppose that doesn't catch the possibility of a memory
> range crossing the 4G boundary.

I think we can safely ignore that - both real and virtual hardware have
special regions right below 4Gb, so neither RAM nor RMRRs can be
reasonably placed there.

Jan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-20 14:16           ` Jan Beulich
@ 2015-07-20 14:32             ` Chen, Tiejun
  2015-07-20 15:00               ` Jan Beulich
  0 siblings, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-20 14:32 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap; +Cc: George Dunlap, xen-devel

On 2015/7/20 22:16, Jan Beulich wrote:
>>>> On 20.07.15 at 16:10, <george.dunlap@citrix.com> wrote:
>> Hmm... although I suppose that doesn't catch the possibility of a memory
>> range crossing the 4G boundary.
>
> I think we can safely ignore that - both real and virtual hardware have
> special regions right below 4Gb, so neither RAM nor RMRRs can be
> reasonably placed there.
>

Okay, I've regenerated this patch inline, and I just hope it's good enough 
to be acked here:

hvmloader/pci: Try to avoid placing BARs in RMRRs

Try to avoid placing PCI BARs over RMRRs:

- If mmio_hole_size is not specified, and the existing MMIO range has
   RMRRs in it, and there is space to expand the hole in lowmem without
   moving more memory, then make the MMIO hole as large as possible.

- When placing RMRRs, find the next RMRR higher than the current base
   in the lowmem mmio hole.  If it overlaps, skip ahead of it and find
   the next one.

This certainly won't work in all cases, but it should work in a
significant number of cases.  Additionally, users should be able to
work around problems by setting mmio_hole_size larger in the guest
config.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
  tools/firmware/hvmloader/pci.c | 65 ++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 65 insertions(+)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..74fc080 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,46 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
  enum virtual_vga virtual_vga = VGA_none;
  unsigned long igd_opregion_pgbase = 0;

+/* Check if the specified range conflicts with any reserved device memory. */
+static bool check_overlap_all(uint64_t start, uint64_t size)
+{
+    unsigned int i;
+
+    for ( i = 0; i < memory_map.nr_map; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED &&
+             check_overlap(start, size,
+                           memory_map.map[i].addr,
+                           memory_map.map[i].size) )
+            return true;
+    }
+
+    return false;
+}
+
+/* Find the lowest RMRR ending above base. */
+static int find_next_rmrr(uint32_t base)
+{
+    unsigned int i;
+    int next_rmrr = -1;
+    uint64_t end, min_end = (1ull << 32);
+
+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        end = memory_map.map[i].addr + memory_map.map[i].size;
+
+        if ( memory_map.map[i].type == E820_RESERVED &&
+             end > base &&
+             end < min_end )
+        {
+            next_rmrr = i;
+            min_end = end;
+        }
+    }
+
+    return next_rmrr;
+}
+
  void pci_setup(void)
  {
      uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -46,6 +86,7 @@ void pci_setup(void)
      uint32_t vga_devfn = 256;
      uint16_t class, vendor_id, device_id;
      unsigned int bar, pin, link, isa_irq;
+    int next_rmrr;

      /* Resources assignable to PCI devices via BARs. */
      struct resource {
@@ -299,6 +340,15 @@ void pci_setup(void)
                      || (((pci_mem_start << 1) >> PAGE_SHIFT)
                          >= hvm_info->low_mem_pgend)) )
              pci_mem_start <<= 1;
+
+        /*
+         * Try to accommodate RMRRs in our MMIO region on a best-effort basis.
+         * If we have RMRRs in the range, then make pci_mem_start just after
+         * hvm_info->low_mem_pgend.
+         */
+        if ( pci_mem_start > (hvm_info->low_mem_pgend << PAGE_SHIFT) &&
+             check_overlap_all(pci_mem_start, pci_mem_end-pci_mem_start) )
+            pci_mem_start = hvm_info->low_mem_pgend << PAGE_SHIFT;
      }

      if ( mmio_total > (pci_mem_end - pci_mem_start) )
@@ -352,6 +402,8 @@ void pci_setup(void)
      io_resource.base = 0xc000;
      io_resource.max = 0x10000;

+    next_rmrr = find_next_rmrr(pci_mem_start);
+
      /* Assign iomem and ioport resources in descending order of size. */
      for ( i = 0; i < nr_bars; i++ )
      {
@@ -407,6 +459,19 @@ void pci_setup(void)
          }

          base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+
+        /* If we're using mem_resource, check for RMRR conflicts. */
+        while ( resource == &mem_resource &&
+                next_rmrr >= 0 &&
+                check_overlap(base, bar_sz,
+                              memory_map.map[next_rmrr].addr,
+                              memory_map.map[next_rmrr].size) )
+        {
+            base = memory_map.map[next_rmrr].addr +
+                   memory_map.map[next_rmrr].size;
+            base = (base + bar_sz - 1) & ~(bar_sz - 1);
+            next_rmrr = find_next_rmrr(base);
+        }
+
          bar_data |= (uint32_t)base;
          bar_data_upper = (uint32_t)(base >> 32);
          base += bar_sz;
-- 
1.9.1

Thanks
Tiejun

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-20 11:56   ` Jan Beulich
@ 2015-07-20 14:35     ` Chen, Tiejun
  2015-07-20 15:03       ` Jan Beulich
  0 siblings, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-20 14:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

It looks like only a little needed to change, so I'm also pasting the new 
version inline to try to win your ack here:


hvmloader/e820: construct guest e820 table

Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
   hypervisor map
* Add in the hypervisor supplied regions
* Adjust the lowmem and highmem regions if we've had to relocate
   memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
  tools/firmware/hvmloader/e820.c | 109 +++++++++++++++++++++++++++++++++++-----
  1 file changed, 96 insertions(+), 13 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 7a414ab..a6cacdf 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,11 @@ int build_e820_table(struct e820entry *e820,
                       unsigned int lowmem_reserved_base,
                       unsigned int bios_image_base)
  {
-    unsigned int nr = 0;
+    unsigned int nr = 0, i, j;
+    uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+    uint32_t add_high_mem = 0;
+    uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT;
+    uint64_t map_start, map_size, map_end;

      if ( !lowmem_reserved_base )
              lowmem_reserved_base = 0xA0000;
@@ -149,13 +153,6 @@ int build_e820_table(struct e820entry *e820,
      e820[nr].type = E820_RESERVED;
      nr++;

-    /* Low RAM goes here. Reserve space for special pages. */
-    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-    e820[nr].addr = 0x100000;
-    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-    e820[nr].type = E820_RAM;
-    nr++;
-
      /*
       * Explicitly reserve space for special pages.
       * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +188,102 @@ int build_e820_table(struct e820entry *e820,
          nr++;
      }

+    /* Low RAM goes here. Reserve space for special pages. */
+    BUG_ON(low_mem_end < (2u << 20));

-    if ( hvm_info->high_mem_pgend )
+    /*
+     * Construct E820 table according to recorded memory map.
+     *
+     * The memory map created by toolstack may include,
+     *
+     * #1. Low memory region
+     *
+     * Low RAM starts at least from 1M to make sure all standard regions
+     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+     * have enough space.
+     *
+     * #2. Reserved regions if they exist
+     *
+     * #3. High memory region if it exists
+     *
+     * Note we just have one low memory entry, and one high memory entry
+     * if it exists.
+     *
+     * But we may have relocated RAM previously to allocate sufficient MMIO,
+     * in which case low_mem_pgend was changed there. memory_map[] still
+     * records the original low/high memory, so if low_mem_end is now less
+     * than the original we need to revise the low/high memory ranges first.
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
      {
-        e820[nr].addr = ((uint64_t)1 << 32);
-        e820[nr].size =
-            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-        e820[nr].type = E820_RAM;
+        map_start = memory_map.map[i].addr;
+        map_size = memory_map.map[i].size;
+        map_end = map_start + map_size;
+
+        /* If we need to adjust lowmem. */
+        if ( memory_map.map[i].type == E820_RAM &&
+             low_mem_end > map_start && low_mem_end < map_end )
+        {
+            add_high_mem = map_end - low_mem_end;
+            memory_map.map[i].size = low_mem_end - map_start;
+            break;
+        }
+    }
+
+    /* If we need to adjust highmem. */
+    if ( add_high_mem )
+    {
+        /* Modify the existing highmem region if it exists. */
+        for ( i = 0; i < memory_map.nr_map; i++ )
+        {
+            map_start = memory_map.map[i].addr;
+            map_size = memory_map.map[i].size;
+            map_end = map_start + map_size;
+
+            if ( memory_map.map[i].type == E820_RAM &&
+                 map_start == ((uint64_t)1 << 32))
+            {
+                memory_map.map[i].size += add_high_mem;
+                break;
+            }
+        }
+
+        /* If there was no highmem region, just create one. */
+        if ( i == memory_map.nr_map )
+        {
+            memory_map.map[i].addr = ((uint64_t)1 << 32);
+            memory_map.map[i].size = add_high_mem;
+            memory_map.map[i].type = E820_RAM;
+            memory_map.nr_map++;
+        }
+
+        /* A sanity check if high memory is broken. */
+        BUG_ON( high_mem_end !=
+                memory_map.map[i].addr + memory_map.map[i].size);
+    }
+
+    /* Now fill e820. */
+    for ( i = 0; i < memory_map.nr_map; i++ )
+    {
+        e820[nr] = memory_map.map[i];
          nr++;
      }

+    /* Finally we need to sort all e820 entries. */
+    for ( j = 0; j < nr - 1; j++ )
+    {
+        for ( i = j + 1; i < nr; i++ )
+        {
+            if ( e820[j].addr > e820[i].addr )
+            {
+                struct e820entry tmp = e820[j];
+
+                e820[j] = e820[i];
+                e820[i] = tmp;
+            }
+        }
+    }
+
      return nr;
  }

-- 
1.9.1

Thanks
Tiejun


On 2015/7/20 19:56, Jan Beulich wrote:
>>>> On 20.07.15 at 08:16, <tiejun.chen@intel.com> wrote:
>> +        /* If there was no highmem region, just create one. */
>> +        if ( i == memory_map.nr_map )
>> +        {
>> +            memory_map.map[i].addr = ((uint64_t)1 << 32);
>> +            memory_map.map[i].size = add_high_mem;
>> +            memory_map.map[i].type = E820_RAM;
>
> Don't you need to increment memory_map.nr_map here?
>
>> +        }
>> +
>> +        /* A sanity check if high memory is broken. */
>> +        BUG_ON( high_mem_end !=
>> +                memory_map.map[i].addr + memory_map.map[i].size);
>> +    }
>> +
>> +    /* Now fulfill e820. */
>
> s/fulfill/fill/.
>
>> +    /* Finally we need to sort all e820 entries. */
>> +    for ( j = 0; j < nr-1; j++ )
>> +    {
>> +        for ( i = j+1; i < nr; i++ )
>
> Blanks around binary operators please.
>
> Jan
>
>

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-20 13:32   ` Ian Jackson
@ 2015-07-20 14:40     ` Chen, Tiejun
  2015-07-20 14:53       ` Ian Jackson
  2015-07-21  6:38     ` Chen, Tiejun
  2015-07-21 10:41     ` Ian Jackson
  2 siblings, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-20 14:40 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel


Note I need more time to address the other comments.

>> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
>> +                                       libxl_domain_config *d_config,
>> +                                       uint64_t rdm_mem_boundary,
>> +                                       struct xc_hvm_build_args *args)
>> +{
> ...
>> +    /* Query all RDM entries in this platform */
>> +    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
> ...
>> +    } else {
>> +        d_config->num_rdms = 0;
>> +    }
>
> Does this not override the domain configuration's num_rdms ?  I don't

We don't have a specific "num_rdms" parameter in .cfg, so I don't 
understand what you mean here.

Thanks
Tiejun

> think that is correct.
>
> If the domain configuration has rdms and num_rdms already set, then
> the strategy should presumably be ignored.  (Passing the same domain
> configuration struct to libxl_domain_create_new, after destroying the
> domain, ought to work, even after the first call has modified it.)
>
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-20 14:40     ` Chen, Tiejun
@ 2015-07-20 14:53       ` Ian Jackson
  2015-07-20 15:08         ` Chen, Tiejun
  0 siblings, 1 reply; 74+ messages in thread
From: Ian Jackson @ 2015-07-20 14:53 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> Note I need more time to address others.

Right.  But thanks for coming back quickly with this question:

> >> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
> >> +                                       libxl_domain_config *d_config,
> >> +                                       uint64_t rdm_mem_boundary,
> >> +                                       struct xc_hvm_build_args *args)
> >> +{
> > ...
> >> +    /* Query all RDM entries in this platform */
> >> +    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
> > ...
> >> +    } else {
> >> +        d_config->num_rdms = 0;
> >> +    }
> >
> > Does this not override the domain configuration's num_rdms ?  I don't
> 
> We don't have the specific "num_rdms" parameter in .cfg so I don't 
> understand what you mean here.

The domain configuration specified to libxl might contain some rdms.
Then num_rdms in the incoming config would be nonzero.

So I think there are two problems here:

1. If that were the case you would leak the application's rdms array.

2. Anyway, if the caller specifies such an array you should use it.
   (Fixing this would avoid (1) in any case.)

Ian.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-20 14:32             ` Chen, Tiejun
@ 2015-07-20 15:00               ` Jan Beulich
  2015-07-21  0:53                 ` Chen, Tiejun
  0 siblings, 1 reply; 74+ messages in thread
From: Jan Beulich @ 2015-07-20 15:00 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: George Dunlap, xen-devel

>>> On 20.07.15 at 16:32, <tiejun.chen@intel.com> wrote:
> On 2015/7/20 22:16, Jan Beulich wrote:
>>>>> On 20.07.15 at 16:10, <george.dunlap@citrix.com> wrote:
>>> Hmm... although I suppose that doesn't catch the possibility of a memory
>>> range crossing the 4G boundary.
>>
>> I think we can safely ignore that - both real and virtual hardware have
>> special regions right below 4Gb, so neither RAM nor RMRRs can be
>> reasonably placed there.
>>
> 
> Okay, I regenerate this patch online. And I just hope its good to be 
> acked here:

Provided it also works,
Reviewed-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
  2015-07-20 14:35     ` Chen, Tiejun
@ 2015-07-20 15:03       ` Jan Beulich
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Beulich @ 2015-07-20 15:03 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel, Keir Fraser

>>> On 20.07.15 at 16:35, <tiejun.chen@intel.com> wrote:
> Looks just a little bit should be changed so I also paste this new 
> online to try winning your Acked here,

Just like the other one, provided it also works,
Reviewed-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-20 14:53       ` Ian Jackson
@ 2015-07-20 15:08         ` Chen, Tiejun
  2015-07-20 15:24           ` Ian Jackson
  0 siblings, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-20 15:08 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>>>> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
>>>> +                                       libxl_domain_config *d_config,
>>>> +                                       uint64_t rdm_mem_boundary,
>>>> +                                       struct xc_hvm_build_args *args)
>>>> +{
>>> ...
>>>> +    /* Query all RDM entries in this platform */
>>>> +    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
>>> ...
>>>> +    } else {
>>>> +        d_config->num_rdms = 0;
>>>> +    }
>>>
>>> Does this not override the domain configuration's num_rdms ?  I don't
>>
>> We don't have the specific "num_rdms" parameter in .cfg so I don't
>> understand what you mean here.
>
> The domain configuration specified to libxl might contain some rdms.
> Then num_rdms in the incoming config would be nonzero.

We never set d_config->num_rdms/d_config->rdms before we go inside 
libxl__domain_device_construct_rdm(). And actually 
libxl__domain_device_construct_rdm() is the only place that sets 
d_config->num_rdms/d_config->rdms.

I guess this line makes you or others confused, so let's just delete this 
line.

And if you are still worried, I can add an assert() at the beginning of 
this function, like this:

assert(!d_config->num_rdms && !d_config->rdms).

Thanks
Tiejun

>
> So I think there are two problems here:
>
> 1. If that were the case you would leak the application's rdms array.
>
> 2. Anyway, if the caller specifies such an array you should use it.
>     (Fixing this would avoid (1) in any case.)
>
> Ian.
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-20 15:08         ` Chen, Tiejun
@ 2015-07-20 15:24           ` Ian Jackson
  2015-07-20 15:38             ` Ian Campbell
  2015-07-21  6:45             ` Chen, Tiejun
  0 siblings, 2 replies; 74+ messages in thread
From: Ian Jackson @ 2015-07-20 15:24 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> [Ian Jackson:]
> > The domain configuration specified to libxl might contain some rdms.
> > Then num_rdms in the incoming config would be nonzero.
> 
> We never set d_config->num_rdms/d_config->rdms before we goes inside 
> libxl__domain_device_construct_rdm(). And actually 
> libxl__domain_device_construct_rdm is only one place to set 
> d_config->num_rdms/d_config->rdms.

But d_config is a libxl_domain_config which is supplied by libxl's
caller.  It might contain some rdms.

> I guess this line make you or other guys confused so lets delete this 
> line directly.

I don't think I am very confused.

> And if you still worry about something, I can add assert() at the 
> beginning of this function like this,
> 
> assert(!d_config->num_rdms && !d_config->rdms).

If you are sure that this assertion is correct, then that would be
proper.

But as I say above, I don't think it is.

Ian.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-20 15:24           ` Ian Jackson
@ 2015-07-20 15:38             ` Ian Campbell
  2015-07-21  6:44               ` Chen, Tiejun
  2015-07-21  6:45             ` Chen, Tiejun
  1 sibling, 1 reply; 74+ messages in thread
From: Ian Campbell @ 2015-07-20 15:38 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Chen, Tiejun, Stefano Stabellini, Wei Liu, xen-devel

On Mon, 2015-07-20 at 16:24 +0100, Ian Jackson wrote:
> Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and
> avoid conflicts with RDM"):
> > [Ian Jackson:]
> > > The domain configuration specified to libxl might contain some rdms.
> > > Then num_rdms in the incoming config would be nonzero.
> > 
> > We never set d_config->num_rdms/d_config->rdms before we goes inside 
> > libxl__domain_device_construct_rdm(). And actually 
> > libxl__domain_device_construct_rdm is only one place to set 
> > d_config->num_rdms/d_config->rdms.
> 
> But d_config is a libxl_domain_config which is supplied by libxl's
> caller.  It might contain some rdms.
> 
> > I guess this line make you or other guys confused so lets delete this 
> > line directly.
> 
> I don't think I am very confused.

I think the confusion here is that the d_config->rdms array (which
num_rdms is the length of) is in the public API (because it is in
libxl_types.idl) but is apparently only being used in this series as an
internal state for the domain build process (i.e. xl doesn't ever add
anything to the array rdms).

Tiejun, is that an accurate summary?

If the field is in the public API then the possibility of something
being passed in there must be considered now, even if this particular
series adds no such calls, since we cannot prevent 3rd party users of
libxl adding such configuration.

Is the possibility of the toolstack (i.e. the caller of libxl) supplying
an array of rdm regions being left aside as future work, or is it not
intended to ever be supported?

Ian.

> 
> > And if you still worry about something, I can add assert() at the 
> > beginning of this function like this,
> > 
> > assert(!d_config->num_rdms && !d_config->rdms).
> 
> If you are sure that this assertion is correct, then that would be
> proper.
> 
> But as I say above, I don't think it is.
> 
> Ian.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-20 15:00               ` Jan Beulich
@ 2015-07-21  0:53                 ` Chen, Tiejun
  2015-07-21  6:18                   ` Jan Beulich
  2015-07-21  9:42                   ` George Dunlap
  0 siblings, 2 replies; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21  0:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel

>> Okay, I regenerate this patch online. And I just hope its good to be
>> acked here:
>
> Provided it also works,
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>

Why is this marked Reviewed-by rather than Acked-by this time? The same 
question applies to the other hvmloader patch as well.

This really confuses me, since you're the key maintainer for this area, 
and I remember you also gave an Acked-by to the first hvmloader patch. I 
know this solution has always been argued about, so does this mean you 
still don't think it is good to go into the tree from your perspective, 
and you want to leave the Acked-by to other maintainers?

And what about patch #7, hvmloader/e820: construct guest e820 table -- is 
that also not fine with you?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-21  0:53                 ` Chen, Tiejun
@ 2015-07-21  6:18                   ` Jan Beulich
  2015-07-21  9:42                   ` George Dunlap
  1 sibling, 0 replies; 74+ messages in thread
From: Jan Beulich @ 2015-07-21  6:18 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: George Dunlap, xen-devel

>>> On 21.07.15 at 02:53, <tiejun.chen@intel.com> wrote:
>> > Okay, I regenerate this patch online. And I just hope its good to be
>>> acked here:
>>
>> Provided it also works,
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Why is this marked as Acked-by this time? The same question is raised to 
> another hvmloader patch as well.
> 
> This really makes me confused since you're the key maintainer associated 
> to this, and I remember you also gave me Acked-by to the first hvmloader 
> patch. I know this solution is always argued, so does this mean you 
> still don't think this is good to go in the tree in your perspective, so 
> you want to leave this Acked-by to other maintainers, right?

Keeping in mind that others (and other projects) may consider this
differently, at least on the hypervisor side we've been mostly
treating (or trying to treat) Reviewed-by as a superset of Acked-by.

Jan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-20 13:32   ` Ian Jackson
  2015-07-20 14:40     ` Chen, Tiejun
@ 2015-07-21  6:38     ` Chen, Tiejun
  2015-07-21 10:48       ` Ian Jackson
  2015-07-21 10:41     ` Ian Jackson
  2 siblings, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21  6:38 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

I hope the following addresses all the comments below:


diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 2f8e590..a4bd2a1 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -407,7 +407,7 @@ int libxl__domain_build(libxl__gc *gc,

      switch (info->type) {
      case LIBXL_DOMAIN_TYPE_HVM:
-        ret = libxl__build_hvm(gc, domid, info, state);
+        ret = libxl__build_hvm(gc, domid, d_config, state);
          if (ret)
              goto out;

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 634b8d2..ba852fe 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -92,6 +92,276 @@ const char *libxl__domain_device_model(libxl__gc *gc,
      return dm;
  }

+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+                         uint32_t flag,
+                         uint16_t seg,
+                         uint8_t bus,
+                         uint8_t devfn,
+                         unsigned int *nr_entries,
+                         struct xen_reserved_device_memory **xrdm)
+{
+    int rc = 0, r;
+
+    /*
+     * We really can't presume how many entries we can get in advance.
+     */
+    *nr_entries = 0;
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      NULL, nr_entries);
+    assert(r <= 0);
+    /* "0" means we have no any rdm entry. */
+    if (!r) goto out;
+
+    if (errno != ENOBUFS) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    GCNEW_ARRAY(*xrdm, *nr_entries);
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      *xrdm, nr_entries);
+    if (r)
+        rc = ERROR_FAIL;
+
+ out:
+    if (rc) {
+        *nr_entries = 0;
+        *xrdm = NULL;
+        LOG(ERROR, "Could not get reserved device memory maps.\n");
+    }
+    return rc;
+}
+
+/*
+ * Check whether an rdm hole exists in the specified memory range.
+ * Returns true if it does, false otherwise.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+                         uint64_t rdm_start, uint64_t rdm_size)
+{
+    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
+}
+
+static void
+add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
+              uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
+{
+    assert(d_config->num_rdms);
+
+    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                            d_config->num_rdms * sizeof(libxl_device_rdm));
+
+    d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
+    d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
+    d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to the user's preferred policy.
+ *
+ * RDM can theoretically reside in address space beyond 4G, but we never
+ * see this in the real world. So in order to avoid breaking the highmem
+ * layout we don't resolve highmem conflicts. Note this means a highmem
+ * RMRR can still be supported if there is no conflict.
+ *
+ * But in the case of lowmem, RDMs may be scattered across the whole RAM
+ * space. Multiple RDM entries would make this worse and lead to a
+ * complicated memory layout, and it is hard to extend hvm_info_table{}
+ * to convey such a layout to hvmloader. So here we're trying to find a
+ * simple solution that avoids breaking the existing layout. When a
+ * conflict occurs,
+ *
+ * #1. Above a predefined boundary (default 2G)
+ * - Move lowmem_end below reserved region to solve conflict;
+ *
+ * #2. Below a predefined boundary (default 2G)
+ * - Check strict/relaxed policy.
+ * "strict" policy leads to fail libxl.
+ * "relaxed" policy issue a warning message and also mask this entry
+ * INVALID to indicate we shouldn't expose this entry to hvmloader.
+ * Note when both policies are specified on a given region, the per-device
+ * policy should override the global policy.
+ */
+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                       libxl_domain_config *d_config,
+                                       uint64_t rdm_mem_boundary,
+                                       struct xc_hvm_build_args *args)
+{
+    int i, j, conflict, rc;
+    struct xen_reserved_device_memory *xrdm = NULL;
+    uint32_t strategy = d_config->b_info.u.hvm.rdm.strategy;
+    uint16_t seg;
+    uint8_t bus, devfn;
+    uint64_t rdm_start, rdm_size;
+    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);
+
+    /* Might not expose rdm. */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE &&
+        !d_config->num_pcidevs)
+        return 0;
+
+    /* Query all RDM entries in this platform */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
+        unsigned int nr_entries;
+
+        /* Collect all rdm info if any exists. */
+        rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+                                      0, 0, 0, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        if (!nr_entries)
+            return 0;
+
+        assert(xrdm);
+
+        for (i = 0; i < nr_entries; i++)
+        {
+            d_config->num_rdms++;
+            add_rdm_entry(gc, d_config,
+                          pfn_to_paddr(xrdm[i].start_pfn),
+                          pfn_to_paddr(xrdm[i].nr_pages),
+                          d_config->b_info.u.hvm.rdm.policy);
+        }
+    }
+
+    /* Query RDM entries per-device */
+    for (i = 0; i < d_config->num_pcidevs; i++) {
+        unsigned int nr_entries;
+        bool new = true;
+
+        seg = d_config->pcidevs[i].domain;
+        bus = d_config->pcidevs[i].bus;
+        devfn = PCI_DEVFN(d_config->pcidevs[i].dev,
+                          d_config->pcidevs[i].func);
+        nr_entries = 0;
+        rc = libxl__xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
+                                      seg, bus, devfn, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        /* No RDM associated with this device. */
+        if (!nr_entries)
+            continue;
+
+        assert(xrdm);
+
+        /*
+         * Need to check whether this entry is already saved in the array.
+         * This could come from two cases:
+         *
+         *   - user may configure to get all RDMs in this platform, which
+         *   is already queried before this point
+         *   - or two assigned devices may share one RDM entry
+         *
+         * Different policies may be configured on the same RDM due to the
+         * above two cases. But we don't allow assigning such a group of
+         * devices right now, so this can't happen in our case.
+         */
+        for (j = 0; j < d_config->num_rdms; j++) {
+            if (d_config->rdms[j].start == pfn_to_paddr(xrdm[0].start_pfn))
+            {
+                /*
+                 * So the per-device policy always overrides the global
+                 * policy in this case.
+                 */
+                d_config->rdms[j].policy = d_config->pcidevs[i].rdm_policy;
+                new = false;
+                break;
+            }
+        }
+
+        if (new) {
+            d_config->num_rdms++;
+            add_rdm_entry(gc, d_config,
+                          pfn_to_paddr(xrdm[0].start_pfn),
+                          pfn_to_paddr(xrdm[0].nr_pages),
+                          d_config->pcidevs[i].rdm_policy);
+        }
+    }
+
+    /*
+     * Next step is to check and avoid potential conflict between RDM
+     * entries and guest RAM. To avoid intrusive impact to existing
+     * memory layout {lowmem, mmio, highmem} which is passed around
+     * various function blocks, below conflicts are not handled which
+     * are rare and handling them would lead to a more scattered
+     * layout:
+     *  - RDM  in highmem area (>4G)
+     *  - RDM lower than a defined memory boundary (e.g. 2G)
+     * Otherwise, for conflicts between the boundary and 4G, we'll simply
+     * move lowmem_end below the reserved region to solve the conflict.
+     *
+     * If a conflict is detected on a given RDM entry, an error will
+     * be returned if 'strict' policy is specified. Instead, if
+     * 'relaxed' policy specified, this conflict is treated just as a
+     * warning, but we mark this RDM entry as INVALID to indicate that
+     * this entry shouldn't be exposed to hvmloader.
+     *
+     * Firstly we should check the case of rdm < 4G because we may
+     * need to expand highmem_end.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        conflict = overlaps_rdm(0, args->lowmem_end, rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        /* Just check if RDM > our memory boundary. */
+        if (rdm_start > rdm_mem_boundary) {
+            /*
+             * We will move lowmem_end downwards, so we have to expand
+             * highmem_end.
+             */
+            highmem_end += (args->lowmem_end - rdm_start);
+            /* Now move lowmem_end downwards. */
+            args->lowmem_end = rdm_start;
+        }
+    }
+
+    /* Sync highmem_end. */
+    args->highmem_end = highmem_end;
+
+    /*
+     * Finally we can apply the same policy to check lowmem (< 2G) and
+     * the highmem adjusted above.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        /* Does this entry conflict with lowmem? */
+        conflict = overlaps_rdm(0, args->lowmem_end,
+                                rdm_start, rdm_size);
+        /* Does this entry conflict with highmem? */
+        conflict |= overlaps_rdm((1ULL<<32),
+                                 args->highmem_end - (1ULL<<32),
+                                 rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        if (d_config->rdms[i].policy == LIBXL_RDM_RESERVE_POLICY_STRICT) {
+            LOG(ERROR, "RDM conflict at 0x%lx.\n", 
d_config->rdms[i].start);
+            goto out;
+        } else {
+            LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
+                      d_config->rdms[i].start);
+
+            /*
+             * Then mask this INVALID to indicate we shouldn't expose this
+             * to hvmloader.
+             */
+            d_config->rdms[i].policy = LIBXL_RDM_RESERVE_POLICY_INVALID;
+        }
+    }
+
+    return 0;
+
+ out:
+    return ERROR_FAIL;
+}
+
  const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
  {
      const libxl_vnc_info *vnc = NULL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 4cb247a..7e39047 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -921,13 +921,20 @@ out:
  }

  int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
                libxl__domain_build_state *state)
  {
      libxl_ctx *ctx = libxl__gc_owner(gc);
      struct xc_hvm_build_args args = {};
      int ret, rc = ERROR_FAIL;
      uint64_t mmio_start, lowmem_end, highmem_end;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    /*
+     * Currently we fix this at 2G to keep the rdm policy handling
+     * simple, but we'll provide a parameter to set this dynamically
+     * later.
+     */
+    uint64_t rdm_mem_boundary = 0x80000000;

      memset(&args, 0, sizeof(struct xc_hvm_build_args));
      /* The params from the configuration file are in Mb, which are then
@@ -965,6 +972,14 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
      args.highmem_end = highmem_end;
      args.mmio_start = mmio_start;

+    rc = libxl__domain_device_construct_rdm(gc, d_config,
+                                            rdm_mem_boundary,
+                                            &args);
+    if (rc) {
+        LOG(ERROR, "checking reserved device memory failed");
+        goto out;
+    }
+
      if (info->num_vnuma_nodes != 0) {
          int i;

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 69fdad8..5e4e771 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -144,6 +144,9 @@
  #endif
    /* all of these macros preserve errno (saving and restoring) */

+/* Convert pfn to physical address space. */
+#define pfn_to_paddr(x) ((uint64_t)(x) << XC_PAGE_SHIFT)
+
  /* logging */
  _hidden void libxl__logv(libxl_ctx *ctx, xentoollog_level msglevel, int errnoval,
               const char *file /* may be 0 */, int line /* ignored if !file */,
@@ -1076,7 +1079,7 @@ _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid,
  _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
               libxl_domain_build_info *info, libxl__domain_build_state *state);
  _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
                libxl__domain_build_state *state);

  _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
@@ -1584,6 +1587,15 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
          int nr_channels, libxl_device_channel *channels);

  /*
+ * This function will fix reserved device memory conflicts
+ * according to the user's configuration.
+ */
+_hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                   libxl_domain_config *d_config,
+                                   uint64_t rdm_mem_guard,
+                                   struct xc_hvm_build_args *args);
+
+/*
   * This function will cause the whole libxl process to hang
   * if the device model does not respond.  It is deprecated.
   *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index db9f75a..157fa59 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -586,6 +586,12 @@ libxl_device_pci = Struct("device_pci", [
      ("rdm_policy",      libxl_rdm_reserve_policy),
      ])

+libxl_device_rdm = Struct("device_rdm", [
+    ("start", uint64),
+    ("size", uint64),
+    ("policy", libxl_rdm_reserve_policy),
+    ])
+
  libxl_device_dtdev = Struct("device_dtdev", [
      ("path", string),
      ])
@@ -616,6 +622,7 @@ libxl_domain_config = Struct("domain_config", [
      ("disks", Array(libxl_device_disk, "num_disks")),
      ("nics", Array(libxl_device_nic, "num_nics")),
      ("pcidevs", Array(libxl_device_pci, "num_pcidevs")),
+    ("rdms", Array(libxl_device_rdm, "num_rdms")),
      ("dtdevs", Array(libxl_device_dtdev, "num_dtdevs")),
      ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
      ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
-- 
1.9.1

Thanks
Tiejun

On 2015/7/20 21:32, Ian Jackson wrote:
> Tiejun Chen writes ("[v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
>> While building a VM, HVM domain builder provides struct hvm_info_table{}
>> to help hvmloader. Currently it includes two fields to construct guest
>> e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
>> check them to fix any conflict with RDM.
> ...
>
> Thanks.  I think this patch is nearly there.
>
>> +static int
>> +libxl__xc_device_get_rdm(libxl__gc *gc,
> ...
>> +    *xrdm = libxl__malloc(gc,
>> +                          *nr_entries * sizeof(xen_reserved_device_memory_t));
>
> Sorry for not spotting this before, but this should use GCNEW_ARRAY.
>
>> +}
>> +
>> +#define pfn_to_paddr(x) ((uint64_t)x << XC_PAGE_SHIFT)
>> +static void
>
> Missing blank line after the #define - although I think this #define
> could profitably be moved to libxl_internal.h.  If you do keep it here
> please move it further up the file, to at least before any function
> or struct definition.
>
>
> Also the #define is missing safety ( ) around x.
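
As an illustrative aside, here is a minimal sketch (hypothetical caller and
values, not taken from the series) of why the safety parentheses matter:
without them, the cast and the shift bind to the wrong sub-expressions when
the macro argument contains a conditional operator.

#include <stdint.h>
#include <stdio.h>

#define XC_PAGE_SHIFT 12

/* Variant without parentheses around x, as originally posted. */
#define pfn_to_paddr_unsafe(x) ((uint64_t)x << XC_PAGE_SHIFT)
/* Variant with the suggested safety parentheses. */
#define pfn_to_paddr_safe(x)   ((uint64_t)(x) << XC_PAGE_SHIFT)

int main(void)
{
    int use_low = 1;
    uint64_t low_pfn = 0x100, high_pfn = 0x200;

    /* The unsafe macro expands so that the cast applies to 'use_low' and
     * the shift applies only to 'high_pfn'; the selected 'low_pfn' comes
     * back unshifted. */
    printf("unsafe: %#llx\n",
           (unsigned long long)pfn_to_paddr_unsafe(use_low ? low_pfn : high_pfn));
    printf("safe:   %#llx\n",
           (unsigned long long)pfn_to_paddr_safe(use_low ? low_pfn : high_pfn));
    return 0;
}

With use_low = 1, the unsafe variant prints 0x100 (the raw pfn) while the
safe variant prints the intended 0x100000.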
>
>> +set_rdm_entries(libxl__gc *gc, libxl_domain_config *d_config,
>> +                uint64_t rdm_start, uint64_t rdm_size, int rdm_policy,
>> +                unsigned int nr_entries)
>> +{
>> +    assert(nr_entries);
>> +
>> +    d_config->num_rdms = nr_entries;
>> +    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
>> +                            d_config->num_rdms * sizeof(libxl_device_rdm));
>> +
>> +    d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
>> +    d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
>> +    d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
>> +}
>
> Thanks for breaking this out.  I think though that the division of
> labour between this function and the call site is confusing.
>
> Can I suggest a function
>
>     void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
>                   uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
>
> which assumes that d_config->num_rdms is set correctly, and increments
> it ?
>
> (Please put the increment at the end so that the assignments are to
> ->rdms[d_config->num_rdms], or perhaps make a convenience alias.)
>
>
>> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
>> +                                       libxl_domain_config *d_config,
>> +                                       uint64_t rdm_mem_boundary,
>> +                                       struct xc_hvm_build_args *args)
>> +{
> ...
>> +    /* Query all RDM entries in this platform */
>> +    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
> ...
>> +    } else {
>> +        d_config->num_rdms = 0;
>> +    }
>
> Does this not override the domain configuration's num_rdms ?  I don't
> think that is correct.
>
> If the domain configuration has rdms and num_rdms already set, then
> the strategy should presumably be ignored.  (Passing the same domain
> configuration struct to libxl_domain_create_new, after destroying the
> domain, ought to work, even after the first call has modified it.)
>
>
> Can you please also wrap at least the already-word-wrapped comments to
> 70 (or maybe 75) columns ?  What you have looks like this when I
> quote it for review:
>
>> +     * Next step is to check and avoid potential conflict between RDM entri\
> es
>> +     * and guest RAM. To avoid intrusive impact to existing memory layout
>> +     * {lowmem, mmio, highmem} which is passed around various function bloc\
> ks,
>> +     * below conflicts are not handled which are rare and handling them wou\
> ld
>
>
> Thanks,
> Ian.
>


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-20 15:38             ` Ian Campbell
@ 2015-07-21  6:44               ` Chen, Tiejun
  0 siblings, 0 replies; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21  6:44 UTC (permalink / raw)
  To: Ian Campbell, Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, xen-devel

  > I think the confusion here is that the d_config->rdms array (which
> num_rdms is the length of) is in the public API (because it is in
> libxl_types.idl) but is apparently only being used in this series as an
> internal state for the domain build process (i.e. xl doesn't ever add
> anything to the array rdms).
>
> Tiejun, is that an accurate summary?

Yes.

>
> If the field is in the public API then the possibility of something
> being passed in there must be considered now, even if this particular
> series adds no such calls, since we cannot prevent 3rd party users of
> libxl adding such configuration.
>
> Is the possibility of the toolstack (i.e. the caller of libxl) supplying
> an array of rdm regions being left aside for future work, or is it not
> intended to ever be supported?

It's very possible, so you're right.
Thanks
Tiejun

>
> Ian.
>
>>
>>> And if you still worry about something, I can add assert() at the
>>> beginning of this function like this,
>>>
>>> assert(!d_config->num_rdms && !d_config->rdms).
>>
>> If you are sure that this assertion is correct, then that would be
>> proper.
>>
>> But as I say above, I don't think it is.
>>
>> Ian.
>
>
>


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-20 15:24           ` Ian Jackson
  2015-07-20 15:38             ` Ian Campbell
@ 2015-07-21  6:45             ` Chen, Tiejun
  1 sibling, 0 replies; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21  6:45 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

> But d_config is a libxl_domain_config which is supplied by libxl's
> caller.  It might contain some rdms.
>
>> I guess this line makes you or other people confused, so let's delete
>> this line directly.
>
> I don't think I am very confused.
>
>> And if you still worry about something, I can add assert() at the
>> beginning of this function like this,
>>
>> assert(!d_config->num_rdms && !d_config->rdms).
>
> If you are sure that this assertion is correct, then that would be
> proper.
>
> But as I say above, I don't think it is.
>

Based on Campbell's explanation I think you are both raising a reasonable 
concern. We shouldn't clear that there arbitrarily.

Thanks
Tiejun


* Re: [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
  2015-07-21  0:53                 ` Chen, Tiejun
  2015-07-21  6:18                   ` Jan Beulich
@ 2015-07-21  9:42                   ` George Dunlap
  1 sibling, 0 replies; 74+ messages in thread
From: George Dunlap @ 2015-07-21  9:42 UTC (permalink / raw)
  To: Chen, Tiejun, Jan Beulich; +Cc: George Dunlap, xen-devel

On 07/21/2015 01:53 AM, Chen, Tiejun wrote:
>>> Okay, I regenerated this patch online, and I just hope it is good to be
>>> acked here:
>>
>> Provided it also works,
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
> 
> Why is this marked as Acked-by this time? The same question applies to
> another hvmloader patch as well.
> 
> This really confuses me, since you're the key maintainer for this area,
> and I remember you also gave an Acked-by to the first hvmloader patch.
> I know this solution has always been argued about, so does this mean you
> still don't think it is good to go into the tree from your perspective,
> and you want to leave the Acked-by to other maintainers?
> 
> And what about patch #7 (hvmloader/e820: construct guest e820 table)?
> Is that also not fine with you?

Acked-by means basically, "I have no objections to this going in."  For
a patch to be committed it needs at least one maintainer to Ack it, and
(in general) no one else who objects.  (A maintainer  can override
someone's objection.)  But it doesn't mean that Jan has done a thorough
review, or even that he's read it (he may Ack it just based on other
people's Reviewed-by's).

Reviewed-by means that he *has* done a thorough technical review, and he
thinks it's ready to go in.  In particular, Reviewed-by means "While
there may be things that could be improved with this submission, I
believe that it is, at this time, (1) a worthwhile modification, and (2)
free of known issues which would argue against its inclusion." [1]

So Reviewed-by is stronger than Acked-by.

 -George

[1] Section 13, https://www.kernel.org/doc/Documentation/SubmittingPatches



* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-20 13:32   ` Ian Jackson
  2015-07-20 14:40     ` Chen, Tiejun
  2015-07-21  6:38     ` Chen, Tiejun
@ 2015-07-21 10:41     ` Ian Jackson
  2015-07-21 11:04       ` Chen, Tiejun
  2 siblings, 1 reply; 74+ messages in thread
From: Ian Jackson @ 2015-07-21 10:41 UTC (permalink / raw)
  To: Tiejun Chen, xen-devel, Stefano Stabellini, Ian Campbell, Wei Liu

I wrote:
> If the domain configuration has rdms and num_rdms already set, then
> the strategy should presumably be ignored.  (Passing the same domain
> configuration struct to libxl_domain_create_new, after destroying the
> domain, ought to work, even after the first call has modified it.)

But even your latest patch adds the host rdm's (from the strategy
algorithm) to the array, unconditionally.

I think you need to think more clearly about what happens when the
caller *both* supplies some rdms, and sets strategy=host.

Certainly if this happens and the set of rdms supplied is the same as
that which would be generated by the strategy, the set should not be
duplicated.

Ian.


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21  6:38     ` Chen, Tiejun
@ 2015-07-21 10:48       ` Ian Jackson
  2015-07-21 11:12         ` Chen, Tiejun
  0 siblings, 1 reply; 74+ messages in thread
From: Ian Jackson @ 2015-07-21 10:48 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> I hope the following can address all comments below:


You now write this:

> +static void
> +add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
> +              uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
> +{
> +    assert(d_config->num_rdms);
> +
> +    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
> +                            d_config->num_rdms * sizeof(libxl_device_rdm));
> +
> +    d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
> +    d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
> +    d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
> +}

But, I wrote:

   Can I suggest a function

      void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
                    uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)

   which assumes that d_config->num_rdms is set correctly, and increments
   it ?

   (Please put the increment at the end so that the assignments are to
   ->rdms[d_config->num_rdms], or perhaps make a convenience alias.)

Did you not notice that both call sites for add_rdm_entry are preceded
by the increment ?  As I wrote earlier:

   Finding multiple occurrences of very similar code is usually a sign
   that refactoring is needed.


See also my other mail about the handling of existing rdms with
strategy=host.

Thanks,
Ian.


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 10:41     ` Ian Jackson
@ 2015-07-21 11:04       ` Chen, Tiejun
  2015-07-21 11:11         ` Ian Jackson
  0 siblings, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21 11:04 UTC (permalink / raw)
  To: Ian Jackson, xen-devel, Stefano Stabellini, Ian Campbell, Wei Liu

On 2015/7/21 18:41, Ian Jackson wrote:
> I wrote:
>> If the domain configuration has rdms and num_rdms already set, then
>> the strategy should presumably be ignored.  (Passing the same domain
>> configuration struct to libxl_domain_create_new, after destroying the
>> domain, ought to work, even after the first call has modified it.)
>
> But even your latest patch adds the host rdm's (from the strategy
> algorithm) to the array, unconditionally.
>
> I think you need to think more clearly about what happens when the
> caller *both* supplies some rdms, and sets strategy=host.
>
> Certainly if this happens and the set of rdms supplied is the same as
> that which would be generated by the strategy, the set should not be
> duplicated.
>

I can understand that we shouldn't duplicate any existing entries. But
based on our current design, I don't think we allow this to happen. It is
more about supporting groups of devices, i.e. two or more devices sharing
the same RMRR entry, and currently we don't support such device groups, as
patch #15 shows.

So from this perspective our final policy is not yet clear. As we discussed
previously with others, this is the next step after this series, so we
won't extend this assumption here. I can also confirm that this is
currently the only place where RDM is constructed.

I would add this check at the beginning of this function:

assert(!d_config->num_rdms && !d_config->rdms).

Thanks
Tiejun


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 11:04       ` Chen, Tiejun
@ 2015-07-21 11:11         ` Ian Jackson
  2015-07-21 11:23           ` Chen, Tiejun
  2015-07-21 12:04           ` Chen, Tiejun
  0 siblings, 2 replies; 74+ messages in thread
From: Ian Jackson @ 2015-07-21 11:11 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> I would add this check at the beginning of this function:
> 
> assert(!d_config->num_rdms && !d_config->rdms).

As Ian Campbell and I have explained, that is not OK.

The caller may choose to pass both some rdms and strategy=host.

The most obvious case is something like the following code

  libxl_domain_config cfg;
  cfg.stuff = blah;
  cfg.rdm.strategy = HOST;

  libxl_domain_create_new(&cfg, &domid);
  libxl_domain_destroy(domid);
  libxl_domain_create_new(&cfg, &domid);

which must work (and the second domain should have identical
configuration to the first).

Ian.


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 10:48       ` Ian Jackson
@ 2015-07-21 11:12         ` Chen, Tiejun
  0 siblings, 0 replies; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21 11:12 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>> +static void
>> +add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
>> +              uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
>> +{
>> +    assert(d_config->num_rdms);
>> +
>> +    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
>> +                            d_config->num_rdms * sizeof(libxl_device_rdm));
>> +
>> +    d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
>> +    d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
>> +    d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
>> +}
>
> But, I wrote:
>
>     Can I suggest a function
>
>        void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
>                      uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
>
>     which assumes that d_config->num_rdms is set correctly, and increments
>     it ?
>
>     (Please put the increment at the end so that the assignments are to
>     ->rdms[d_config->num_rdms], or perhaps make a convenience alias.)
>
> Did you not notice that both call sites for add_rdm_entry are preceded
> by the increment ?  As I wrote earlier:
>
>     Finding multiple occurrences of very similar code is usually a sign
>     that refactoring is needed.

Sorry, I should have done this better.

So I just need to move the increment into the callee, like this?

static void
add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
               uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
{
     d_config->num_rdms++;
     d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
                             d_config->num_rdms * sizeof(libxl_device_rdm));

     d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
     d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
     d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
}

>
>
> See also my other mail about the handling of existing rdms with
> strategy=host.
>

I also replied there.

Thanks
Tiejun


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 11:11         ` Ian Jackson
@ 2015-07-21 11:23           ` Chen, Tiejun
  2015-07-21 11:27             ` Ian Jackson
  2015-07-21 12:04           ` Chen, Tiejun
  1 sibling, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21 11:23 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 2015/7/21 19:11, Ian Jackson wrote:
> Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
>> I would add this check at the beginning of this function:
>>
>> assert(!d_config->num_rdms && !d_config->rdms).
>
> As Ian Campbell and I have explained, that is not OK.
>
> The caller may choose to pass both some rdms and strategy=host.
>
> The most obvious case is something like the following code
>
>    libxl_domain_config cfg;
>    cfg.stuff = blah;
>    cfg.rdm.strategy = HOST;
>
>    libxl_domain_create_new(&cfg, &domid);
>    libxl_domain_destroy(domid);

I'm not sure about this procedure; do you mean it doesn't free
d_config->num_rdms/d_config->rdms?

If not, I should double-check for duplication, so what about returning
early when an entry would be duplicated?

static void
add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
               uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
{
     if (d_config->rdms)
     {
         unsigned int i;

         for (i = 0; i < d_config->num_rdms; i++)
         {
             if (d_config->rdms[i].start == rdm_start)
                 return;
         }
     }

     d_config->num_rdms++;
     d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
                             d_config->num_rdms * sizeof(libxl_device_rdm));

     d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
     d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
     d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
}

Thanks
Tiejun

>    libxl_domain_create_new(&cfg, &domid);
>
> which must work (and the second domain should have identical
> configuration to the first).
>
> Ian.
>


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 11:23           ` Chen, Tiejun
@ 2015-07-21 11:27             ` Ian Jackson
  2015-07-21 11:45               ` Chen, Tiejun
  0 siblings, 1 reply; 74+ messages in thread
From: Ian Jackson @ 2015-07-21 11:27 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> On 2015/7/21 19:11, Ian Jackson wrote:
> > The most obvious case is something like the following code
> >
> >    libxl_domain_config cfg;
> >    cfg.stuff = blah;
> >    cfg.rdm.strategy = HOST;
> >
> >    libxl_domain_create_new(&cfg, &domid);
> >    libxl_domain_destroy(domid);
> 
> I'm not sure about this procedure, so do you mean this doesn't free 
> d_config->num_rdms/d_config->rdms?

The domain destroy would not change cfg at all.

> If not, I should double check this duplication so what about a return in 
> the case of duplicating one entry?

What I am looking for is a *decision* about what the right behaviour
is, backed up by a *rationale*.

The most obvious answer to me would be that if an rdms array is
specified, the strategy should be ignored.  That is how the automatic
numa placement API works.

But another answer would be to take the union of the specified
regions.  That would be more complicated, because the union would have
to be computed.

>              if (d_config->rdms[i].start == rdm_start)
>                  return;

That doesn't, of course, compute the union.

Ian.


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 11:27             ` Ian Jackson
@ 2015-07-21 11:45               ` Chen, Tiejun
  2015-07-21 12:33                 ` Ian Jackson
  0 siblings, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21 11:45 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

> The domain destroy would not change cfg at all.

Okay.

>
>> If not, I should double check this duplication so what about a return in
>> the case of duplicating one entry?
>
> What I am looking for is a *decision* about what the right behaviour
> is, backed up by a *rationale*.
>
> The most obvious answer to me would be that if an rdms array is
> specified, the strategy should be ignored.  That is how the automatic
> numa placement API works.

I'm not familiar with this.

Do you mean this?

     if (d_config->num_rdms)
         strategy = LIBXL_RDM_RESERVE_STRATEGY_IGNORE;

But besides this global strategy, we also have per-device settings, so 
maybe something like this?

     if (d_config->num_rdms)
         return 0;

>
> But another answer would be to take the union of the specified
> regions.  That would be more complicated, because the union would have
> to be computed.
>
>>               if (d_config->rdms[i].start == rdm_start)
>>                   return;
>
> That doesn't, of course, compute the union.
>

Sorry, I don't understand what "take the union of the specified regions" 
means here.

Thanks
Tiejun


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 11:11         ` Ian Jackson
  2015-07-21 11:23           ` Chen, Tiejun
@ 2015-07-21 12:04           ` Chen, Tiejun
  2015-07-21 12:34             ` Ian Jackson
  1 sibling, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21 12:04 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 2015/7/21 19:11, Ian Jackson wrote:
> Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
>> I would add this check at the beginning of this function:
>>
>> assert(!d_config->num_rdms && !d_config->rdms).
>
> As Ian Campbell and I have explained, that is not OK.
>
> The caller may choose to pass both some rdms and strategy=host.

If you're talking about creating the domain twice, I think this should be 
reasonable:

if (d_config->num_rdms) return 0;

Because RDMs are always associated with the platform, we should keep the 
same array. I mean, the same configuration shouldn't change this.

Thanks
Tiejun

>
> The most obvious case is something like the following code
>
>    libxl_domain_config cfg;
>    cfg.stuff = blah;
>    cfg.rdm.strategy = HOST;
>
>    libxl_domain_create_new(&cfg, &domid);
>    libxl_domain_destroy(domid);
>    libxl_domain_create_new(&cfg, &domid);
>
> which must work (and the second domain should have identical
> configuration to the first).
>
> Ian.
>


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 11:45               ` Chen, Tiejun
@ 2015-07-21 12:33                 ` Ian Jackson
  2015-07-21 13:29                   ` Chen, Tiejun
  2015-07-21 13:41                   ` Chen, Tiejun
  0 siblings, 2 replies; 74+ messages in thread
From: Ian Jackson @ 2015-07-21 12:33 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> [Ian Jackson:]
> > The most obvious answer to me would be that if an rdms array is
> > specified, the strategy should be ignored.  That is how the automatic
> > numa placement API works.
> 
> I'm not familiar with this.
> 
> Do you mean this?
> 
>      if (d_config->num_rdms)
>          strategy = LIBXL_RDM_RESERVE_STRATEGY_IGNORE;
> 
> But except this global strategy, we also have per-device setting so 
> maybe something like this?
> 
>      if (d_config->num_rdms)
>          return 0;

This latter was the kind of approach I had in mind.  I think that will
do for 4.6.


> > But another answer would be to take the union of the specified
> > regions.  That would be more complicated, because the union would have
> > to be computed.
> >
> >>               if (d_config->rdms[i].start == rdm_start)
> >>                   return;
> >
> > That doesn't, of course, compute the union.
> 
> Sorry I don't understand what the "take the union of the specified 
> regions" is here.

A "region" is a subset of the address space.  I meant the set union:
  https://en.wikipedia.org/wiki/Union_%28set_theory%29

That is, an address would be reserved if it was reserved in any of the
rdm regions implied by the config.

The explicitly specified regions might overlap with the computed ones,
without being identical. Computing the union would not be entirely
trivial.
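
As a sketch only (hypothetical struct and helper, assuming regions are
plain start/size pairs; this is not code from the series), computing such a
union could amount to sorting by start address and merging overlapping or
adjacent ranges:

#include <stdint.h>
#include <stdlib.h>

/* Hypothetical region type; the real series uses libxl_device_rdm. */
struct region { uint64_t start, size; };

static int cmp_start(const void *a, const void *b)
{
    const struct region *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Merge overlapping or adjacent regions in place; returns the new count. */
static unsigned int region_union(struct region *r, unsigned int n)
{
    unsigned int i, out = 0;

    if (!n) return 0;
    qsort(r, n, sizeof(*r), cmp_start);

    for (i = 1; i < n; i++) {
        uint64_t end = r[out].start + r[out].size;

        if (r[i].start <= end) {
            /* Overlaps (or touches) the current region: extend it. */
            uint64_t new_end = r[i].start + r[i].size;
            if (new_end > end)
                r[out].size = new_end - r[out].start;
        } else {
            r[++out] = r[i];
        }
    }
    return out + 1;
}

Reconciling the per-region policies whenever two entries merge would be a
separate problem, which is part of why the union is not entirely trivial.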


Ian.


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 12:04           ` Chen, Tiejun
@ 2015-07-21 12:34             ` Ian Jackson
  0 siblings, 0 replies; 74+ messages in thread
From: Ian Jackson @ 2015-07-21 12:34 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> If you're talking to create the domain twice I think this should be 
> reasonable,

Right.

> if (d_config->num_rdms) return 0;

Yes.

> Because RDMs are always associated to the platform so we should keep the 
> same array. I mean same configuration shouldn't change this point.

Yes.

Thanks,
Ian.


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 12:33                 ` Ian Jackson
@ 2015-07-21 13:29                   ` Chen, Tiejun
  2015-07-21 15:09                     ` Ian Jackson
  2015-07-21 13:41                   ` Chen, Tiejun
  1 sibling, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21 13:29 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 2015/7/21 20:33, Ian Jackson wrote:
> Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
>> [Ian Jackson:]
>>> The most obvious answer to me would be that if an rdms array is
>>> specified, the strategy should be ignored.  That is how the automatic
>>> numa placement API works.
>>
>> I'm not familiar with this.
>>
>> Do you mean this?
>>
>>       if (d_config->num_rdms)
>>           strategy = LIBXL_RDM_RESERVE_STRATEGY_IGNORE;
>>
>> But except this global strategy, we also have per-device setting so
>> maybe something like this?
>>
>>       if (d_config->num_rdms)
>>           return 0;
>
> This latter was the kind of approach I had in mind.  I think that will
> do for 4.6.
>

Thanks for your guidance on this patch; I also realize that lots of things 
need to be improved, so I'm going to publish a TODO list after this feature 
freeze.

Now, just for this patch, please take a final look (I hope it is final).

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 2f8e590..a4bd2a1 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -407,7 +407,7 @@ int libxl__domain_build(libxl__gc *gc,

      switch (info->type) {
      case LIBXL_DOMAIN_TYPE_HVM:
-        ret = libxl__build_hvm(gc, domid, info, state);
+        ret = libxl__build_hvm(gc, domid, d_config, state);
          if (ret)
              goto out;

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 634b8d2..3afa735 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -92,6 +92,280 @@ const char *libxl__domain_device_model(libxl__gc *gc,
      return dm;
  }

+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+                         uint32_t flag,
+                         uint16_t seg,
+                         uint8_t bus,
+                         uint8_t devfn,
+                         unsigned int *nr_entries,
+                         struct xen_reserved_device_memory **xrdm)
+{
+    int rc = 0, r;
+
+    /*
+     * We really can't presume how many entries we can get in advance.
+     */
+    *nr_entries = 0;
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      NULL, nr_entries);
+    assert(r <= 0);
+    /* "0" means there are no RDM entries at all. */
+    if (!r) goto out;
+
+    if (errno != ENOBUFS) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    GCNEW_ARRAY(*xrdm, *nr_entries);
+    r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                      *xrdm, nr_entries);
+    if (r)
+        rc = ERROR_FAIL;
+
+ out:
+    if (rc) {
+        *nr_entries = 0;
+        *xrdm = NULL;
+        LOG(ERROR, "Could not get reserved device memory maps.\n");
+    }
+    return rc;
+}
+
+/*
+ * Check whether an RDM region overlaps the specified memory range.
+ * Returns true if it does, else returns false.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+                         uint64_t rdm_start, uint64_t rdm_size)
+{
+    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
+}
+
+static void
+add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
+              uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
+{
+    d_config->num_rdms++;
+    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                            d_config->num_rdms * sizeof(libxl_device_rdm));
+
+    d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
+    d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
+    d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to the user's preferred policy.
+ *
+ * RDM can theoretically reside in address space beyond 4G, but we never
+ * see this in the real world, so to avoid breaking the highmem layout we
+ * don't resolve highmem conflicts. Note this means a highmem RMRR can
+ * still be supported if there is no conflict.
+ *
+ * In the lowmem case, however, RDM may be scattered across the whole RAM
+ * space, and multiple RDM entries would make the memory layout even more
+ * complicated, which makes it hard to extend hvm_info_table{} for
+ * hvmloader. So here we try to find a simple solution that avoids
+ * breaking the existing layout. When a conflict occurs:
+ *
+ * #1. Above a predefined boundary (default 2G)
+ * - Move lowmem_end below the reserved region to resolve the conflict;
+ *
+ * #2. Below a predefined boundary (default 2G)
+ * - Check the strict/relaxed policy.
+ * The "strict" policy makes libxl fail.
+ * The "relaxed" policy issues a warning message and also marks this entry
+ * INVALID to indicate that we shouldn't expose it to hvmloader.
+ * Note that when both policies are specified for a given region, the
+ * per-device policy overrides the global policy.
+ */
+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                       libxl_domain_config *d_config,
+                                       uint64_t rdm_mem_boundary,
+                                       struct xc_hvm_build_args *args)
+{
+    int i, j, conflict, rc;
+    struct xen_reserved_device_memory *xrdm = NULL;
+    uint32_t strategy = d_config->b_info.u.hvm.rdm.strategy;
+    uint16_t seg;
+    uint8_t bus, devfn;
+    uint64_t rdm_start, rdm_size;
+    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);
+
+    /*
+     * We just want to construct RDM once since RDM is specific to the
+     * given platform, so this shouldn't change again.
+     */
+    if (d_config->num_rdms)
+        return 0;
+
+    /* Might not expose rdm. */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE &&
+        !d_config->num_pcidevs)
+        return 0;
+
+    /* Query all RDM entries in this platform */
+    if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
+        unsigned int nr_entries;
+
+        /* Collect all rdm info if exist. */
+        rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+                                      0, 0, 0, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        if (!nr_entries)
+            return 0;
+
+        assert(xrdm);
+
+        for (i = 0; i < nr_entries; i++)
+        {
+            add_rdm_entry(gc, d_config,
+                          pfn_to_paddr(xrdm[i].start_pfn),
+                          pfn_to_paddr(xrdm[i].nr_pages),
+                          d_config->b_info.u.hvm.rdm.policy);
+        }
+    }
+
+    /* Query RDM entries per-device */
+    for (i = 0; i < d_config->num_pcidevs; i++) {
+        unsigned int nr_entries;
+        bool new = true;
+
+        seg = d_config->pcidevs[i].domain;
+        bus = d_config->pcidevs[i].bus;
+        devfn = PCI_DEVFN(d_config->pcidevs[i].dev,
+                          d_config->pcidevs[i].func);
+        nr_entries = 0;
+        rc = libxl__xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
+                                      seg, bus, devfn, &nr_entries, &xrdm);
+        if (rc)
+            goto out;
+        /* No RDM to associated with this device. */
+        if (!nr_entries)
+            continue;
+
+        assert(xrdm);
+
+        /*
+         * Need to check whether this entry is already saved in the array.
+         * This could come from two cases:
+         *
+         *   - user may configure to get all RDMs in this platform, which
+         *   is already queried before this point
+         *   - or two assigned devices may share one RDM entry
+         *
+         * Different policies may be configured on the same RDM due to the
+         * above two cases, but we don't allow assigning such a group of
+         * devices right now, so this doesn't happen in our case.
+         */
+        for (j = 0; j < d_config->num_rdms; j++) {
+            if (d_config->rdms[j].start == pfn_to_paddr(xrdm[0].start_pfn))
+            {
+                /*
+                 * So the per-device policy always overrides the global
+                 * policy in this case.
+                 */
+                d_config->rdms[j].policy = d_config->pcidevs[i].rdm_policy;
+                new = false;
+                break;
+            }
+        }
+
+        if (new) {
+            add_rdm_entry(gc, d_config,
+                          pfn_to_paddr(xrdm[0].start_pfn),
+                          pfn_to_paddr(xrdm[0].nr_pages),
+                          d_config->pcidevs[i].rdm_policy);
+        }
+    }
+
+    /*
+     * Next step is to check and avoid potential conflict between RDM
+     * entries and guest RAM. To avoid intrusive impact to existing
+     * memory layout {lowmem, mmio, highmem} which is passed around
+     * various function blocks, below conflicts are not handled which
+     * are rare and handling them would lead to a more scattered
+     * layout:
+     *  - RDM  in highmem area (>4G)
+     *  - RDM lower than a defined memory boundary (e.g. 2G)
+     * Otherwise for conflicts between boundary and 4G, we'll simply
+     * move lowmem end below reserved region to solve conflict.
+     *
+     * If a conflict is detected on a given RDM entry, an error will
+     * be returned if 'strict' policy is specified. Instead, if
+     * 'relaxed' policy specified, this conflict is treated just as a
+     * warning, but we mark this RDM entry as INVALID to indicate that
+     * this entry shouldn't be exposed to hvmloader.
+     *
+     * Firstly we should check the case of rdm < 4G because we may
+     * need to expand highmem_end.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        conflict = overlaps_rdm(0, args->lowmem_end, rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        /* Just check if RDM > our memory boundary. */
+        if (rdm_start > rdm_mem_boundary) {
+            /*
+             * We will move downwards lowmem_end so we have to expand
+             * highmem_end.
+             */
+            highmem_end += (args->lowmem_end - rdm_start);
+            /* Now move downwards lowmem_end. */
+            args->lowmem_end = rdm_start;
+        }
+    }
+
+    /* Sync highmem_end. */
+    args->highmem_end = highmem_end;
+
+    /*
+     * Finally we can apply the same policy to check lowmem (< 2G) and
+     * the highmem adjusted above.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        /* Does this entry conflict with lowmem? */
+        conflict = overlaps_rdm(0, args->lowmem_end,
+                                rdm_start, rdm_size);
+        /* Does this entry conflict with highmem? */
+        conflict |= overlaps_rdm((1ULL<<32),
+                                 args->highmem_end - (1ULL<<32),
+                                 rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        if (d_config->rdms[i].policy == LIBXL_RDM_RESERVE_POLICY_STRICT) {
+            LOG(ERROR, "RDM conflict at 0x%lx.\n", d_config->rdms[i].start);
+            goto out;
+        } else {
+            LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
+                      d_config->rdms[i].start);
+
+            /*
+             * Then mark this entry INVALID to indicate that we shouldn't
+             * expose it to hvmloader.
+             */
+            d_config->rdms[i].policy = LIBXL_RDM_RESERVE_POLICY_INVALID;
+        }
+    }
+
+    return 0;
+
+ out:
+    return ERROR_FAIL;
+}
+
  const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
  {
      const libxl_vnc_info *vnc = NULL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 4cb247a..7e39047 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -921,13 +921,20 @@ out:
  }

  int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
                libxl__domain_build_state *state)
  {
      libxl_ctx *ctx = libxl__gc_owner(gc);
      struct xc_hvm_build_args args = {};
      int ret, rc = ERROR_FAIL;
      uint64_t mmio_start, lowmem_end, highmem_end;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    /*
+     * Currently we fix this at 2G to keep the handling of our RDM
+     * policy manageable, but we'll provide a parameter to set this
+     * dynamically.
+     */
+    uint64_t rdm_mem_boundary = 0x80000000;

      memset(&args, 0, sizeof(struct xc_hvm_build_args));
      /* The params from the configuration file are in Mb, which are then
@@ -965,6 +972,14 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
      args.highmem_end = highmem_end;
      args.mmio_start = mmio_start;

+    rc = libxl__domain_device_construct_rdm(gc, d_config,
+                                            rdm_mem_boundary,
+                                            &args);
+    if (rc) {
+        LOG(ERROR, "checking reserved device memory failed");
+        goto out;
+    }
+
      if (info->num_vnuma_nodes != 0) {
          int i;

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 69fdad8..5e4e771 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -144,6 +144,9 @@
  #endif
    /* all of these macros preserve errno (saving and restoring) */

+/* Convert pfn to physical address space. */
+#define pfn_to_paddr(x) ((uint64_t)(x) << XC_PAGE_SHIFT)
+
  /* logging */
  _hidden void libxl__logv(libxl_ctx *ctx, xentoollog_level msglevel, int errnoval,
               const char *file /* may be 0 */, int line /* ignored if !file */,
@@ -1076,7 +1079,7 @@ _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid,
  _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
               libxl_domain_build_info *info, libxl__domain_build_state *state);
  _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
                libxl__domain_build_state *state);

  _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
@@ -1584,6 +1587,15 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
          int nr_channels, libxl_device_channel *channels);

  /*
+ * This function will fix reserved device memory conflicts
+ * according to the user's configuration.
+ */
+_hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                   libxl_domain_config *d_config,
+                                   uint64_t rdm_mem_guard,
+                                   struct xc_hvm_build_args *args);
+
+/*
   * This function will cause the whole libxl process to hang
   * if the device model does not respond.  It is deprecated.
   *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index db9f75a..157fa59 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -586,6 +586,12 @@ libxl_device_pci = Struct("device_pci", [
      ("rdm_policy",      libxl_rdm_reserve_policy),
      ])

+libxl_device_rdm = Struct("device_rdm", [
+    ("start", uint64),
+    ("size", uint64),
+    ("policy", libxl_rdm_reserve_policy),
+    ])
+
  libxl_device_dtdev = Struct("device_dtdev", [
      ("path", string),
      ])
@@ -616,6 +622,7 @@ libxl_domain_config = Struct("domain_config", [
      ("disks", Array(libxl_device_disk, "num_disks")),
      ("nics", Array(libxl_device_nic, "num_nics")),
      ("pcidevs", Array(libxl_device_pci, "num_pcidevs")),
+    ("rdms", Array(libxl_device_rdm, "num_rdms")),
      ("dtdevs", Array(libxl_device_dtdev, "num_dtdevs")),
      ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
      ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
-- 
1.9.1

Thanks
Tiejun
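
As a worked illustration of the lowmem/highmem adjustment in the patch
above, with made-up numbers (the snippet only mirrors the arithmetic of
that hunk; it is not the libxl code): starting from 3G of low memory and an
RDM at 2.5G with the 2G boundary, lowmem_end is pulled down to 2.5G and the
displaced 0.5G of RAM is accounted above 4G.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Assumed layout: 3G of low memory, nothing above 4G yet. */
    uint64_t lowmem_end  = 3ULL << 30;
    uint64_t highmem_end = 1ULL << 32;
    uint64_t rdm_start   = 0xA0000000ULL;   /* RDM begins at 2.5G */
    uint64_t boundary    = 0x80000000ULL;   /* 2G rdm_mem_boundary */

    if (rdm_start < lowmem_end && rdm_start > boundary) {
        /* RAM that no longer fits below the RDM is moved above 4G. */
        highmem_end += lowmem_end - rdm_start;
        lowmem_end = rdm_start;
    }

    printf("lowmem_end  = %#llx\n", (unsigned long long)lowmem_end);
    printf("highmem_end = %#llx\n", (unsigned long long)highmem_end);
    return 0;   /* prints 0xa0000000 and 0x120000000 */
}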


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 12:33                 ` Ian Jackson
  2015-07-21 13:29                   ` Chen, Tiejun
@ 2015-07-21 13:41                   ` Chen, Tiejun
  2015-07-21 15:10                     ` Ian Jackson
  1 sibling, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21 13:41 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>>> But another answer would be to take the union of the specified
>>> regions.  That would be more complicated, because the union would have
>>> to be computed.
>>>
>>>>                if (d_config->rdms[i].start == rdm_start)
>>>>                    return;
>>>
>>> That doesn't, of course, compute the union.
>>
>> Sorry I don't understand what the "take the union of the specified
>> regions" is here.
>
> A "region" is a subset of the address space.  I meant the set union:
>    https://en.wikipedia.org/wiki/Union_%28set_theory%29
>
> That is, an address would be reserved if it was reserved in any of the
> rdm regions implied by the config.

Are you saying this point?

"The union of two sets A and B is the set of elements which are in A, in 
B, or in both A and B."

>
> The explicitly specified regions might overlap with the computed ones,
> without being identical. Computing the union would not be entirely
> trivial.
>

Just to your example,

   libxl_domain_config cfg;
   cfg.stuff = blah;
   cfg.rdm.strategy = HOST;

   libxl_domain_create_new(&cfg, &domid);
   libxl_domain_destroy(domid);

Here it looks like you mean d_config->rdms would be changed, right?
Currently this shouldn't be allowed. But I think we need further discussion
to make this case clear after the feature freeze, since we didn't have this
kind of assumption in our previous design.

   libxl_domain_create_new(&cfg, &domid);

Thanks
Tiejun


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 13:29                   ` Chen, Tiejun
@ 2015-07-21 15:09                     ` Ian Jackson
  2015-07-21 15:42                       ` Chen, Tiejun
  0 siblings, 1 reply; 74+ messages in thread
From: Ian Jackson @ 2015-07-21 15:09 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> +static void
> +add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
> +              uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
> +{
> +    d_config->num_rdms++;
> +    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
> +                            d_config->num_rdms * sizeof(libxl_device_rdm));
> +
> +    d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
> +    d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
> +    d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
> +}


But, I wrote:

   Can I suggest a function

      void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
                    uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)

   which assumes that d_config->num_rdms is set correctly, and increments
   it ?

   (Please put the increment at the end so that the assignments are to
   ->rdms[d_config->num_rdms], or perhaps make a convenience alias.)

Note the last paragraph.

This is now the third time I have posted that text.  It is the fifth
request or clarification I have had to make about this very small
area.  I have to say that I'm finding this rather frustrating.

Ian.


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 13:41                   ` Chen, Tiejun
@ 2015-07-21 15:10                     ` Ian Jackson
  2015-07-21 15:22                       ` Chen, Tiejun
  0 siblings, 1 reply; 74+ messages in thread
From: Ian Jackson @ 2015-07-21 15:10 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> [Ian Jackson:]
> > That is, an address would be reserved if it was reserved in any of the
> > rdm regions implied by the config.
> 
> Are you saying this point?
> 
> "The union of two sets A and B is the set of elements which are in A, in 
> B, or in both A and B."

Yes.

> > The explicitly specified regions might overlap with the computed ones,
> > without being identical. Computing the union would not be entirely
> > trivial.
> 
> Just to your example,
> 
>    libxl_domain_config cfg;
>    cfg.stuff = blah;
>    cfg.rdm.strategy = HOST;
> 
>    libxl_domain_create_new(&cfg, &domid);
>    libxl_domain_destroy(domid);
> 
> Here looks you mean d_config->rdms would be changed, right? Currently 
> this shouldn't be allowed. But I think we need to further discussion 
> make this case clear after feature freeze since we didn't have this kind 
> of assumption in our previous design.
> 
>    libxl_domain_create_new(&cfg, &domid);

This response of yours does not lead me to think you have understood
what I am saying, but I agree that this can be dealt with later (if
indeed it needs to be dealt with at all).

Ian.


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 15:10                     ` Ian Jackson
@ 2015-07-21 15:22                       ` Chen, Tiejun
  2015-07-21 15:31                         ` Ian Jackson
  0 siblings, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21 15:22 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

>> Just to your example,
>>
>>     libxl_domain_config cfg;
>>     cfg.stuff = blah;
>>     cfg.rdm.strategy = HOST;
>>
>>     libxl_domain_create_new(&cfg, &domid);
>>     libxl_domain_destroy(domid);
>>
>> Here looks you mean d_config->rdms would be changed, right? Currently
>> this shouldn't be allowed. But I think we need to further discussion
>> make this case clear after feature freeze since we didn't have this kind
>> of assumption in our previous design.
>>
>>     libxl_domain_create_new(&cfg, &domid);
>
> This response of yours does not lead me to think you have understood
> what I am saying, but I agree that this can be dealt with later (if

Indeed, I'm not very familiar with the Xen tools, so I can't picture how
this scenario would happen in practice. If I'm misunderstanding what you
mean, please just correct me. Or if you still think it's hard to explain
this to me, just tell me what I should do; I think that would make your
life easier.

Thanks
Tiejun

> indeed it needs to be dealt with at all).
>
> Ian.
>


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 15:22                       ` Chen, Tiejun
@ 2015-07-21 15:31                         ` Ian Jackson
  0 siblings, 0 replies; 74+ messages in thread
From: Ian Jackson @ 2015-07-21 15:31 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> Indeed, I'm not a fan to Xen tools so I can't picture what this real 
> scenario would happen. So if I'm misunderstanding what you mean, just 
> please correct me. Or if you still think its hard to explain this to me, 
> just tell me what I should do. I think this make your life easy.

Please ignore this line of discussion.

Instead, please simply make it so that if there are any rdms specified
in the domain config, they are used instead of the automatically
gathered information (from strategy and devices).

Thanks,
Ian.


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 15:09                     ` Ian Jackson
@ 2015-07-21 15:42                       ` Chen, Tiejun
  2015-07-21 15:57                         ` Ian Jackson
  0 siblings, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-21 15:42 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 2015/7/21 23:09, Ian Jackson wrote:
> Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
>> +static void
>> +add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
>> +              uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
>> +{
>> +    d_config->num_rdms++;
>> +    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
>> +                            d_config->num_rdms * sizeof(libxl_device_rdm));
>> +
>> +    d_config->rdms[d_config->num_rdms - 1].start = rdm_start;
>> +    d_config->rdms[d_config->num_rdms - 1].size = rdm_size;
>> +    d_config->rdms[d_config->num_rdms - 1].policy = rdm_policy;
>> +}
>
>
> But, I wrote:
>
>     Can I suggest a function
>
>        void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
>                      uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
>
>     which assumes that d_config->num_rdms is set correctly, and increments
>     it ?
>
>     (Please put the increment at the end so that the assignments are to
>     ->rdms[d_config->num_rdms], or perhaps make a convenience alias.)
>
> Note the last paragraph.
>
> This is now the third time I have posted that text.  It is the fifth
> request or clarification I have had to make about this very small
> area.  I have to say that I'm finding this rather frustrating.

Sorry, I just ignored the line in brackets since I tend to think this kind
of thing is not a big deal; next time I will pay more attention to the ().
But indeed, before I posted this whole patch online I also picked out this
chunk of code and asked you to take a look at it, because I wasn't very
sure I was addressing it properly. I didn't get a further response, so I
guessed it worked for you, and then I posted the whole patch online.

Now back on our problem,

static void
add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
               uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
{
     d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
                     (d_config->num_rdms+1) * sizeof(libxl_device_rdm));

     d_config->rdms[d_config->num_rdms].start = rdm_start;
     d_config->rdms[d_config->num_rdms].size = rdm_size;
     d_config->rdms[d_config->num_rdms].policy = rdm_policy;
     d_config->num_rdms++;
}

Does this work for you? If I'm still getting it wrong, please correct this
function directly to save you time.

Thanks
Tiejun


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 15:42                       ` Chen, Tiejun
@ 2015-07-21 15:57                         ` Ian Jackson
  2015-07-21 15:57                           ` Ian Jackson
  2015-07-22  0:33                           ` Chen, Tiejun
  0 siblings, 2 replies; 74+ messages in thread
From: Ian Jackson @ 2015-07-21 15:57 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> Sorry, I just ignore the line in brackets since I always think this kind 
> of thing is often not a big deal, and next time I should pay more 
> attention to the (). But indeed, before I post this whole patch online I 
> also picked up this chunk of code to ask you to take a look that. This 
> manner means I'm not very sure if I'm addressing this properly. But I 
> didn't get a further response, so I guess that should work for you and 
> then I posted the whole online.

You are talking about <55AE2BB1.9030604@intel.com> I guess.  I replied
to that with several comments about your prose and about the
computation of the new set of rdms.

It's true that I didn't comment on the frat that you had half-done one
of the things I had requested.  It is of course a waste of my time to
be constantly re-reviewing half-done changes.

> Now back on our problem,
> 
> static void
> add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
>                uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
> {
>      d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
>                      (d_config->num_rdms+1) * sizeof(libxl_device_rdm));
> 
>      d_config->rdms[d_config->num_rdms].start = rdm_start;
>      d_config->rdms[d_config->num_rdms].size = rdm_size;
>      d_config->rdms[d_config->num_rdms].policy = rdm_policy;
>      d_config->num_rdms++;
> }
> 
> Does this work for you? If I'm still wrong, please correct this function 
> directly to cost you less.

Yes, that is what I meant.

Ian.


* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 15:57                         ` Ian Jackson
@ 2015-07-21 15:57                           ` Ian Jackson
  2015-07-22  0:33                           ` Chen, Tiejun
  1 sibling, 0 replies; 74+ messages in thread
From: Ian Jackson @ 2015-07-21 15:57 UTC (permalink / raw)
  To: Chen, Tiejun, xen-devel, Stefano Stabellini, Ian Campbell, Wei Liu

Ian Jackson writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> > Sorry, I just ignored the line in brackets since I always think this
> > kind of thing is not a big deal; next time I will pay more attention
> > to the (). But indeed, before I posted this whole patch online I also
> > picked out this chunk of code and asked you to take a look at it. I
> > did that because I wasn't very sure I was addressing this properly.
> > But I didn't get a further response, so I assumed it worked for you
> > and then posted the whole series online.
> 
> You are talking about <55AE2BB1.9030604@intel.com> I guess.  I replied
> to that with several comments about your prose and about the
> computation of the new set of rdms.
> 
> > It's true that I didn't comment on the fact that you had half-done one
> of the things I had requested.  It is of course a waste of my time to
> be constantly re-reviewing half-done changes.

Ian.

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-21 15:57                         ` Ian Jackson
  2015-07-21 15:57                           ` Ian Jackson
@ 2015-07-22  0:33                           ` Chen, Tiejun
  2015-07-22  8:43                             ` Ian Campbell
  1 sibling, 1 reply; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-22  0:33 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On 2015/7/21 23:57, Ian Jackson wrote:
> Chen, Tiejun writes ("Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
>> Sorry, I just ignored the line in brackets since I always think this
>> kind of thing is not a big deal; next time I will pay more attention
>> to the (). But indeed, before I posted this whole patch online I also
>> picked out this chunk of code and asked you to take a look at it. I
>> did that because I wasn't very sure I was addressing this properly.
>> But I didn't get a further response, so I assumed it worked for you
>> and then posted the whole series online.
>
> You are talking about <55AE2BB1.9030604@intel.com> I guess.  I replied
> to that with several comments about your prose and about the
> computation of the new set of rdms.
>
> It's true that I didn't comment on the fact that you had half-done one
> of the things I had requested.  It is of course a waste of my time to
> be constantly re-reviewing half-done changes.

Next time I will read each line of all the comments carefully. Maybe it
would also be a good idea to use IRC to get your quick advice in advance,
and I hope that would make this easier for you. Anyway, sorry for the
inconvenience.

>
>> Now back on our problem,
>>
>> static void
>> add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
>>                 uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
>> {
>>       d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
>>                       (d_config->num_rdms+1) * sizeof(libxl_device_rdm));
>>
>>       d_config->rdms[d_config->num_rdms].start = rdm_start;
>>       d_config->rdms[d_config->num_rdms].size = rdm_size;
>>       d_config->rdms[d_config->num_rdms].policy = rdm_policy;
>>       d_config->num_rdms++;
>> }
>>
>> Does this work for you? If I'm still wrong, please correct this function
>> directly to save you time.
>
> Yes, that is what I meant.
>

Good to know.

Thanks
Tiejun

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-22  0:33                           ` Chen, Tiejun
@ 2015-07-22  8:43                             ` Ian Campbell
  2015-07-22  9:18                               ` Chen, Tiejun
  0 siblings, 1 reply; 74+ messages in thread
From: Ian Campbell @ 2015-07-22  8:43 UTC (permalink / raw)
  To: Chen, Tiejun, Ian Jackson; +Cc: Stefano Stabellini, Wei Liu, xen-devel

On Wed, 2015-07-22 at 08:33 +0800, Chen, Tiejun wrote:
> On 2015/7/21 23:57, Ian Jackson wrote:
> 
> > It's true that I didn't comment on the fact that you had half-done one
> > of the things I had requested.  It is of course a waste of my time to
> > be constantly re-reviewing half-done changes.

> Next time I will read each line of all the comments carefully. Maybe it
> would also be a good idea to use IRC to get your quick advice in advance,

The solution to this is to be systematic in how you handle the email
based review of a series, not to add a further burden on the reviewer by
using IRC as well.

For example, here is how I handle review of a series I am working on:

I keep all the replies to a series I'm working on marked unread. When I
come to rework the series I take the first reply to the first patch and
start a draft reply to it quoting the entire review.

I then go through the comments one by one and either:

      * make the _complete_ code change, including adding the "Changes
        in vN" bit to the commit log and delete that comment from the
        reply
or
      * write a reply in my draft to that particular comment which does
        one or more of:

              * Explain that I don't understand the suggestion,
                probably asking questions to try and clarify what was
                being asked for.
              * Explain, with reasons, why I disagree with the
                suggestion
              * Explain, with reasons, why I only implemented part of
                the suggestion.

Only then do I move on to the next comment in that mail and repeat. At
the end I've either deleted all the comments from my draft (because
I've fully implemented everything) so the draft can be discarded or I
have an email to send explaining what I've not done and why. Only now
do I mark the original review email as read.

Then I move on to the next reply to that thread in my mail box and
repeat until I have been through every mail in the thread and addressed
_all_ of the comments.

At the end of this process _every_ bit of review feedback is addressed
either by making the requested change or by responding explaining the
reason why the change hasn't been made. This is absolutely crucial. You
should never silently ignore a piece of review, even if you don't
understand it or disagree with it, always reply and explain why you
haven't.

If the review was particularly long or complex I will then do a second
pass through the review comments and check that every comment is either
mentioned in a "Changes in vN" changelog comment or I have replied to
it.

I do all of that before posting the next version. IMHO until a
contributor has shown they are diligent about addressing review
comments they should _never_ send out a series which only has review
partially addressed.

The presence of an IRC channel in no way changes the requirement to be
systematic and thorough when dealing with email review.

Ian.

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-22  8:43                             ` Ian Campbell
@ 2015-07-22  9:18                               ` Chen, Tiejun
  2015-07-22 10:28                                 ` Ian Jackson
  2015-07-22 10:51                                 ` Ian Campbell
  0 siblings, 2 replies; 74+ messages in thread
From: Chen, Tiejun @ 2015-07-22  9:18 UTC (permalink / raw)
  To: Ian Campbell, Ian Jackson; +Cc: xen-devel, Wei Liu, Stefano Stabellini

Thanks for the clarification.

> The solution to this is to be systematic in how you handle the email
> based review of a series, not to add a further burden on the reviewer by
> using IRC as well.
>
> For example, here is how I handle review of a series I am working on:
>
> I keep all the replies to a series I'm working on marked unread. When I
> come to rework the series I take the first reply to the first patch and
> start a draft reply to it quoting the entire review.
>
> I then go through the comments one by one and either:
>
>        * make the _complete_ code change, including adding the "Changes
>          in vN" bit to the commit log and delete that comment from the
>          reply

Are you talking about the case of resending this particular patch online?

> or
>        * write a reply in my draft to that particular comment which does
>          one or more of:
>
>                * Explain that I don't understand the suggestion,
>                  probably asking questions to try and clarify what was
>                  being asked for.

Yes, in the case we're arguing about, I was really trying to send a sample
of this code fragment to ask about it before I sent out the complete change.

>                * Explain, with reasons, why I disagree with the
>                  suggestion
>                * Explain, with reasons, why I only implemented part of
>                  the suggestion.
>
> Only then do I move on to the next comment in that mail and repeat. At
> the end I've either deleted all the comments from my draft (because
> I've fully implemented everything) so the draft can be discarded or I
> have an email to send explaining what I've not done and why. Only now
> do I mark the original review email as read.
>
> Then I move on to the next reply to that thread in my mail box and
> repeat until I have been through every mail in the thread and addressed
> _all_ of the comments.
>
> At the end of this process _every_ bit of review feedback is addressed
> either by making the requested change or by responding explaining the
> reason why the change hasn't been made. This is absolutely crucial. You
> should never silently ignore a piece of review, even if you don't

I should double-check each line, but even then I may not always understand
everything correctly and do the right thing you expect. But ignoring review
comments is really not what I intend to do.

Thanks
Tiejun

> understand it or disagree with it, always reply and explain why you
> haven't.
>
> If the review was particularly long or complex I will then do a second
> pass through the review comments and check that every comment is either
> mentioned in a "Changes in vN" changelog comment or I have replied to
> it.
>
> I do all of that before posting the next version. IMHO until a
> contributor has shown they are diligent about addressing review
> comments they should _never_ send out a series which only has review
> partially addressed.
>
> The presence of an IRC channel in no way changes the requirement to be
> systematic and thorough when dealing with email review.
>
> Ian.
>

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-22  9:18                               ` Chen, Tiejun
@ 2015-07-22 10:28                                 ` Ian Jackson
  2015-07-22 10:51                                 ` Ian Campbell
  1 sibling, 0 replies; 74+ messages in thread
From: Ian Jackson @ 2015-07-22 10:28 UTC (permalink / raw)
  To: Chen, Tiejun; +Cc: xen-devel, Wei Liu, Ian Campbell, Stefano Stabellini

Chen, Tiejun writes ("Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM"):
> > I then go through the comments one by one and either:
> >
> >        * make the _complete_ code change, including adding the "Changes
> >          in vN" bit to the commit log and delete that comment from the
> >          reply
> 
> Are you talking about the case of resending this particular patch online?

Here is an example of what Ian C is talking about: You failed to
address all my comments before posting v10.  That caused me to post
this:

  This is now the third time I have posted that text.  It is the fifth
  request or clarification I have had to make about this very small
  area.  I have to say that I'm finding this rather frustrating.

You should have been systematic.  Like Ian C suggests.  I do it that
way too.  Instead, you repeatedly left things undone.



> Yes, in the case we're arguing about, I was really trying to send a sample
> of this code fragment to ask about it before I sent out the complete change.

No, your v10 series was not a "sample".


It is of course OK to post a sample.  But a reviewer will not read
such a sample thoroughly; they will not look for all problems.  The
reviewer will look at the sample, so that the reviewer can understand
your words.  They will consider the aspects of the sample that are
being discussed.  They will not consider other aspects of the sample.

You did post such a sample in <55AE2BB1.9030604@intel.com>.  I read
the sample, to understand what your words meant.

Then later, you complained that:

  But indeed, before I posted this whole patch online I also picked out
  this chunk of code and asked you to take a look at it.

I.e. you complained that I did not thoroughly review your "sample".


You cannot have this both ways.  When you are posting a "sample", it
is purely to illuminate the particular discussion.  When you are
posting a full patch for review, you must have addressed _every_
previous comment.


Ian.

* Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
  2015-07-22  9:18                               ` Chen, Tiejun
  2015-07-22 10:28                                 ` Ian Jackson
@ 2015-07-22 10:51                                 ` Ian Campbell
  1 sibling, 0 replies; 74+ messages in thread
From: Ian Campbell @ 2015-07-22 10:51 UTC (permalink / raw)
  To: Chen, Tiejun, Ian Jackson; +Cc: xen-devel, Wei Liu, Stefano Stabellini

On Wed, 2015-07-22 at 17:18 +0800, Chen, Tiejun wrote:
> Thanks for the clarification.
> 
> > The solution to this is to be systematic in how you handle the email
> > based review of a series, not to add a further burden on the reviewer
> > by using IRC as well.
> > 
> > For example, here is how I handle review of a series I am working on:
> > 
> > I keep all the replies to a series I'm working on marked unread. When I
> > come to rework the series I take the first reply to the first patch and
> > start a draft reply to it quoting the entire review.
> > 
> > I then go through the comments one by one and either:
> > 
> >        * make the _complete_ code change, including adding the "Changes
> >          in vN" bit to the commit log and delete that comment from the
> >          reply
> 
> Are you talking about the case of resending this particular patch online?

No, I am talking about making the change in my local git tree. Only
once I have addressed _all_ of the review of the whole series would I
resend a complete new version of the series for further review.

> 
> > or
> >        * write a reply in my draft to that particular comment which
> >          does one or more of:
> > 
> >                * Explain that I don't understand the suggestion,
> >                  probably asking questions to try and clarify what
> >                  was being asked for.
> 
> Yes, in the case we're arguing about, I was really trying to send a sample
> of this code fragment to ask about it before I sent out the complete change.

I was not talking about any specific case here.

> > At the end of this process _every_ bit of review feedback is addressed
> > either by making the requested change or by responding explaining the
> > reason why the change hasn't been made. This is absolutely crucial. You
> > should never silently ignore a piece of review, even if you don't
> 
> I should double-check each line, but even then I may not always understand
> everything correctly and do the right thing you expect. But ignoring review
> comments is really not what I intend to do.

I have outlined my strategy for handling review, which helps
avoid/minimise the error rate (i.e. failing to address comments). I hope
it will be helpful for you in forming your own strategy.

Ian.

Thread overview: 74+ messages
2015-07-20  6:16 [v10][PATCH 00/16] Fix RMRR Tiejun Chen
2015-07-20  6:16 ` [v10][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
2015-07-20  6:16 ` [v10][PATCH 02/16] xen/vtd: create RMRR mapping Tiejun Chen
2015-07-20  6:16 ` [v10][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
2015-07-20  6:16 ` [v10][PATCH 04/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
2015-07-20  6:16 ` [v10][PATCH 05/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
2015-07-20  6:16 ` [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs Tiejun Chen
2015-07-20 11:00   ` George Dunlap
2015-07-20 11:30   ` Jan Beulich
2015-07-20 12:52     ` George Dunlap
2015-07-20 14:06       ` Chen, Tiejun
2015-07-20 14:10         ` George Dunlap
2015-07-20 14:16           ` Jan Beulich
2015-07-20 14:32             ` Chen, Tiejun
2015-07-20 15:00               ` Jan Beulich
2015-07-21  0:53                 ` Chen, Tiejun
2015-07-21  6:18                   ` Jan Beulich
2015-07-21  9:42                   ` George Dunlap
2015-07-20  6:16 ` [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table Tiejun Chen
2015-07-20 11:56   ` Jan Beulich
2015-07-20 14:35     ` Chen, Tiejun
2015-07-20 15:03       ` Jan Beulich
2015-07-20 13:00   ` George Dunlap
2015-07-20 13:23     ` Chen, Tiejun
2015-07-20 13:50       ` George Dunlap
2015-07-20 13:57         ` Chen, Tiejun
2015-07-20  6:16 ` [v10][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
2015-07-20  6:16 ` [v10][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
2015-07-20  6:16 ` [v10][PATCH 10/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
2015-07-20  6:16 ` [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
2015-07-20 13:32   ` Ian Jackson
2015-07-20 14:40     ` Chen, Tiejun
2015-07-20 14:53       ` Ian Jackson
2015-07-20 15:08         ` Chen, Tiejun
2015-07-20 15:24           ` Ian Jackson
2015-07-20 15:38             ` Ian Campbell
2015-07-21  6:44               ` Chen, Tiejun
2015-07-21  6:45             ` Chen, Tiejun
2015-07-21  6:38     ` Chen, Tiejun
2015-07-21 10:48       ` Ian Jackson
2015-07-21 11:12         ` Chen, Tiejun
2015-07-21 10:41     ` Ian Jackson
2015-07-21 11:04       ` Chen, Tiejun
2015-07-21 11:11         ` Ian Jackson
2015-07-21 11:23           ` Chen, Tiejun
2015-07-21 11:27             ` Ian Jackson
2015-07-21 11:45               ` Chen, Tiejun
2015-07-21 12:33                 ` Ian Jackson
2015-07-21 13:29                   ` Chen, Tiejun
2015-07-21 15:09                     ` Ian Jackson
2015-07-21 15:42                       ` Chen, Tiejun
2015-07-21 15:57                         ` Ian Jackson
2015-07-21 15:57                           ` Ian Jackson
2015-07-22  0:33                           ` Chen, Tiejun
2015-07-22  8:43                             ` Ian Campbell
2015-07-22  9:18                               ` Chen, Tiejun
2015-07-22 10:28                                 ` Ian Jackson
2015-07-22 10:51                                 ` Ian Campbell
2015-07-21 13:41                   ` Chen, Tiejun
2015-07-21 15:10                     ` Ian Jackson
2015-07-21 15:22                       ` Chen, Tiejun
2015-07-21 15:31                         ` Ian Jackson
2015-07-21 12:04           ` Chen, Tiejun
2015-07-21 12:34             ` Ian Jackson
2015-07-20  6:16 ` [v10][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary Tiejun Chen
2015-07-20  6:17 ` [v10][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest Tiejun Chen
2015-07-20 13:34   ` Ian Jackson
2015-07-20  6:17 ` [v10][PATCH 14/16] xen/vtd: enable USB device assignment Tiejun Chen
2015-07-20  6:17 ` [v10][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
2015-07-20  6:17 ` [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters Tiejun Chen
2015-07-20 13:43   ` Ian Jackson
2015-07-20 13:53     ` Chen, Tiejun
2015-07-20 10:37 ` [v10][PATCH 00/16] Fix RMRR George Dunlap
2015-07-20 12:39   ` Chen, Tiejun
