From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tiejun Chen Subject: [v7][PATCH 00/16] Fix RMRR Date: Thu, 9 Jul 2015 13:33:51 +0800 Message-ID: <1436420047-25356-1-git-send-email-tiejun.chen@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org v7: * Need to rename some parameters: In the xl rdm config parsing, `reserve=' should be `policy='. In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='. The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'. The field name `reserve' in `libxl_rdm_reserve' should be `policy'. * Just sync with the fallout of renaming parameters above. Note I also mask patch #10 Acked by Wei Liu, Ian Jackson and Ian Campbell. ( If I'm wrong just let me know at this point. ) And as we discussed I'd further improve something as next step after this round of review. v6: * Inside patch #01, add a comments to the nr_entries field inside xen_reserved_device_memory_map. Note this is from Jan. * Inside patch #10, we need rename something to make our policy reasonable "type" -> "strategy" "none" -> "ignore" and based on our discussion, we won't expose "ignore" in xl level and just keep that as a default, and then sync docs and the patch head description * Inside patch #10, we fix some code stypes and especially we refine libxl__xc_device_get_rdm() * Inside patch #16, we need to sync those renames introduced by patch #10. v5: * Fold our original patch #2 and #3 as this new, and here introduce a new, clear_identity_p2m_entry, which can wrapper guest_physmap_remove_page(). And we use this to clean our identity mapping. * Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our policy flag, so now "0" means "strict" and "1" means "relaxed", and also make DT device ignore the flag field simply. And then correct all associated code comments. * Just make sure the per-device plicy always override the global policy, and so cleanup some associated comments and the patch head description. * Improve some descriptions in doc. * Make all rdm variables specific to .hvm * Inside patch #6, we're trying to rename that field, is_64bar, inside struct bars with flag, and then extend to also indicate if this bar is already allocated. * Inside patch 11, Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace malloc() with libxl__malloc(), and finally cleanup this fallout. libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL. Then instead, the allocated RDM entries would be returned with an out parameter. * The original patch #13 is sent out separately since actually this is not related to RMRR. v4: * Change one condition inside patch #2, "xen/x86/p2m: introduce set_identity_p2m_entry", if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm ) to make sure we just catch our requirement. * Inside patch #3, "xen/vtd: create RMRR mapping", Instead of intel_iommu_unmap_page(), we should use guest_physmap_remove_page() to unmap rmrr mapping correctly. And drop iommu_map_page() since actually ept_set_entry() can do this internally. * Inside patch #4, "xen/passthrough: extend hypercall to support rdm reservation policy", add code comments to describer why we fix to set a policy flag in some cases like adding a device to hwdomain, and removing a device from user domain. And fix one judging condition domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM Additionally, also add to range check the flag passed to make future extensions possible (and to avoid ambiguity on what out of range values would mean). * Inside patch #6, "hvmloader: get guest memory map into memory_map[]", we move some codes related to e820 to that specific file, e820.c, and consolidate "printf()+BUG()" and "BUG_ON()", and also avoid another fixed width type for the parameter of get_mem_mapping_layout() * Inside patch #7, "hvmloader/pci: skip reserved ranges" We have to re-design this as follows: #1. Goal MMIO region should exclude all reserved device memory #2. Requirements #2.1 Still need to make sure MMIO region is fit all pci devices as before #2.2 Accommodate the not aligned reserved memory regions If I'm missing something let me know. #3. How to #3.1 Address #2.1 We need to either of populating more RAM, or of expanding more highmem. But we should know just 64bit-bar can work with highmem, and as you mentioned we also should avoid expanding highmem as possible. So my implementation is to allocate 32bit-bar and 64bit-bar orderly. 1>. The first allocation round just to 32bit-bar If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar with all remaining resources including low pci memory. If not, we need to calculate how much RAM should be populated to allocate the remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go to the second allocation round 2>. 2>. The second allocation round to the remaining 32bit-bar We should can finish allocating all 32bit-bar in theory, then go to the third allocation round 3>. 3>. The third allocation round to 64bit-bar We'll try to first allocate from the remaining low memory resource. If that isn't enough, we try to expand highmem to allocate for 64bit-bar. This process should be same as the original. #3.2 Address #2.2 I'm trying to accommodate the not aligned reserved memory regions: We should skip all reserved device memory, but we also need to check if other smaller bars can be allocated if a mmio hole exists between resource->base and reserved device memory. If a hole exists between base and reserved device memory, lets go out simply to try allocate for next bar since all bars are in descending order of size. If not, we need to move resource->base to reserved_end just to reallocate this bar * Inside of patch #8, "hvmloader/e820: construct guest e820 table", we need to adjust highmme if lowmem is changed such as hvmloader has to populate more RAM to allocate bars. * Inside of patch #11, "tools: introduce some new parameters to set rdm policy", we don't define init_val for for libxl_rdm_reserve_type since its just zero, and grab those changes to xl/libxlu to as a final patch. * Inside of patch #12, "passes rdm reservation policy", fix one typo, s/unkwon/unknown. And in command description, we should use "[]" to indicate it's optional for that extended xl command, pci-attach. * Patch #13 is separated from current patch #14 since this is specific to xc. * Inside of patch #14, "tools/libxl: detect and avoid conflicts with RDM", and just unconditionally set *nr_entries to 0. And additionally, we grab to all stuffs to provide a parameter to set our predefined boundary dynamically to as a separated patch later * Inside of patch #16, "tools/libxl: extend XENMEM_set_memory_map", we use goto style error handling, and instead of NOGC, we shoud use libxl__malloc(gc,XXX) to allocate local e820. Overall, we refined several the patch head descriptions and code comments. v3: * Rearrange all patches orderly as Wei suggested * Rebase on the latest tree * Address some Wei's comments on tools side * Two changes for runtime cycle patch #2,xen/x86/p2m: introduce set_identity_p2m_entry, on hypervisor side a>. Introduce paging_mode_translate() Otherwise, we'll see this error when boot Xen/Dom0 (XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702 (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]---- .... (XEN) Xen call trace: (XEN) [] p2m_pt_get_entry+0x29/0x558 (XEN) [] set_identity_p2m_entry+0xfc/0x1f0 (XEN) [] rmrr_identity_mapping+0x154/0x1ce (XEN) [] intel_iommu_hwdom_init+0x76/0x158 (XEN) [] iommu_hwdom_init+0x179/0x188 (XEN) [] construct_dom0+0x2fed/0x35d8 (XEN) [] __start_xen+0x22d8/0x2381 (XEN) [] __high_start+0x53/0x55 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 0: (XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702 Note I don't copy all info since I think the above is enough. b>. Actually we still need to use "mfn_x(mfn) == INVALID_MFN" to confirm we're getting an invalid mfn. * Add patch #16 to handle those devices which share same RMRR. v2: * Instead of that fixed predefined rdm memory boundary, we'd like to introduce a parameter, "rdm_mem_boundary", to set this threshold value. * Remove that existing USB hack. * Make sure the MMIO regions all fit in the available resource window * Rename our policy, "force/try" -> "strict/relaxed" * Indeed, Wei and Jan gave me more and more comments to refine codes * Code style * Better and reasonable code implementation * Correct or improve code comments. * A little bit to work well with ARM. Open: * We should fail assigning device which has a shared RMRR with another device. We can only do group assignment when RMRR is shared among devices. We need more time to figure a good policy/way out because something is not clear to me. As you know all devices are owned by Dom0 firstly before we create any DomU, right? Do we allow Dom0 still own a group device while assign another device in the same group? Really appreciate any comments to policy. v1: RMRR is an acronym for Reserved Memory Region Reporting, expected to be used for legacy usages (such as USB, UMA Graphics, etc.) requiring reserved memory. Special treatment is required in system software to setup those reserved regions in IOMMU translation structures, otherwise passing through a device with RMRR reported may not work correctly. This patch set tries to enhance existing Xen RMRR implementation to fix various reported and theoretical problems. Most noteworthy changes are to setup identity mapping in p2m layer and handle possible conflicts between reported regions and gfn space. Initial proposal can be found at: http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00524.html and after a long discussion a summarized agreement is here: http://lists.xen.org/archives/html/xen-devel/2015-01/msg01580.html Below is a key summary of this patch set according to agreed proposal: 1. Use RDM (Reserved Device Memory) name in user space as a general description instead of using ACPI RMRR name directly. 2. Introduce configuration parameters to allow user control both per-device and global RDM resources along with desired policies upon a detected conflict. 3. Introduce a new hypercall to query global and per-device RDM resources. 4. Extend libxl to be a central place to manage RDM resources and handle potential conflicts between reserved regions and gfn space. One simplification goal is made to keep existing lowmem / mmio / highmem layout which is passed around various function blocks. So a reasonable assumption is made, that conflicts falling into below areas are not re-arranged otherwise it will result in a more scattered layout: a) in highmem region (>4G) b) in lowmem region, and below a predefined boundary (default 2G) a) is a new assumption not discussed before. From VT-d spec this is possible but no such observation in real-world. So we can make this reasonable assumption until there's real usage on it. 5. Extend XENMEM_set_memory_map usable for HVM guest, and then have libxl to use that hypercall to carry RDM information to hvmloader. There is one difference from original discussion. Previously we discussed to introduce a new E820 type specifically for RDM entries. After more thought we think it's OK to just tag them as E820_reserved. Actually hvmloader doesn't need to know whether the reserved entries come from RDM or from other purposes. 6. Then in hvmloader the change is generic for XENMEM_memory_map change. Given a predefined memory layout, hvmloader should avoid allocating all reserved entries for other usages (opregion, mmio, etc.) 7. Extend existing device passthrough hypercall to carry conflict handling policy. 8. Setup identity map in p2m layer for RMRRs reported for the given device. And conflicts are handled according to specified policy in hypercall. Current patch set contains core enhancements calling for comments. There are still several tasks not implemented now. We'll include them in final version after RFC is agreed: - remove existing USB hack - detect and fail assigning device which has a shared RMRR with another device - add a config parameter to configure that memory boundary flexibly - In the case of hotplug we also need to figure out a way to fix that policy conflict between the per-pci policy and the global policy but firstly we think we'd better collect some good or correct ideas to step next in RFC. So here I made this as RFC to collect your any comments. ---------------------------------------------------------------- Jan Beulich (1): xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen (15): xen/vtd: create RMRR mapping xen/passthrough: extend hypercall to support rdm reservation policy xen: enable XENMEM_memory_map in hvm hvmloader: get guest memory map into memory_map[] hvmloader/pci: skip reserved ranges hvmloader/e820: construct guest e820 table tools/libxc: Expose new hypercall xc_reserved_device_memory_map tools: extend xc_assign_device() to support rdm reservation policy tools: introduce some new parameters to set rdm policy tools/libxl: detect and avoid conflicts with RDM tools: introduce a new parameter to set a predefined rdm boundary libxl: construct e820 map with RDM information for HVM guest xen/vtd: enable USB device assignment xen/vtd: prevent from assign the device with shared rmrr tools: parse to enable new rdm policy parameters Jan Beulich (1): xen: introduce XENMEM_reserved_device_memory_map docs/man/xl.cfg.pod.5 | 103 ++++++++ docs/misc/vtd.txt | 24 ++ tools/firmware/hvmloader/e820.c | 115 +++++++-- tools/firmware/hvmloader/e820.h | 7 + tools/firmware/hvmloader/hvmloader.c | 2 + tools/firmware/hvmloader/pci.c | 194 +++++++++++--- tools/firmware/hvmloader/util.c | 26 ++ tools/firmware/hvmloader/util.h | 12 + tools/libxc/include/xenctrl.h | 11 +- tools/libxc/xc_domain.c | 45 +++- tools/libxl/libxl.h | 6 + tools/libxl/libxl_create.c | 13 +- tools/libxl/libxl_dm.c | 264 ++++++++++++++++++++ tools/libxl/libxl_dom.c | 16 +- tools/libxl/libxl_internal.h | 37 ++- tools/libxl/libxl_pci.c | 12 +- tools/libxl/libxl_types.idl | 26 ++ tools/libxl/libxl_x86.c | 83 ++++++ tools/libxl/libxlu_pci.c | 90 +++++++ tools/libxl/libxlutil.h | 4 + tools/libxl/xl_cmdimpl.c | 16 ++ tools/ocaml/libs/xc/xenctrl_stubs.c | 16 +- tools/python/xen/lowlevel/xc/xc.c | 30 ++- xen/arch/x86/hvm/hvm.c | 2 - xen/arch/x86/mm.c | 6 - xen/arch/x86/mm/p2m.c | 43 +++- xen/common/compat/memory.c | 66 +++++ xen/common/memory.c | 64 +++++ xen/drivers/passthrough/amd/pci_amd_iommu.c | 3 +- xen/drivers/passthrough/arm/smmu.c | 2 +- xen/drivers/passthrough/device_tree.c | 3 +- xen/drivers/passthrough/iommu.c | 10 + xen/drivers/passthrough/pci.c | 15 +- xen/drivers/passthrough/vtd/dmar.c | 32 +++ xen/drivers/passthrough/vtd/dmar.h | 1 - xen/drivers/passthrough/vtd/extern.h | 1 + xen/drivers/passthrough/vtd/iommu.c | 87 +++++-- xen/drivers/passthrough/vtd/utils.c | 7 - xen/include/asm-x86/p2m.h | 13 +- xen/include/public/domctl.h | 3 + xen/include/public/memory.h | 37 ++- xen/include/xen/iommu.h | 12 +- xen/include/xen/pci.h | 2 + xen/include/xlat.lst | 3 +- 44 files changed, 1447 insertions(+), 117 deletions(-) Thanks Tiejun