From: Haozhong Zhang
Subject: Re: [RFC Design Doc] Add vNVDIMM support for Xen
Date: Mon, 15 Feb 2016 16:43:52 +0800
Message-ID: <20160215084352.GB8938@hz-desktop.sh.intel.com>
References: <20160201054414.GA25211@hz-desktop.sh.intel.com> <20160202191519.GB21656@char.us.oracle.com>
In-Reply-To: <20160202191519.GB21656@char.us.oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: Juergen Gross, "Tian, Kevin", Wei Liu, Ian Campbell, Stefano Stabellini,
    George Dunlap, Andrew Cooper, Ian Jackson, "xen-devel@lists.xen.org",
    Jan Beulich, "Nakajima, Jun", Xiao Guangrong, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On 02/03/16 03:15, Konrad Rzeszutek Wilk wrote:
> > 3. Design of vNVDIMM in Xen
>
> Thank you for this design!
>
> > Similarly to that in KVM/QEMU, enabling vNVDIMM in Xen is composed of
> > three parts:
> > (1) Guest clwb/clflushopt/pcommit enabling,
> > (2) Memory mapping, and
> > (3) Guest ACPI emulation.
>
> .. MCE? and vMCE?
>

NVDIMM can generate UCR errors like normal RAM. Xen may handle them in a
way similar to what mc_memerr_dhandler() does, with some differences in
the data structure and the broken-page offlining parts:

Broken NVDIMM pages should be marked as "offlined" so that the Xen
hypervisor can refuse further requests that map them to a DomU.

The real question here is which data structure to use to record
information about NVDIMM pages. Because an NVDIMM is usually much larger
than normal RAM, using struct page_info for NVDIMM pages would consume
too much memory. Alternatively, we may use a range set to represent
NVDIMM pages:

 struct nvdimm_pages
 {
     unsigned long mfn;  /* starting MFN of a range of NVDIMM pages */
     unsigned long gfn;  /* starting GFN where this range is mapped,
                            initially INVALID_GFN */
     unsigned long len;  /* length of this range in bytes */

     int broken;         /* 0: initial value,
                            1: this range of NVDIMM pages is broken
                               and offlined */

     struct domain *d;   /* NULL: initial value,
                            not NULL: the domain this range is mapped to */

     /*
      * Every nvdimm_pages structure is linked in the global
      * xen_nvdimm_pages_list.
      *
      * If it is mapped to a domain d, it is also linked in
      * d->arch.nvdimm_pages_list.
      */
     struct list_head domain_list;
     struct list_head global_list;
 };

 struct list_head xen_nvdimm_pages_list;

 /* in asm-x86/domain.h */
 struct arch_domain
 {
     ...
     struct list_head nvdimm_pages_list;
 };

(1) Initially, the Xen hypervisor creates an nvdimm_pages structure for
    each pmem region (starting SPA and size reported by the Dom0 NVDIMM
    driver) and links all nvdimm_pages structures in
    xen_nvdimm_pages_list.

(2) If the Xen hypervisor is then requested to map a range of NVDIMM
    pages [start_mfn, end_mfn] to gfn of domain d, it will
    (a) Check whether the GFN range [gfn, gfn + end_mfn - start_mfn] of
        domain d is already occupied (e.g. by normal RAM, I/O or another
        vNVDIMM).
    (b) Search xen_nvdimm_pages_list for one or multiple nvdimm_pages
        that [start_mfn, end_mfn] can fit in. If an nvdimm_pages
        structure is entirely covered by [start_mfn, end_mfn], link that
        nvdimm_pages structure to d->arch.nvdimm_pages_list. If only a
        portion of an nvdimm_pages structure is covered by
        [start_mfn, end_mfn], split that nvdimm_pages structure into
        multiple ones (the one entirely covered and at most two not
        covered), link the covered one to d->arch.nvdimm_pages_list,
        and keep all of them in xen_nvdimm_pages_list as well (a
        standalone sketch of this split step follows the list below).
        The gfn and d fields of nvdimm_pages structures linked to
        d->arch.nvdimm_pages_list are also set accordingly.

(3) When a domain d is shut down or destroyed, merge its nvdimm_pages
    structures (i.e. those in d->arch.nvdimm_pages_list) back into
    xen_nvdimm_pages_list.

(4) When an MCE for a host NVDIMM SPA range [start_mfn, end_mfn] happens
    (also sketched after the list below),
    (a) search xen_nvdimm_pages_list for the affected nvdimm_pages
        structures,
    (b) for each affected nvdimm_pages, if it belongs to a domain d and
        its broken field is already set, shut down domain d to prevent a
        malicious guest from accessing the broken page (similar to what
        offline_page() does),
    (c) for each affected nvdimm_pages, set its broken field to 1, and
    (d) for each affected nvdimm_pages, if it belongs to a domain d,
        inject into that domain a vMCE that covers its GFN range.
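To make the split in (2)(b) a bit more concrete, here is a minimal
standalone sketch. It is not meant as Xen code: it uses a plain
singly-linked list instead of struct list_head, counts len in pages
rather than bytes, drops the gfn/d/broken fields, and the helper names
(new_range, split_at) are made up just for illustration.

 #include <stdio.h>
 #include <stdlib.h>

 struct nvdimm_pages {
     unsigned long mfn;          /* first MFN of the range */
     unsigned long len;          /* number of pages in the range */
     struct nvdimm_pages *next;  /* global list link */
 };

 static struct nvdimm_pages *global_list;

 static struct nvdimm_pages *new_range(unsigned long mfn, unsigned long len)
 {
     struct nvdimm_pages *p = calloc(1, sizeof(*p));

     if ( !p )
         abort();
     p->mfn = mfn;
     p->len = len;
     p->next = global_list;
     global_list = p;
     return p;
 }

 /*
  * Split 'p' so that the pages below 'mfn' stay in 'p' and the pages
  * from 'mfn' onwards move to a newly allocated range, which is also
  * linked into the global list.  Returns the new range.
  */
 static struct nvdimm_pages *split_at(struct nvdimm_pages *p, unsigned long mfn)
 {
     struct nvdimm_pages *tail = new_range(mfn, p->mfn + p->len - mfn);

     p->len = mfn - p->mfn;
     return tail;
 }

 int main(void)
 {
     /* One pmem region of 0x1000 pages starting at MFN 0x100000. */
     new_range(0x100000, 0x1000);

     /* Map [0x100200, 0x1003ff]: split off the not-covered head and tail. */
     struct nvdimm_pages *mid = split_at(global_list, 0x100200);
     split_at(mid, 0x100400);

     /*
      * 'mid' now covers exactly [0x100200, 0x1003ff]; it is the piece
      * that would be linked to d->arch.nvdimm_pages_list.
      */
     for ( struct nvdimm_pages *p = global_list; p; p = p->next )
         printf("range: mfn %#lx, %#lx pages\n", p->mfn, p->len);
     return 0;
 }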
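And a similarly rough standalone sketch of the MCE path in (4). Again
not Xen code: struct domain, domain_crash() and inject_vmce_for_range()
below are simplified stand-ins for the real Xen facilities, and len is
again treated as a page count.

 #include <stdio.h>
 #include <stdbool.h>

 struct domain { int domain_id; };

 struct nvdimm_pages {
     unsigned long mfn, gfn, len;   /* len in pages here */
     bool broken;
     struct domain *d;
     struct nvdimm_pages *next;     /* global list link */
 };

 /* Stand-ins for Xen's domain shutdown and vMCE injection. */
 static void domain_crash(struct domain *d)
 {
     printf("shut down domain %d (already-broken range touched again)\n",
            d->domain_id);
 }

 static void inject_vmce_for_range(struct domain *d,
                                   unsigned long gfn, unsigned long len)
 {
     printf("inject vMCE into domain %d for GFNs [%#lx, %#lx]\n",
            d->domain_id, gfn, gfn + len - 1);
 }

 static void nvdimm_handle_mce(struct nvdimm_pages *list,
                               unsigned long start_mfn, unsigned long end_mfn)
 {
     for ( struct nvdimm_pages *p = list; p; p = p->next )
     {
         /* (4)(a) skip ranges that do not overlap the affected SPAs */
         if ( p->mfn + p->len <= start_mfn || p->mfn > end_mfn )
             continue;
         /* (4)(b) an already-broken range owned by a guest: shut it down */
         if ( p->d && p->broken )
             domain_crash(p->d);
         /* (4)(c) remember that this range is broken/offlined */
         p->broken = true;
         /* (4)(d) tell the owning guest, if any */
         if ( p->d )
             inject_vmce_for_range(p->d, p->gfn, p->len);
     }
 }

 int main(void)
 {
     struct domain d1 = { .domain_id = 1 };
     struct nvdimm_pages mapped = { .mfn = 0x100200, .gfn = 0x80000,
                                    .len = 0x200, .d = &d1 };
     struct nvdimm_pages free_range = { .mfn = 0x100000, .len = 0x200,
                                        .next = &mapped };

     /* MCE hits one page inside the range mapped to domain 1. */
     nvdimm_handle_mce(&free_range, 0x100300, 0x100300);
     return 0;
 }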
Comments, pls.

Thanks,
Haozhong