From mboxrd@z Thu Jan 1 00:00:00 1970
From: Haozhong Zhang
Subject: Re: [RFC Design Doc] Add vNVDIMM support for Xen
Date: Wed, 17 Feb 2016 17:01:05 +0800
Message-ID: <20160217090105.GD5459@hz-desktop.sh.intel.com>
References: <20160201054414.GA25211@hz-desktop.sh.intel.com>
 <20160202191519.GB21656@char.us.oracle.com>
 <20160215084352.GB8938@hz-desktop.sh.intel.com>
 <56C1BF9302000078000D202D@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <56C1BF9302000078000D202D@prv-mh.provo.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Juergen Gross, Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
 George Dunlap, Andrew Cooper, Ian Jackson, "xen-devel@lists.xen.org",
 Jun Nakajima, Xiao Guangrong, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On 02/15/16 04:07, Jan Beulich wrote:
> >>> On 15.02.16 at 09:43, wrote:
> > On 02/03/16 03:15, Konrad Rzeszutek Wilk wrote:
> >> > Similarly to that in KVM/QEMU, enabling vNVDIMM in Xen is composed of
> >> > three parts:
> >> > (1) Guest clwb/clflushopt/pcommit enabling,
> >> > (2) Memory mapping, and
> >> > (3) Guest ACPI emulation.
> >>
> >>
> >> .. MCE? and vMCE?
> >>
> >
> > NVDIMM can generate UCR errors like normal ram. Xen may handle them in a
> > way similar to what mc_memerr_dhandler() does, with some differences in
> > the data structure and the broken page offline parts:
> >
> > Broken NVDIMM pages should be marked as "offlined" so that Xen
> > hypervisor can refuse further requests that map them to DomU.
> >
> > The real problem here is what data structure will be used to record
> > information of NVDIMM pages. Because the size of NVDIMM is usually much
> > larger than normal ram, using struct page_info for NVDIMM pages would
> > occupy too much memory.
>
> I don't see how your alternative below would be less memory
> hungry: Since guests have at least partial control of their GFN
> space, a malicious guest could punch holes into the contiguous
> GFN range that you appear to be thinking about, thus causing
> arbitrary splitting of the control structure.
>

QEMU would always use GFNs above the guest's normal RAM and I/O holes
for vNVDIMM. It would search that space for a contiguous range that is
large enough for the vNVDIMM device. Is a guest able to punch holes in
that part of the GFN space?

> Also - see how you all of the sudden came to think of using
> struct page_info here (implying hypervisor control of these
> NVDIMM ranges)?
>
> > (4) When a MCE for host NVDIMM SPA range [start_mfn, end_mfn] happens,
> >   (a) search xen_nvdimm_pages_list for affected nvdimm_pages structures,
> >   (b) for each affected nvdimm_pages, if it belongs to a domain d and
> >       its broken field is already set, the domain d will be shutdown to
> >       prevent malicious guest accessing broken page (similarly to what
> >       offline_page() does).
> >   (c) for each affected nvdimm_pages, set its broken field to 1, and
> >   (d) for each affected nvdimm_pages, inject to domain d a vMCE that
> >       covers its GFN range if that nvdimm_pages belongs to domain d.
>
> I don't see why you'd want to mark the entire range bad: All
> that's known to be broken is a single page. Hence this would be
> another source of splits of the proposed control structures.
>

Oh yes, I should split the range here rather than mark the whole range
as broken. Such splits are caused by hardware errors, so unless the
host NVDIMM is badly broken, there should not be a large number of
them.

Haozhong
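
To make the split concrete, here is a rough sketch of marking a single
page broken inside a larger range. It is only an illustration under
assumptions: struct nvdimm_range, range_new() and mark_page_broken()
are hypothetical names made up for this example, not existing Xen code
and not the nvdimm_pages structure from the design doc. It uses plain
malloc() and a singly linked list, and ignores locking and domain
ownership.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for a contiguous MFN range (end is inclusive). */
struct nvdimm_range {
    unsigned long start_mfn;
    unsigned long end_mfn;
    int broken;                 /* 1 if every page in the range is bad */
    struct nvdimm_range *next;
};

/* Allocate one list node; returns NULL on allocation failure. */
static struct nvdimm_range *range_new(unsigned long start, unsigned long end,
                                      int broken, struct nvdimm_range *next)
{
    struct nvdimm_range *r = malloc(sizeof(*r));

    if (!r)
        return NULL;
    r->start_mfn = start;
    r->end_mfn = end;
    r->broken = broken;
    r->next = next;
    return r;
}

/*
 * Mark only the page at mfn as broken, splitting r into up to three
 * pieces: [start, mfn-1] (good), [mfn, mfn] (broken), [mfn+1, end] (good).
 * Returns 0 on success, -1 if mfn is outside r or allocation fails.
 */
static int mark_page_broken(struct nvdimm_range *r, unsigned long mfn)
{
    struct nvdimm_range *tail = NULL, *mid;

    assert(r != NULL);
    if (mfn < r->start_mfn || mfn > r->end_mfn)
        return -1;
    if (r->broken)
        return 0;               /* already covered by a broken range */

    /* Pages after the broken one stay good in a new node. */
    if (mfn < r->end_mfn) {
        tail = range_new(mfn + 1, r->end_mfn, 0, r->next);
        if (!tail)
            return -1;
    }

    if (mfn > r->start_mfn) {
        /* Pages before the broken one stay in r; add a one-page node. */
        mid = range_new(mfn, mfn, 1, tail ? tail : r->next);
        if (!mid) {
            free(tail);
            return -1;
        }
        r->end_mfn = mfn - 1;
        r->next = mid;
    } else {
        /* The broken page is the first one: r itself becomes that page. */
        r->end_mfn = mfn;
        r->broken = 1;
        if (tail)
            r->next = tail;
    }
    return 0;
}
```

Each reported bad page adds at most two nodes, so the number of extra
records grows with the number of hardware errors rather than with the
size of the NVDIMM.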