From: Haozhong Zhang <haozhong.zhang@intel.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Juergen Gross <jgross@suse.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
Wei Liu <wei.liu2@citrix.com>,
Ian Campbell <ian.campbell@citrix.com>,
Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
George Dunlap <George.Dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Ian Jackson <ian.jackson@eu.citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Jan Beulich <jbeulich@suse.com>,
"Nakajima, Jun" <jun.nakajima@intel.com>,
Xiao Guangrong <guangrong.xiao@linux.intel.com>,
Keir Fraser <keir@xen.org>
Subject: Re: [RFC Design Doc] Add vNVDIMM support for Xen
Date: Mon, 15 Feb 2016 16:43:52 +0800
Message-ID: <20160215084352.GB8938@hz-desktop.sh.intel.com>
In-Reply-To: <20160202191519.GB21656@char.us.oracle.com>
On 02/03/16 03:15, Konrad Rzeszutek Wilk wrote:
> > 3. Design of vNVDIMM in Xen
>
> Thank you for this design!
>
> >
> > Similarly to that in KVM/QEMU, enabling vNVDIMM in Xen is composed of
> > three parts:
> > (1) Guest clwb/clflushopt/pcommit enabling,
> > (2) Memory mapping, and
> > (3) Guest ACPI emulation.
>
>
> .. MCE? and vMCE?
>
NVDIMM can generate UCR errors just like normal RAM. Xen may handle them in
a way similar to what mc_memerr_dhandler() does, with some differences in
the data structure and in how broken pages are offlined:
Broken NVDIMM pages should be marked as "offlined" so that the Xen
hypervisor can refuse any further request to map them into a DomU.
The real problem here is which data structure to use to record
information about NVDIMM pages. Because NVDIMM is usually much larger
than normal RAM, and a struct page_info is needed for every page, using
struct page_info for NVDIMM pages would occupy too much memory.
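To give a rough sense of scale, the per-page cost can be compared with the per-range cost. This is a minimal sketch of the arithmetic only: the 4 KiB page size and the 32-byte and 64-byte descriptor sizes used below are assumptions for illustration, not values taken from the Xen sources, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Metadata cost of describing an NVDIMM region page by page.
 * desc_bytes is the size of one per-page descriptor; callers pass an
 * assumed value, not sizeof(struct page_info) from a real Xen build.
 */
uint64_t per_page_overhead(uint64_t nvdimm_bytes, uint64_t page_bytes,
                           uint64_t desc_bytes)
{
    assert(page_bytes != 0);
    return (nvdimm_bytes / page_bytes) * desc_bytes;
}

/* Metadata cost when each contiguous range needs only one descriptor. */
uint64_t per_range_overhead(uint64_t nr_ranges, uint64_t desc_bytes)
{
    return nr_ranges * desc_bytes;
}
```

Under those assumed sizes, per_page_overhead(1ULL << 40, 4096, 32) gives 8 GiB of metadata for 1 TiB of NVDIMM, while per_range_overhead(4, 64) gives 256 bytes for four pmem regions.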
Alternatively, we may use a set of ranges to represent NVDIMM pages:

    struct nvdimm_pages
    {
        unsigned long mfn;   /* starting MFN of a range of NVDIMM pages */
        unsigned long gfn;   /* starting GFN where this range is mapped,
                                initially INVALID_GFN */
        unsigned long len;   /* length of this range in bytes */
        int broken;          /* 0: initial value,
                                1: this range of NVDIMM pages is broken and offlined */
        struct domain *d;    /* NULL: initial value,
                                non-NULL: the domain this range is mapped to */

        /*
         * Every nvdimm_pages structure is linked in the global
         * xen_nvdimm_pages_list.
         *
         * If it is mapped to a domain d, it is also linked in
         * d->arch.nvdimm_pages_list.
         */
        struct list_head domain_list;
        struct list_head global_list;
    };

    struct list_head xen_nvdimm_pages_list;

    /* in asm-x86/domain.h */
    struct arch_domain
    {
        ...
        struct list_head nvdimm_pages_list;
    };
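The main lookup on this structure is finding the range that contains a given MFN, as needed in step (4)(a) below. A minimal user-space sketch, assuming 4 KiB pages and replacing Xen's struct list_head with a plain next pointer; struct nvdimm_range and find_range() are hypothetical names, not existing Xen code.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12  /* assumed 4 KiB pages */

/* Simplified model of struct nvdimm_pages; a singly linked list
 * stands in for Xen's struct list_head. */
struct nvdimm_range {
    unsigned long mfn;          /* first MFN of the range */
    unsigned long len;          /* length in bytes, as in the proposal */
    int broken;                 /* 1 if the range has been offlined */
    struct nvdimm_range *next;  /* next entry in the global list */
};

/* Number of pages covered by a range; len must be page aligned. */
unsigned long range_nr_pages(const struct nvdimm_range *r)
{
    assert((r->len & ((1UL << PAGE_SHIFT) - 1)) == 0);
    return r->len >> PAGE_SHIFT;
}

/* Return the range containing mfn, or NULL if mfn is not in any range. */
struct nvdimm_range *find_range(struct nvdimm_range *head, unsigned long mfn)
{
    for (struct nvdimm_range *r = head; r != NULL; r = r->next) {
        /* Subtract first so that mfn + nr_pages can never overflow. */
        if (mfn >= r->mfn && mfn - r->mfn < range_nr_pages(r))
            return r;
    }
    return NULL;
}
```

A linear scan is adequate while the number of pmem regions is small; if splitting on assignment fragments the list heavily, a sorted or tree-based structure would be a possible alternative.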
(1) Initially, the Xen hypervisor creates an nvdimm_pages structure for each
pmem region (whose starting SPA and size are reported by the Dom0 NVDIMM
driver) and links all of them in xen_nvdimm_pages_list.
(2) If the Xen hypervisor is then requested to map a range of NVDIMM pages
[start_mfn, end_mfn] to GFN gfn of domain d, it will
(a) Check whether the GFN range [gfn, gfn + end_mfn - start_mfn]
of domain d is already occupied (e.g. by normal RAM, I/O or
another vNVDIMM).
(b) Search xen_nvdimm_pages_list for the one or more nvdimm_pages
structures that [start_mfn, end_mfn] falls within.
If an nvdimm_pages structure is entirely covered by [start_mfn,
end_mfn], link that structure into d->arch.nvdimm_pages_list.
If only a portion of an nvdimm_pages structure is covered by
[start_mfn, end_mfn], split it into multiple structures (the
entirely covered one and at most two uncovered ones), link the
covered one into d->arch.nvdimm_pages_list, and link all of them
into xen_nvdimm_pages_list.
The gfn and d fields of the nvdimm_pages structures linked into
d->arch.nvdimm_pages_list are set accordingly.
(3) When a domain d is shut down or destroyed, reset the gfn and d fields of
its nvdimm_pages structures (i.e. those in d->arch.nvdimm_pages_list) to
their initial values and merge them back in xen_nvdimm_pages_list.
(4) When an MCE for the host NVDIMM SPA range [start_mfn, end_mfn] happens,
(a) search xen_nvdimm_pages_list for the affected nvdimm_pages structures;
(b) for each affected nvdimm_pages structure, if it belongs to a domain d
and its broken field is already set, shut down domain d to prevent a
malicious guest from accessing the broken pages (similarly to what
offline_page() does);
(c) for each affected nvdimm_pages structure, set its broken field to 1; and
(d) for each affected nvdimm_pages structure that belongs to a domain d,
inject into d a vMCE covering its GFN range.
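Step (2)(b) is the only step that changes the shape of the list. A minimal user-space sketch of the split, under simplifying assumptions: lengths are counted in pages rather than bytes, a singly linked list replaces Xen's struct list_head, and struct nvdimm_range and split_range() are hypothetical names, not existing Xen code. Linking the returned piece into d->arch.nvdimm_pages_list and setting its gfn and d fields are left to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified model of one nvdimm_pages entry (length in pages). */
struct nvdimm_range {
    unsigned long mfn;          /* first MFN */
    unsigned long nr_pages;     /* number of pages, must be > 0 */
    int broken;                 /* copied into every piece on a split */
    struct nvdimm_range *next;  /* next entry in the global list */
};

/* Allocate a range [mfn, mfn + nr) and link it right after prev. */
static struct nvdimm_range *insert_after(struct nvdimm_range *prev,
                                         unsigned long mfn, unsigned long nr)
{
    struct nvdimm_range *n = malloc(sizeof(*n));

    if (!n)
        return NULL;
    n->mfn = mfn;
    n->nr_pages = nr;
    n->broken = prev->broken;
    n->next = prev->next;
    prev->next = n;
    return n;
}

/*
 * Split r so that its overlap with [start_mfn, end_mfn] (inclusive)
 * becomes a node of its own, and return that node. The uncovered head
 * and tail (at most two pieces) stay in the list. Returns NULL if there
 * is no overlap or an allocation fails; after a failed allocation the
 * list is still well formed but may be only partially split.
 */
struct nvdimm_range *split_range(struct nvdimm_range *r,
                                 unsigned long start_mfn, unsigned long end_mfn)
{
    unsigned long r_end, lo, hi;

    assert(r->nr_pages > 0 && start_mfn <= end_mfn);
    r_end = r->mfn + r->nr_pages - 1;              /* inclusive end */
    if (end_mfn < r->mfn || start_mfn > r_end)
        return NULL;                               /* no overlap */

    lo = start_mfn > r->mfn ? start_mfn : r->mfn;
    hi = end_mfn < r_end ? end_mfn : r_end;

    if (hi < r_end) {                              /* uncovered tail */
        if (!insert_after(r, hi + 1, r_end - hi))
            return NULL;
        r->nr_pages = hi - r->mfn + 1;
    }
    if (lo > r->mfn) {                             /* uncovered head */
        struct nvdimm_range *mid = insert_after(r, lo, hi - lo + 1);

        if (!mid)
            return NULL;
        r->nr_pages = lo - r->mfn;
        return mid;
    }
    return r;                                      /* r starts at lo */
}
```

Splitting the tail first keeps r as the head piece, so the list stays ordered by MFN (head, covered, tail) without any extra reordering.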
Comments, pls.
Thanks,
Haozhong