From: Haozhong Zhang <haozhong.zhang@intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Juergen Gross <JGross@suse.com>,
Kevin Tian <kevin.tian@intel.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Wei Liu <wei.liu2@citrix.com>,
George Dunlap <George.Dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Ian Jackson <ian.jackson@eu.citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Jun Nakajima <jun.nakajima@intel.com>,
Xiao Guangrong <guangrong.xiao@linux.intel.com>
Subject: Re: [RFC Design Doc v2] Add vNVDIMM support for Xen
Date: Wed, 3 Aug 2016 14:54:20 +0800
Message-ID: <20160803065420.fprppr2jhhxl5ehi@hz-desktop>
In-Reply-To: <57A0CE580200007800101E09@prv-mh.provo.novell.com>
On 08/02/16 08:46, Jan Beulich wrote:
> >>> On 18.07.16 at 02:29, <haozhong.zhang@intel.com> wrote:
> > 4.2.2 Detection of Host pmem Devices
> >
> > Detecting and initializing host pmem devices requires a non-trivial
> > driver to interact with the corresponding ACPI namespace devices,
> > parse namespace labels and take necessary recovery actions. Instead
> > of duplicating the comprehensive Linux pmem driver in the Xen
> > hypervisor, our design leaves this to Dom0 Linux and lets Dom0 Linux
> > report detected host pmem devices to the Xen hypervisor.
> >
> > Our design takes the following steps to detect host pmem devices when
> > Xen boots.
> > (1) As when booting on bare metal, host pmem devices are detected by
> > the Dom0 Linux NVDIMM driver.
> >
> > (2) Our design extends the Linux NVDIMM driver to report the SPAs and
> > sizes of the pmem devices and their reserved areas to the Xen
> > hypervisor via a new hypercall.
> >
> > (3) The Xen hypervisor then checks
> > - whether the SPA range of the newly reported pmem device overlaps
> > with any previously reported pmem device;
>
> ... or with system RAM.
>
> > - whether the reserved area can fit in the pmem device and is
> > large enough to hold page_info structs for itself.
>
> So "reserved" here means available for Xen's use, but not for more
> general purposes? How would the area Linux uses for its own
> purposes get represented?
>
Reserved for Xen only. I was going to reuse the existing reservation
mechanism in the Linux pmem driver to reserve two areas: one for Xen
and another for Linux itself. However, I later realized that the
existing mechanism depends on huge page support, so it does not work
in Dom0. For the first implementation, I'm taking a different approach
that reserves an area only for Xen and lets Dom0 Linux put the page
structs for pmem in normal RAM. Afterwards, I'll look for a way to
allow both.
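
For illustration, the checks in step (3), together with the system RAM
check suggested above, could be sketched roughly as follows. This is
only a Python sketch of the logic, not hypervisor code: the names
Range and check_pmem_report, the 4 KiB page size, and the 32-byte
page_info size are all assumptions made for the example.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096       # assumed page size in bytes
PAGE_INFO_SIZE = 32    # hypothetical size of one page_info struct


@dataclass(frozen=True)
class Range:
    start: int  # start SPA in bytes
    size: int   # length in bytes

    @property
    def end(self):
        # Exclusive end address of the range.
        return self.start + self.size


def overlaps(a, b):
    # Two half-open ranges overlap if each starts before the other ends.
    return a.start < b.end and b.start < a.end


def contains(outer, inner):
    return outer.start <= inner.start and inner.end <= outer.end


def check_pmem_report(device, reserved, known_pmem, system_ram):
    """Return None if the report is acceptable, else a reason string."""
    for other in known_pmem:
        if overlaps(device, other):
            return "overlaps a previously reported pmem device"
    for ram in system_ram:
        if overlaps(device, ram):
            return "overlaps system RAM"
    if not contains(device, reserved):
        return "reserved area does not fit in the pmem device"
    # One page_info struct per page of the device, rounded up.
    nr_pages = -(-device.size // PAGE_SIZE)
    if reserved.size < nr_pages * PAGE_INFO_SIZE:
        return "reserved area too small for page_info structs"
    return None
```

With these assumed sizes, a 1 GiB device needs at least 8 MiB of
reserved area for its page_info structs.
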
> > (4) Because the reserved area is now used by Xen hypervisor, it
> > should not be accessible by Dom0 any more. Therefore, if a host
> > pmem device is recorded by Xen hypervisor, Xen will unmap its
> > reserved area from Dom0. Our design also needs to extend Linux
> > NVDIMM driver to "balloon out" the reserved area after it
> > successfully reports a pmem device to Xen hypervisor.
>
> ... "balloon out" ... _after_? That'd be unsafe.
>
Before ballooning is completed, the pmem driver does not create any
device node under /dev/, and hence no one except the pmem driver can
access the reserved area on pmem, so I think it's okay to balloon
after reporting.
> > 4.2.3 Get Host Machine Address (SPA) of Host pmem Files
> >
> > Before a pmem file is assigned to a domain, we need to know the host
> > SPA ranges that are allocated to this file. We do this work in xl.
> >
> > If a pmem device /dev/pmem0 is given, xl will read
> > /sys/block/pmem0/device/{resource,size} respectively for the start
> > SPA and size of the pmem device.
> >
> > If a pre-allocated file /mnt/dax/file is given,
> > (1) xl first finds the host pmem device where /mnt/dax/file is. Then
> > it uses the method above to get the start SPA of the host pmem
> > device.
> > (2) xl then uses the fiemap ioctl to get the extent mappings of
> > /mnt/dax/file, and adds the physical offset of each mapping entry
> > to the above start SPA to get the SPA ranges (start and length)
> > pre-allocated for this file.
>
> Remind me again: These extents never change, not even across
> reboot? I think this would be good to be written down here explicitly.
Yes
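
For illustration, the SPA calculation in section 4.2.3 could be
sketched as below. This is a Python sketch under assumptions, not the
xl implementation: the function names are made up for the example, the
extents are assumed to have been obtained already (e.g. from the
fiemap ioctl) as (physical offset, length) pairs in bytes, and the
offsets are assumed to be relative to the start of the pmem device.

```python
def read_sysfs_int(path):
    """Read an integer (decimal or 0x-prefixed hex) from a sysfs file."""
    with open(path) as f:
        return int(f.read().strip(), 0)


def pmem_start_and_size(dev="pmem0"):
    # Start SPA and size of the device, as described in 4.2.3.
    base = f"/sys/block/{dev}/device"
    return read_sysfs_int(f"{base}/resource"), read_sysfs_int(f"{base}/size")


def file_spa_ranges(start_spa, dev_size, extents):
    """Translate file extents into (SPA, length) ranges.

    extents: iterable of (physical_offset, length) pairs in bytes.
    """
    ranges = []
    for phys_off, length in extents:
        # Reject extents that fall outside the pmem device.
        if phys_off < 0 or length <= 0 or phys_off + length > dev_size:
            raise ValueError(f"extent ({phys_off:#x}, {length:#x}) "
                             "is outside the pmem device")
        ranges.append((start_spa + phys_off, length))
    return ranges
```

For example, with a start SPA of 0x100000000, an extent at physical
offset 0x1000 of length 0x2000 maps to the SPA range starting at
0x100001000 with the same length.
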
> Hadn't there been talk of using labels to be able to allow a guest to
> own the exact same physical range again after reboot of guest or
> host?
>
You mean labels in the NVDIMM label storage area? As defined in the
Intel NVDIMM Namespace Specification, labels are used to specify
namespaces. For a pmem interleave set (possibly spanning several
DIMMs), at most one pmem namespace (and hence at most one label) is
allowed. Therefore, labels cannot be used to partition pmem.
> > 3) When hvmloader loads a type 0 entry, it extracts the signature
> > from the data blob and searches for it in builtin_table_sigs[]. If
> > a match is found, hvmloader will report an error and stop.
> > Otherwise, it will append the table to the end of loaded guest ACPI.
>
> Duplicate table names aren't generally collisions: There can, for
> example, be many tables named "SSDT".
>
I'll exclude SSDT from the duplication check.
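
For illustration, the type 0 check with SSDT excluded could look
roughly like the Python sketch below. It is not hvmloader code: the
contents of BUILTIN_TABLE_SIGS, the MAY_REPEAT set, and the function
names are assumptions for the example. The only fact relied on is that
an ACPI table starts with its 4-byte signature.

```python
# Hypothetical contents; the real builtin_table_sigs[] is in hvmloader.
BUILTIN_TABLE_SIGS = {"DSDT", "FACP", "APIC", "SSDT"}

# Signatures that may legitimately appear more than once.
MAY_REPEAT = {"SSDT"}


def table_signature(blob):
    # The first four bytes of an ACPI table are its signature.
    if len(blob) < 4:
        raise ValueError("blob too short to hold an ACPI signature")
    return blob[:4].decode("ascii")


def load_type0_entry(blob, guest_acpi):
    """Append blob to guest_acpi unless it collides with a built-in table."""
    sig = table_signature(blob)
    if sig in BUILTIN_TABLE_SIGS and sig not in MAY_REPEAT:
        raise ValueError(f"table {sig} collides with a built-in table")
    guest_acpi.append(blob)
    return sig
```

In this sketch an extra SSDT is accepted, while a second FACP is
rejected before anything is appended.
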
> > 4) When hvmloader loads a type 1 entry, it extracts the device name
> > from the data blob and searches for it in builtin_nd_names[]. If
> > a match is found, hvmloader will report an error and stop.
> > Otherwise, it will wrap the AML code snippet in
> > "Device (name[4]) {...}" and include it in a new SSDT which is then
> > appended to the end of loaded guest ACPI.
>
> But all of these could go into a single SSDT, instead of (as it sounds)
> each into its own one?
>
Yes, I meant to put them in one SSDT.
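
For illustration, collecting all type 1 entries for a single SSDT
could be sketched as below. This Python sketch only shows the name
check and the grouping. The actual AML encoding of the Device() wrapper
and of the SSDT header is omitted. The contents of BUILTIN_ND_NAMES,
the function name, and the rejection of duplicate names among the
entries themselves are assumptions for the example.

```python
# Hypothetical contents; the real builtin_nd_names[] is in hvmloader.
BUILTIN_ND_NAMES = {"PCI0", "HPET", "ISA_"}


def collect_type1_entries(entries):
    """Group (name, aml_bytes) entries for one SSDT, in load order.

    Raises ValueError on a name collision instead of loading anything.
    """
    devices = {}
    for name, aml in entries:
        if len(name) != 4:
            raise ValueError(f"device name {name!r} is not 4 characters")
        if name in BUILTIN_ND_NAMES:
            raise ValueError(f"device {name} collides with a built-in device")
        if name in devices:
            # Assumption: two passed-in devices must not share a name either.
            raise ValueError(f"device {name} is given more than once")
        devices[name] = aml
    # Each item would become one "Device (name) {...}" in the same SSDT.
    return list(devices.items())
```
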
Thanks,
Haozhong