From: Bob Liu <bob.liu@oracle.com>
To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Jan Beulich <jbeulich@suse.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
George Dunlap <George.Dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Ian Jackson <ian.jackson@eu.citrix.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Juergen Gross <jgross@suse.com>, Wei Liu <wei.liu2@citrix.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
Xiao Guangrong <guangrong.xiao@linux.intel.com>,
"Nakajima, Jun" <jun.nakajima@intel.com>
Subject: Re: [RFC Design Doc v2] Add vNVDIMM support for Xen
Date: Tue, 19 Jul 2016 09:57:37 +0800
Message-ID: <578D8911.7070503@oracle.com>
In-Reply-To: <20160718002912.rva5n5jbrezdchwx@hz-desktop>
Hey Haozhong,
On 07/18/2016 08:29 AM, Haozhong Zhang wrote:
> Hi,
>
> Following is version 2 of the design doc for supporting vNVDIMM in
> Xen. It's basically the summary of discussion on previous v1 design
> (https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00006.html).
> Any comments are welcome. The corresponding patches are WIP.
>
This version is really good: very clear, and it includes almost everything I'd like to know.
So are you (or Intel) going to write all the patches? Are there any tasks for the community to take part in?
[..snip..]
> 3. Usage Example of vNVDIMM in Xen
>
> Our design is to provide virtual pmem devices to HVM domains. The
> virtual pmem devices are backed by host pmem devices.
>
> Dom0 Linux kernel can detect the host pmem devices and create
> /dev/pmemXX for each detected device. Users in Dom0 can then create
> a DAX file system on /dev/pmemXX and create several pre-allocated
> files in the DAX file system.
>
> After setting up the file system on the host pmem, users can add the
> following lines to the xl configuration file to assign host pmem
> regions to domains:
> vnvdimm = [ 'file=/dev/pmem0' ]
> or
> vnvdimm = [ 'file=/mnt/dax/pre_allocated_file' ]
>
Could you please also consider the case where a driver domain gets involved?
E.g. vnvdimm = [ 'file=/dev/pmem0', backend='xxx' ]?
> The first type of configuration assigns the entire pmem device
> (/dev/pmem0) to the domain, while the second assigns the space
> allocated to /mnt/dax/pre_allocated_file on the host pmem device to
> the domain.
>
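For readers following along, a full guest config using the proposed syntax might look like the sketch below. Only the vnvdimm line comes from this doc; the rest is ordinary HVM config and all values are made up:

    # hypothetical HVM guest config exercising the proposed vnvdimm option
    builder = "hvm"
    name    = "guest-nvdimm"
    memory  = 4096
    vcpus   = 2
    disk    = [ '/var/lib/xen/images/guest.img,raw,xvda,rw' ]
    # assign the whole host pmem device ...
    vnvdimm = [ 'file=/dev/pmem0' ]
    # ... or a pre-allocated file on a DAX file system:
    # vnvdimm = [ 'file=/mnt/dax/pre_allocated_file' ]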
[..snip..]
>
> 4.2.2 Detection of Host pmem Devices
>
> Detecting and initializing host pmem devices requires a non-trivial
> driver to interact with the corresponding ACPI namespace devices,
> parse namespace labels and take necessary recovery actions. Instead
> of duplicating the comprehensive Linux pmem driver in the Xen
> hypervisor, our design leaves this to Dom0 Linux and lets Dom0 Linux
> report detected host pmem devices to the Xen hypervisor.
>
> Our design takes the following steps to detect host pmem devices when
> Xen boots.
> (1) As when booting on bare metal, host pmem devices are detected by
> the Dom0 Linux NVDIMM driver.
>
> (2) Our design extends the Linux NVDIMM driver to report the SPAs and
> sizes of the pmem devices and reserved areas to the Xen hypervisor
> via a new hypercall.
>
> (3) The Xen hypervisor then checks
> - whether the SPA and size of the newly reported pmem device
> overlap with any previously reported pmem devices;
> - whether the reserved area fits in the pmem device and is
> large enough to hold page_info structs for itself.
>
> If any check fails, the reported pmem device will be ignored by
> the Xen hypervisor and hence will not be used by any
> guests. Otherwise, the Xen hypervisor will record the reported
> parameters and create page_info structs in the reserved area.
>
> (4) Because the reserved area is now used by the Xen hypervisor, it
> should not be accessible by Dom0 any more. Therefore, if a host
> pmem device is recorded by the Xen hypervisor, Xen will unmap the
> reserved area from Dom0. Our design also needs to extend the Linux
> NVDIMM driver to "balloon out" the reserved area after it
> successfully reports a pmem device to the Xen hypervisor.
>
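To make step (2) concrete: the hypercall argument would presumably carry something like the fields below. This struct is purely illustrative; the doc doesn't define the interface, so every name here is hypothetical:

    #include <stdint.h>

    /* Hypothetical argument for the new pmem-report hypercall.  Dom0's
     * NVDIMM driver would pass one of these per detected pmem device,
     * matching steps (2) and (3) above. */
    struct xen_pmem_report {
        uint64_t spa;       /* start SPA of the host pmem device */
        uint64_t size;      /* size of the device in bytes */
        uint64_t rsv_spa;   /* start SPA of the reserved area */
        uint64_t rsv_size;  /* size of the reserved area; must be large
                             * enough for the device's page_info structs */
    };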
> 4.2.3 Get Host Machine Address (SPA) of Host pmem Files
>
> Before a pmem file is assigned to a domain, we need to know the host
> SPA ranges that are allocated to this file. We do this work in xl.
>
> If a pmem device /dev/pmem0 is given, xl will read
> /sys/block/pmem0/device/{resource,size} respectively for the start
> SPA and size of the pmem device.
>
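So for the whole-device case xl just needs two sysfs reads. A minimal userspace sketch (hypothetical helper, not actual xl code; the sysfs paths are the ones named above):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Read one value from a sysfs attribute such as
     * /sys/block/pmem0/device/resource (start SPA, hex) or
     * /sys/block/pmem0/device/size (bytes, decimal). */
    static int read_sysfs_u64(const char *path, uint64_t *val)
    {
        char buf[64];
        FILE *f = fopen(path, "r");
        if (!f || !fgets(buf, sizeof(buf), f)) {
            if (f)
                fclose(f);
            return -1;
        }
        fclose(f);
        *val = strtoull(buf, NULL, 0);  /* base 0 parses "0x..." and decimal */
        return 0;
    }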
> If a pre-allocated file /mnt/dax/file is given,
> (1) xl first finds the host pmem device where /mnt/dax/file is. Then
> it uses the method above to get the start SPA of the host pmem
> device.
> (2) xl then uses the fiemap ioctl to get the extent mappings of
> /mnt/dax/file, and adds the corresponding physical offsets and
> lengths in each mapping entry to the above start SPA to get the
> SPA ranges pre-allocated for this file.
>
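For reference, step (2) boils down to something like the sketch below: walk the extents returned by FS_IOC_FIEMAP and offset each physical range by the device start SPA from step (1). Simplified (a real implementation must loop until FIEMAP_EXTENT_LAST and check the extent flags), and note it runs into exactly the limitation I raise next:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>
    #include <linux/fiemap.h>

    static void print_spa_ranges(const char *path, uint64_t dev_start_spa)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return; }

        unsigned int max_ext = 32;
        struct fiemap *fm = calloc(1, sizeof(*fm) +
                                   max_ext * sizeof(struct fiemap_extent));
        if (!fm) { close(fd); return; }
        fm->fm_length = ~0ULL;           /* map the whole file */
        fm->fm_extent_count = max_ext;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0) {
            for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
                struct fiemap_extent *e = &fm->fm_extents[i];
                /* physical offset on the pmem device + device start SPA */
                printf("SPA 0x%llx - 0x%llx\n",
                       (unsigned long long)(dev_start_spa + e->fe_physical),
                       (unsigned long long)(dev_start_spa + e->fe_physical
                                            + e->fe_length));
            }
        }
        free(fm);
        close(fd);
    }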
It looks like pmem can't be passed through to a driver domain directly the way e.g. PCI devices can.
So suppose we create a driver domain with vnvdimm = [ 'file=/dev/pmem0' ] and make a DAX file system in that driver domain,
then create new guests with vnvdimm = [ 'file=dax file in driver domain', backend = 'driver domain' ].
Is this going to work? In my understanding, fiemap can only get the GPFN instead of the real SPA of the pmem in this case.
> The resulting host SPA ranges will be passed to QEMU, which allocates
> guest address space for the vNVDIMM devices and calls the Xen
> hypervisor to map the guest addresses to the host SPA ranges.
>
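The doc doesn't say which hypercall does the final mapping. As an analogy only: from the toolstack side the operation resembles the existing xc_domain_memory_mapping() used for MMIO passthrough, as in this hypothetical wrapper, though pmem will presumably need its own interface since, unlike MMIO, these pages have page_info structs to manage:

    #include <xenctrl.h>

    /* Hypothetical wrapper: map 'size' bytes of host pmem starting at
     * 'spa' into the guest at 'gpa'.  The design calls for a new
     * hypercall; this only illustrates the shape of the operation. */
    static int map_vnvdimm(xc_interface *xch, uint32_t domid,
                           uint64_t gpa, uint64_t spa, uint64_t size)
    {
        return xc_domain_memory_mapping(xch, domid,
                                        gpa >> XC_PAGE_SHIFT, /* first gfn */
                                        spa >> XC_PAGE_SHIFT, /* first mfn */
                                        size >> XC_PAGE_SHIFT,
                                        1 /* add mapping */);
    }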
Can Dom0 still access the same SPA range when Xen decides to assign it to a new domU?
I assume the range will be unmapped from Dom0 automatically in the hypercall?
Thanks,
-Bob