From: Bob Liu <bob.liu@oracle.com>
To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jan Beulich <jbeulich@suse.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Juergen Gross <jgross@suse.com>, Wei Liu <wei.liu2@citrix.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>
Subject: Re: [RFC Design Doc v2] Add vNVDIMM support for Xen
Date: Tue, 19 Jul 2016 09:57:37 +0800	[thread overview]
Message-ID: <578D8911.7070503@oracle.com> (raw)
In-Reply-To: <20160718002912.rva5n5jbrezdchwx@hz-desktop>

Hey Haozhong,

On 07/18/2016 08:29 AM, Haozhong Zhang wrote:
> Hi,
> 
> Following is version 2 of the design doc for supporting vNVDIMM in

This version is really good: very clear, and it includes almost everything I'd like to know.

> Xen. It's basically the summary of discussion on previous v1 design
> (https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00006.html).
> Any comments are welcome. The corresponding patches are WIP.
> 

So are you (or Intel) going to write all the patches? Are there any tasks the community can take part in?

[..snip..]
> 3. Usage Example of vNVDIMM in Xen
> 
>  Our design is to provide virtual pmem devices to HVM domains. The
>  virtual pmem devices are backed by host pmem devices.
> 
>  The Dom0 Linux kernel can detect the host pmem devices and create
>  /dev/pmemXX for each detected device. Users in Dom0 can then create
>  a DAX file system on /dev/pmemXX and create several pre-allocated
>  files in the DAX file system.
> 
>  After setting up the file system on the host pmem, users can add the
>  following lines to the xl configuration file to assign host pmem
>  regions to domains:
>      vnvdimm = [ 'file=/dev/pmem0' ]
>  or
>      vnvdimm = [ 'file=/mnt/dax/pre_allocated_file' ]
> 

Could you please also consider the case where a driver domain gets involved?
E.g. vnvdimm = [ 'file=/dev/pmem0', backend='xxx' ]?

>   The first type of configuration assigns the entire pmem device
>   (/dev/pmem0) to the domain, while the second assigns the space
>   allocated to /mnt/dax/pre_allocated_file on the host pmem device to
>   the domain.
> 
[..snip..]
> 
> 4.2.2 Detection of Host pmem Devices
> 
>  Detecting and initializing host pmem devices requires a non-trivial
>  driver to interact with the corresponding ACPI namespace devices,
>  parse namespace labels and take the necessary recovery actions.
>  Instead of duplicating the comprehensive Linux pmem driver in the
>  Xen hypervisor, our design leaves this to Dom0 Linux and lets Dom0
>  Linux report detected host pmem devices to the Xen hypervisor.
> 
>  Our design takes the following steps to detect host pmem devices
>  when Xen boots.
>  (1) As when booting on bare metal, host pmem devices are detected by
>      the Dom0 Linux NVDIMM driver.
> 
>  (2) Our design extends the Linux NVDIMM driver to report the SPAs
>      and sizes of the pmem devices and their reserved areas to the
>      Xen hypervisor via a new hypercall.
> 
>  (3) Xen hypervisor then checks
>      - whether the SPA range of the newly reported pmem device
>        overlaps with that of any previously reported pmem device;
>      - whether the reserved area fits in the pmem device and is
>        large enough to hold page_info structs for itself.
> 
>      If any check fails, the reported pmem device will be ignored by
>      the Xen hypervisor and hence will not be used by any
>      guests. Otherwise, the Xen hypervisor will record the reported
>      parameters and create page_info structs in the reserved area.
> 
>  (4) Because the reserved area is now used by the Xen hypervisor, it
>      should not be accessible by Dom0 any more. Therefore, if a host
>      pmem device is recorded by the Xen hypervisor, Xen will unmap its
>      reserved area from Dom0. Our design also needs to extend the
>      Linux NVDIMM driver to "balloon out" the reserved area after it
>      successfully reports a pmem device to the Xen hypervisor.
> 
> 4.2.3 Get Host Machine Address (SPA) of Host pmem Files
> 
>  Before a pmem file is assigned to a domain, we need to know the host
>  SPA ranges that are allocated to this file. We do this work in xl.
> 
>  If a pmem device /dev/pmem0 is given, xl will read
>  /sys/block/pmem0/device/{resource,size} respectively for the start
>  SPA and size of the pmem device.
> 
>  If a pre-allocated file /mnt/dax/file is given,
>  (1) xl first finds the host pmem device where /mnt/dax/file is. Then
>      it uses the method above to get the start SPA of the host pmem
>      device.
>  (2) xl then uses the fiemap ioctl to get the extent mappings of
>      /mnt/dax/file, and adds the physical offset and length of each
>      mapping entry to the above start SPA to get the SPA ranges
>      pre-allocated for this file.
> 

It looks like pmem can't be passed through to a driver domain directly, the way e.g. PCI devices can.

So suppose a driver domain is created with vnvdimm = [ 'file=/dev/pmem0' ], and a DAX file system is made inside the driver domain.

Then new guests are created with vnvdimm = [ 'file=dax file in driver domain', backend = 'driver domain' ].
Is this going to work? In my understanding, fiemap can only get the GPFN instead of the real SPA of the pmem in this case.


>  The resulting host SPA ranges will be passed to QEMU which allocates
>  guest address space for vNVDIMM devices and calls Xen hypervisor to
>  map the guest address to the host SPA ranges.
> 

Can Dom0 still access the same SPA range once Xen decides to assign it to a new domU?
I assume the range will be unmapped from Dom0 automatically in the hypercall?

Thanks,
-Bob

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

