All of lore.kernel.org
 help / color / mirror / Atom feed
From: Haozhong Zhang <haozhong.zhang@intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Juergen Gross <JGross@suse.com>,
	Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	IanJackson <ian.jackson@eu.citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Keir Fraser <keir@xen.org>
Subject: Re: [RFC Design Doc] Add vNVDIMM support for Xen
Date: Thu, 17 Mar 2016 16:58:33 +0800	[thread overview]
Message-ID: <20160317085833.GC4005@hz-desktop.sh.intel.com> (raw)
In-Reply-To: <56E9888802000078000DD3F9@prv-mh.provo.novell.com>

On 03/16/16 09:23, Jan Beulich wrote:
> >>> On 16.03.16 at 15:55, <haozhong.zhang@intel.com> wrote:
> > On 03/16/16 08:23, Jan Beulich wrote:
> >> >>> On 16.03.16 at 14:55, <haozhong.zhang@intel.com> wrote:
> >> > On 03/16/16 07:16, Jan Beulich wrote:
> >> >> Which reminds me: When considering a file on NVDIMM, how
> >> >> are you making sure the mapping of the file to disk (i.e.
> >> >> memory) blocks doesn't change while the guest has access
> >> >> to it, e.g. due to some defragmentation going on?
> >> > 
> >> > The current linux kernel 4.5 has an experimental "raw device dax
> >> > support" (enabled by removing "depends on BROKEN" from "config
> >> > BLK_DEV_DAX") which can guarantee the consistent mapping. The driver
> >> > developers are going to make it non-broken in linux kernel 4.6.
> >> 
> >> But there you talk about full devices, whereas my question was
> >> for files.
> >>
> > 
> > the raw device dax support is for files on NVDIMM.
> 
> Okay, I can only trust you here. I thought FS_DAX is the file level
> thing.
> 
> >> >> And
> >> >> talking of fragmentation - how do you mean to track guest
> >> >> permissions for an unbounded number of address ranges?
> >> >>
> >> > 
> >> > In this case range structs in iomem_caps for NVDIMMs may consume a lot
> >> > of memory, so I think they are another candidate that should be put in
> >> > the reserved area on NVDIMM. If we only allow to grant access
> >> > permissions to NVDIMM page by page (rather than byte), the number of
> >> > range structs for each NVDIMM in the worst case is still decidable.
> >> 
> >> Of course the permission granularity is going to by pages, not
> >> bytes (or else we couldn't allow the pages to be mapped into
> >> guest address space). And the limit on the per-domain range
> >> sets isn't going to be allowed to be bumped significantly, at
> >> least not for any of the existing ones (or else you'd have to
> >> prove such bumping can't be abused).
> > 
> > What is that limit? the total number of range structs in per-domain
> > range sets? I must miss something when looking through 'case
> > XEN_DOMCTL_iomem_permission' of do_domctl() and didn't find that
> > limit, unless it means alloc_range() will fail when there are lots of
> > range structs.
> 
> Oh, I'm sorry, that was a different set of range sets I was
> thinking about. But note that excessive creation of ranges
> through XEN_DOMCTL_iomem_permission is not a security issue
> just because of XSA-77, i.e. we'd still not knowingly allow a
> severe increase here.
>

I didn't notice that multiple domains can all have access permission
to an iomem range, i.e. there can be multiple range structs for a
single iomem range. If range structs for NVDIMM are put on NVDIMM,
then there would be still a huge amount of them on NVDIMM in the worst
case (maximum number of domains * number of NVDIMM pages).

A workaround is to only allow a range of NVDIMM pages be accessed by a
single domain. Whenever we add the access permission of NVDIMM pages
to a domain, we also remove the permission from its current
grantee. In this way, we only need to put 'number of NVDIMM pages'
range structs on NVDIMM in the worst case.

> >> Putting such control
> >> structures on NVDIMM is a nice idea, but following our isolation
> >> model for normal memory, any such memory used by Xen
> >> would then need to be (made) inaccessible to Dom0.
> > 
> > I'm not clear how this is done. By marking those inaccessible pages as
> > unpresent in dom0's page table? Or any example I can follow?
> 
> That's the problem - so far we had no need to do so since Dom0
> was only ever allowed access to memory Xen didn't use for itself
> or knows it wants to share. Whereas now you want such a
> resource controlled first by Dom0, and only then handed to Xen.
> So yes, Dom0 would need to zap any mappings of these pages
> (and Xen would need to verify that, which would come mostly
> without new code as long as struct page_info gets properly
> used for all this memory) before Xen could use it. Much like
> ballooning out a normal RAM page.
> 

Thanks, I'll look into this balloon approach.

Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2016-03-17  8:58 UTC|newest]

Thread overview: 121+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-01  5:44 [RFC Design Doc] Add vNVDIMM support for Xen Haozhong Zhang
2016-02-01 18:25 ` Andrew Cooper
2016-02-02  3:27   ` Tian, Kevin
2016-02-02  3:44   ` Haozhong Zhang
2016-02-02 11:09     ` Andrew Cooper
2016-02-02  6:33 ` Tian, Kevin
2016-02-02  7:39   ` Zhang, Haozhong
2016-02-02  7:48     ` Tian, Kevin
2016-02-02  7:53       ` Zhang, Haozhong
2016-02-02  8:03         ` Tian, Kevin
2016-02-02  8:49           ` Zhang, Haozhong
2016-02-02 19:01   ` Konrad Rzeszutek Wilk
2016-02-02 17:11 ` Stefano Stabellini
2016-02-03  7:00   ` Haozhong Zhang
2016-02-03  9:13     ` Jan Beulich
2016-02-03 14:09       ` Andrew Cooper
2016-02-03 14:23         ` Haozhong Zhang
2016-02-05 14:40         ` Ross Philipson
2016-02-06  1:43           ` Haozhong Zhang
2016-02-06 16:17             ` Ross Philipson
2016-02-03 12:02     ` Stefano Stabellini
2016-02-03 13:11       ` Haozhong Zhang
2016-02-03 14:20         ` Andrew Cooper
2016-02-04  3:10           ` Haozhong Zhang
2016-02-03 15:16       ` George Dunlap
2016-02-03 15:22         ` Stefano Stabellini
2016-02-03 15:35           ` Konrad Rzeszutek Wilk
2016-02-03 15:35           ` George Dunlap
2016-02-04  2:55           ` Haozhong Zhang
2016-02-04 12:24             ` Stefano Stabellini
2016-02-15  3:16               ` Zhang, Haozhong
2016-02-16 11:14                 ` Stefano Stabellini
2016-02-16 12:55                   ` Jan Beulich
2016-02-17  9:03                     ` Haozhong Zhang
2016-03-04  7:30                     ` Haozhong Zhang
2016-03-16 12:55                       ` Haozhong Zhang
2016-03-16 13:13                         ` Konrad Rzeszutek Wilk
2016-03-16 13:16                         ` Jan Beulich
2016-03-16 13:55                           ` Haozhong Zhang
2016-03-16 14:23                             ` Jan Beulich
2016-03-16 14:55                               ` Haozhong Zhang
2016-03-16 15:23                                 ` Jan Beulich
2016-03-17  8:58                                   ` Haozhong Zhang [this message]
2016-03-17 11:04                                     ` Jan Beulich
2016-03-17 12:44                                       ` Haozhong Zhang
2016-03-17 12:59                                         ` Jan Beulich
2016-03-17 13:29                                           ` Haozhong Zhang
2016-03-17 13:52                                             ` Jan Beulich
2016-03-17 14:00                                             ` Ian Jackson
2016-03-17 14:21                                               ` Haozhong Zhang
2016-03-29  8:47                                                 ` Haozhong Zhang
2016-03-29  9:11                                                   ` Jan Beulich
2016-03-29 10:10                                                     ` Haozhong Zhang
2016-03-29 10:49                                                       ` Jan Beulich
2016-04-08  5:02                                                         ` Haozhong Zhang
2016-04-08 15:52                                                           ` Jan Beulich
2016-04-12  8:45                                                             ` Haozhong Zhang
2016-04-21  5:09                                                               ` Haozhong Zhang
2016-04-21  7:04                                                                 ` Jan Beulich
2016-04-22  2:36                                                                   ` Haozhong Zhang
2016-04-22  8:24                                                                     ` Jan Beulich
2016-04-22 10:16                                                                       ` Haozhong Zhang
2016-04-22 10:53                                                                         ` Jan Beulich
2016-04-22 12:26                                                                           ` Haozhong Zhang
2016-04-22 12:36                                                                             ` Jan Beulich
2016-04-22 12:54                                                                               ` Haozhong Zhang
2016-04-22 13:22                                                                                 ` Jan Beulich
2016-03-17 13:32                                         ` Konrad Rzeszutek Wilk
2016-02-03 15:47       ` Konrad Rzeszutek Wilk
2016-02-04  2:36         ` Haozhong Zhang
2016-02-15  9:04         ` Zhang, Haozhong
2016-02-02 19:15 ` Konrad Rzeszutek Wilk
2016-02-03  8:28   ` Haozhong Zhang
2016-02-03  9:18     ` Jan Beulich
2016-02-03 12:22       ` Haozhong Zhang
2016-02-03 12:38         ` Jan Beulich
2016-02-03 12:49           ` Haozhong Zhang
2016-02-03 14:30       ` Andrew Cooper
2016-02-03 14:39         ` Jan Beulich
2016-02-15  8:43   ` Haozhong Zhang
2016-02-15 11:07     ` Jan Beulich
2016-02-17  9:01       ` Haozhong Zhang
2016-02-17  9:08         ` Jan Beulich
2016-02-18  7:42           ` Haozhong Zhang
2016-02-19  2:14             ` Konrad Rzeszutek Wilk
2016-03-01  7:39               ` Haozhong Zhang
2016-03-01 18:33                 ` Ian Jackson
2016-03-01 18:49                   ` Konrad Rzeszutek Wilk
2016-03-02  7:14                     ` Haozhong Zhang
2016-03-02 13:03                       ` Jan Beulich
2016-03-04  2:20                         ` Haozhong Zhang
2016-03-08  9:15                           ` Haozhong Zhang
2016-03-08  9:27                             ` Jan Beulich
2016-03-09 12:22                               ` Haozhong Zhang
2016-03-09 16:17                                 ` Jan Beulich
2016-03-10  3:27                                   ` Haozhong Zhang
2016-03-17 11:05                                   ` Ian Jackson
2016-03-17 13:37                                     ` Haozhong Zhang
2016-03-17 13:56                                       ` Jan Beulich
2016-03-17 14:22                                         ` Haozhong Zhang
2016-03-17 14:12                                       ` Xu, Quan
2016-03-17 14:22                                         ` Zhang, Haozhong
2016-03-07 20:53                       ` Konrad Rzeszutek Wilk
2016-03-08  5:50                         ` Haozhong Zhang
2016-02-18 17:17 ` Jan Beulich
2016-02-24 13:28   ` Haozhong Zhang
2016-02-24 14:00     ` Ross Philipson
2016-02-24 16:42       ` Haozhong Zhang
2016-02-24 17:50         ` Ross Philipson
2016-02-24 14:24     ` Jan Beulich
2016-02-24 15:48       ` Haozhong Zhang
2016-02-24 16:54         ` Jan Beulich
2016-02-28 14:48           ` Haozhong Zhang
2016-02-29  9:01             ` Jan Beulich
2016-02-29  9:45               ` Haozhong Zhang
2016-02-29 10:12                 ` Jan Beulich
2016-02-29 11:52                   ` Haozhong Zhang
2016-02-29 12:04                     ` Jan Beulich
2016-02-29 12:22                       ` Haozhong Zhang
2016-03-01 13:51                         ` Ian Jackson
2016-03-01 15:04                           ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160317085833.GC4005@hz-desktop.sh.intel.com \
    --to=haozhong.zhang@intel.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=JBeulich@suse.com \
    --cc=JGross@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=george.dunlap@citrix.com \
    --cc=guangrong.xiao@linux.intel.com \
    --cc=ian.campbell@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=jun.nakajima@intel.com \
    --cc=keir@xen.org \
    --cc=kevin.tian@intel.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.