From: "Jan Beulich" <JBeulich@suse.com>
To: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: Juergen Gross <JGross@suse.com>,
	Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Keir Fraser <keir@xen.org>
Subject: Re: [RFC Design Doc] Add vNVDIMM support for Xen
Date: Thu, 17 Mar 2016 06:59:25 -0600
Message-ID: <56EAB83D02000078000DDDDD@prv-mh.provo.novell.com>
In-Reply-To: <20160317124428.GA9842@hz-desktop.sh.intel.com>

>>> On 17.03.16 at 13:44, <haozhong.zhang@intel.com> wrote:
> On 03/17/16 05:04, Jan Beulich wrote:
>> >>> On 17.03.16 at 09:58, <haozhong.zhang@intel.com> wrote:
>> > On 03/16/16 09:23, Jan Beulich wrote:
>> >> >>> On 16.03.16 at 15:55, <haozhong.zhang@intel.com> wrote:
>> >> > On 03/16/16 08:23, Jan Beulich wrote:
>> >> >> >>> On 16.03.16 at 14:55, <haozhong.zhang@intel.com> wrote:
>> >> >> > On 03/16/16 07:16, Jan Beulich wrote:
>> >> >> >> And
>> >> >> >> talking of fragmentation - how do you mean to track guest
>> >> >> >> permissions for an unbounded number of address ranges?
>> >> >> >>
>> >> >> > 
>> >> >> > In this case range structs in iomem_caps for NVDIMMs may consume a lot
>> >> >> > of memory, so I think they are another candidate that should be put in
>> >> >> > the reserved area on NVDIMM. If we only allow access permissions
>> >> >> > to NVDIMM to be granted page by page (rather than byte by byte), the
>> >> >> > number of range structs for each NVDIMM in the worst case is still
>> >> >> > bounded.
>> >> >> 
>> >> >> Of course the permission granularity is going to be pages, not
>> >> >> bytes (or else we couldn't allow the pages to be mapped into
>> >> >> guest address space). And the limit on the per-domain range
>> >> >> sets isn't going to be allowed to be bumped significantly, at
>> >> >> least not for any of the existing ones (or else you'd have to
>> >> >> prove such bumping can't be abused).
>> >> > 
>> >> > What is that limit? The total number of range structs in per-domain
>> >> > range sets? I must have missed something when looking through 'case
>> >> > XEN_DOMCTL_iomem_permission' of do_domctl(), as I didn't find that
>> >> > limit, unless it means alloc_range() will fail when there are lots of
>> >> > range structs.
>> >> 
>> >> Oh, I'm sorry, that was a different set of range sets I was
>> >> thinking about. But note that excessive creation of ranges
>> >> through XEN_DOMCTL_iomem_permission is not a security issue
>> >> just because of XSA-77, i.e. we'd still not knowingly allow a
>> >> severe increase here.
>> >>
>> > 
>> > I didn't notice that multiple domains can all have access permission
>> > to an iomem range, i.e. there can be multiple range structs for a
>> > single iomem range. If range structs for NVDIMM are put on NVDIMM,
>> > then there would still be a huge amount of them on NVDIMM in the worst
>> > case (maximum number of domains * number of NVDIMM pages).
>> > 
>> > A workaround is to only allow a range of NVDIMM pages to be accessed by a
>> > single domain. Whenever we add the access permission of NVDIMM pages
>> > to a domain, we also remove the permission from its current
>> > grantee. In this way, we only need to put 'number of NVDIMM pages'
>> > range structs on NVDIMM in the worst case.
>> 
>> But will this work? There's a reason multiple domains are permitted
>> access: The domain running qemu for the guest, for example,
>> needs to be able to access guest memory.
>>
> 
> QEMU now only maintains ACPI tables and emulates _DSM for vNVDIMM,
> neither of which needs to access NVDIMM pages mapped to the guest.

For one - this was only an example. And then - iirc qemu keeps
mappings of certain guest RAM ranges. If I'm remembering this
right, then why would it be excluded that it also may need
mappings of guest NVDIMM?

>> No matter how much you and others are opposed to this, I can't
>> help myself thinking that PMEM regions should be treated like RAM
>> (and hence be under full control of Xen), whereas PBLK regions
>> could indeed be treated like MMIO (and hence partly be under the
>> control of Dom0).
>>
> 
> Hmm, giving Xen full control could at least make reserving space
> on NVDIMM easier. I guess full control does not include manipulating
> file systems on NVDIMM, which can still be left to dom0?
> 
> Then there is another problem (which also exists in the current
> design): does Xen need to emulate NVDIMM _DSM for dom0? Take the _DSM
> that accesses the label storage area (for namespace), for example:
> 
> The way Linux reserves space on a pmem mode NVDIMM is to leave the
> reserved space at the beginning of the pmem mode NVDIMM and create a pmem
> namespace which starts from the end of the reserved space. Because the
> reservation information is written in the namespace in the NVDIMM
> label storage area, every OS that follows the namespace spec would not
> mistakenly write files in the reserved area. I prefer the same way
> if Xen is going to do the reservation. We definitely don't want dom0
> to break the label storage area, so Xen seemingly needs to emulate the
> corresponding _DSM functions for dom0? If so, which part, the
> hypervisor or the toolstack, should do the emulation?

I don't think I can answer any but the very last point: Of course this
can't be done in the tool stack, since afaict the Dom0 kernel will
want to evaluate _DSM before the tool stack even runs.

Jan
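
The worst-case figures quoted above ("maximum number of domains * number
of NVDIMM pages" versus "number of NVDIMM pages" with a single grantee)
can be put into rough numbers. The following is only a back-of-the-envelope
sketch in C, not code from Xen: the 4 KiB page size, the record layout
`range_sketch`, and all function names are assumptions made for
illustration, and the bounds are the loose upper bounds stated in the
thread rather than a tight analysis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for one per-range bookkeeping record:
 * two list links plus an inclusive [s, e] page range. */
struct range_sketch {
    void *prev, *next;
    unsigned long s, e;
};

/* Number of whole pages covered by an NVDIMM of the given size. */
static uint64_t nvdimm_pages(uint64_t bytes)
{
    return bytes >> SKETCH_PAGE_SHIFT;
}

/* Upper bound on range records, following the figures in the thread:
 * - single_grantee != 0: each page belongs to at most one domain,
 *   so at most one record per page overall;
 * - otherwise: every domain may hold a record for every page. */
static uint64_t worst_case_ranges(uint64_t pages, uint64_t domains,
                                  int single_grantee)
{
    assert(domains > 0);
    return single_grantee ? pages : pages * domains;
}

/* Memory needed to hold that many records. */
static uint64_t worst_case_bytes(uint64_t pages, uint64_t domains,
                                 int single_grantee)
{
    return worst_case_ranges(pages, domains, single_grantee) *
           (uint64_t)sizeof(struct range_sketch);
}
```

For instance, a 1 TiB NVDIMM with 4 KiB pages has 2^28 pages, so even the
single-grantee bound is 2^28 records; if each record takes 32 bytes (what
`range_sketch` would occupy on a typical LP64 build), that is 8 GiB, and
the multi-domain bound multiplies this by the number of domains.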

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
