All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jan Beulich" <JBeulich@suse.com>
To: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: Juergen Gross <JGross@suse.com>,
	Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Keir Fraser <keir@xen.org>
Subject: Re: [RFC Design Doc] Add vNVDIMM support for Xen
Date: Tue, 29 Mar 2016 04:49:41 -0600	[thread overview]
Message-ID: <56FA79E502000078000E0D36@prv-mh.provo.novell.com> (raw)
In-Reply-To: <20160329101036.GC3650@hz-desktop.sh.intel.com>

>>> On 29.03.16 at 12:10, <haozhong.zhang@intel.com> wrote:
> On 03/29/16 03:11, Jan Beulich wrote:
>> >>> On 29.03.16 at 10:47, <haozhong.zhang@intel.com> wrote:
>> > On 03/17/16 22:21, Haozhong Zhang wrote:
>> >> On 03/17/16 14:00, Ian Jackson wrote:
>> >> > Haozhong Zhang writes ("Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM 
> support for Xen"):
>> >> > > QEMU keeps mappings of guest memory because (1) that mapping is
>> >> > > created by itself, and/or (2) certain device emulation needs to access
>> >> > > the guest memory. But for vNVDIMM, I'm going to move the creation of
>> >> > > its mappings out of qemu to toolstack and vNVDIMM in QEMU does not
>> >> > > access vNVDIMM pages mapped to guest, so it's not necessary to let
>> >> > > qemu keeps vNVDIMM mappings.
>> >> > 
>> >> > I'm confused by this.
>> >> > 
>> >> > Suppose a guest uses an emulated device (or backend) provided by qemu,
>> >> > to do DMA to an vNVDIMM.  Then qemu will need to map the real NVDIMM
>> >> > pages into its own address space, so that it can write to the memory
>> >> > (ie, do the virtual DMA).
>> >> > 
>> >> > That virtual DMA might well involve a direct mapping in the kernel
>> >> > underlying qemu: ie, qemu might use O_DIRECT to have its kernel write
>> >> > directly to the NVDIMM, and with luck the actual device backing the
>> >> > virtual device will be able to DMA to the NVDIMM.
>> >> > 
>> >> > All of this seems to me to mean that qemu needs to be able to map
>> >> > its guest's parts of NVDIMMs
>> >> > 
>> >> > There are probably other example: memory inspection systems used by
>> >> > virus scanners etc.; debuggers used to inspect a guest from outside;
>> >> > etc.
>> >> > 
>> >> > I haven't even got started on save/restore...
>> >> > 
>> >> 
>> >> Oops, so many cases I missed. Thanks Ian for pointing out all these!
>> >> Now I need to reconsider how to manage guest permissions for NVDIMM pages.
>> >> 
>> > 
>> > I still cannot find a neat approach to manage guest permissions for
>> > nvdimm pages. A possible one is to use a per-domain bitmap to track
>> > permissions: each bit corresponding to an nvdimm page. The bitmap can
>> > save lots of spaces and even be stored in the normal ram, but
>> > operating it for a large nvdimm range, especially for a contiguous
>> > one, is slower than rangeset.
>> 
>> I don't follow: What would a single bit in that bitmap mean? Any
>> guest may access the page? That surely wouldn't be what we
>> need.
>>
> 
> For a host having a N pages of nvdimm, each domain will have a N bits
> bitmap. If the m'th bit of a domain's bitmap is set, then that domain
> has the permission to access the m'th host nvdimm page.

Which will be more overhead as soon as there are enough such
domains in a system.

>> > BTW, if I take the other way to map nvdimm pages to guest
>> > (http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg01972.html)
>> > | 2. Or, given the same inputs, we may combine above two steps into a new
>> > |    dom0 system call that (1) gets the SPA ranges, (2) calls xen
>> > |    hypercall to map SPA ranges
>> > and treat nvdimm as normal ram, then xen will not need to use rangeset
>> > or above bitmap to track guest permissions for nvdimm? But looking at
>> > how qemu currently populates guest memory via XENMEM_populate_physmap
>> > , and other hypercalls like XENMEM_[in|de]crease_reservation, it looks
>> > like that mapping a _dedicated_ piece of host ram to guest is not
>> > allowed out of the hypervisor (and not allowed even in dom0 kernel)?
>> > Is it for security concerns, e.g. avoiding a malfunctioned dom0 leaking
>> > guest memory?
>> 
>> Well, it's simply because RAM is a resource managed through
>> allocation/freeing, instead of via reserving chunks for special
>> purposes.
>> 
> 
> So that means xen can always ensure the ram assigned to a guest is
> what the guest is permitted to access, so no data structures like
> iomem_caps is needed for ram. If I have to introduce a hypercall that
> maps the dedicated host ram/nvdimm to guest, then the explicit
> permission management is still needed, regardless of who (dom0 kernel,
> qemu or toolstack) will use it. Right?

Yes (if you really mean to go that route).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2016-03-29 10:49 UTC|newest]

Thread overview: 121+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-01  5:44 [RFC Design Doc] Add vNVDIMM support for Xen Haozhong Zhang
2016-02-01 18:25 ` Andrew Cooper
2016-02-02  3:27   ` Tian, Kevin
2016-02-02  3:44   ` Haozhong Zhang
2016-02-02 11:09     ` Andrew Cooper
2016-02-02  6:33 ` Tian, Kevin
2016-02-02  7:39   ` Zhang, Haozhong
2016-02-02  7:48     ` Tian, Kevin
2016-02-02  7:53       ` Zhang, Haozhong
2016-02-02  8:03         ` Tian, Kevin
2016-02-02  8:49           ` Zhang, Haozhong
2016-02-02 19:01   ` Konrad Rzeszutek Wilk
2016-02-02 17:11 ` Stefano Stabellini
2016-02-03  7:00   ` Haozhong Zhang
2016-02-03  9:13     ` Jan Beulich
2016-02-03 14:09       ` Andrew Cooper
2016-02-03 14:23         ` Haozhong Zhang
2016-02-05 14:40         ` Ross Philipson
2016-02-06  1:43           ` Haozhong Zhang
2016-02-06 16:17             ` Ross Philipson
2016-02-03 12:02     ` Stefano Stabellini
2016-02-03 13:11       ` Haozhong Zhang
2016-02-03 14:20         ` Andrew Cooper
2016-02-04  3:10           ` Haozhong Zhang
2016-02-03 15:16       ` George Dunlap
2016-02-03 15:22         ` Stefano Stabellini
2016-02-03 15:35           ` Konrad Rzeszutek Wilk
2016-02-03 15:35           ` George Dunlap
2016-02-04  2:55           ` Haozhong Zhang
2016-02-04 12:24             ` Stefano Stabellini
2016-02-15  3:16               ` Zhang, Haozhong
2016-02-16 11:14                 ` Stefano Stabellini
2016-02-16 12:55                   ` Jan Beulich
2016-02-17  9:03                     ` Haozhong Zhang
2016-03-04  7:30                     ` Haozhong Zhang
2016-03-16 12:55                       ` Haozhong Zhang
2016-03-16 13:13                         ` Konrad Rzeszutek Wilk
2016-03-16 13:16                         ` Jan Beulich
2016-03-16 13:55                           ` Haozhong Zhang
2016-03-16 14:23                             ` Jan Beulich
2016-03-16 14:55                               ` Haozhong Zhang
2016-03-16 15:23                                 ` Jan Beulich
2016-03-17  8:58                                   ` Haozhong Zhang
2016-03-17 11:04                                     ` Jan Beulich
2016-03-17 12:44                                       ` Haozhong Zhang
2016-03-17 12:59                                         ` Jan Beulich
2016-03-17 13:29                                           ` Haozhong Zhang
2016-03-17 13:52                                             ` Jan Beulich
2016-03-17 14:00                                             ` Ian Jackson
2016-03-17 14:21                                               ` Haozhong Zhang
2016-03-29  8:47                                                 ` Haozhong Zhang
2016-03-29  9:11                                                   ` Jan Beulich
2016-03-29 10:10                                                     ` Haozhong Zhang
2016-03-29 10:49                                                       ` Jan Beulich [this message]
2016-04-08  5:02                                                         ` Haozhong Zhang
2016-04-08 15:52                                                           ` Jan Beulich
2016-04-12  8:45                                                             ` Haozhong Zhang
2016-04-21  5:09                                                               ` Haozhong Zhang
2016-04-21  7:04                                                                 ` Jan Beulich
2016-04-22  2:36                                                                   ` Haozhong Zhang
2016-04-22  8:24                                                                     ` Jan Beulich
2016-04-22 10:16                                                                       ` Haozhong Zhang
2016-04-22 10:53                                                                         ` Jan Beulich
2016-04-22 12:26                                                                           ` Haozhong Zhang
2016-04-22 12:36                                                                             ` Jan Beulich
2016-04-22 12:54                                                                               ` Haozhong Zhang
2016-04-22 13:22                                                                                 ` Jan Beulich
2016-03-17 13:32                                         ` Konrad Rzeszutek Wilk
2016-02-03 15:47       ` Konrad Rzeszutek Wilk
2016-02-04  2:36         ` Haozhong Zhang
2016-02-15  9:04         ` Zhang, Haozhong
2016-02-02 19:15 ` Konrad Rzeszutek Wilk
2016-02-03  8:28   ` Haozhong Zhang
2016-02-03  9:18     ` Jan Beulich
2016-02-03 12:22       ` Haozhong Zhang
2016-02-03 12:38         ` Jan Beulich
2016-02-03 12:49           ` Haozhong Zhang
2016-02-03 14:30       ` Andrew Cooper
2016-02-03 14:39         ` Jan Beulich
2016-02-15  8:43   ` Haozhong Zhang
2016-02-15 11:07     ` Jan Beulich
2016-02-17  9:01       ` Haozhong Zhang
2016-02-17  9:08         ` Jan Beulich
2016-02-18  7:42           ` Haozhong Zhang
2016-02-19  2:14             ` Konrad Rzeszutek Wilk
2016-03-01  7:39               ` Haozhong Zhang
2016-03-01 18:33                 ` Ian Jackson
2016-03-01 18:49                   ` Konrad Rzeszutek Wilk
2016-03-02  7:14                     ` Haozhong Zhang
2016-03-02 13:03                       ` Jan Beulich
2016-03-04  2:20                         ` Haozhong Zhang
2016-03-08  9:15                           ` Haozhong Zhang
2016-03-08  9:27                             ` Jan Beulich
2016-03-09 12:22                               ` Haozhong Zhang
2016-03-09 16:17                                 ` Jan Beulich
2016-03-10  3:27                                   ` Haozhong Zhang
2016-03-17 11:05                                   ` Ian Jackson
2016-03-17 13:37                                     ` Haozhong Zhang
2016-03-17 13:56                                       ` Jan Beulich
2016-03-17 14:22                                         ` Haozhong Zhang
2016-03-17 14:12                                       ` Xu, Quan
2016-03-17 14:22                                         ` Zhang, Haozhong
2016-03-07 20:53                       ` Konrad Rzeszutek Wilk
2016-03-08  5:50                         ` Haozhong Zhang
2016-02-18 17:17 ` Jan Beulich
2016-02-24 13:28   ` Haozhong Zhang
2016-02-24 14:00     ` Ross Philipson
2016-02-24 16:42       ` Haozhong Zhang
2016-02-24 17:50         ` Ross Philipson
2016-02-24 14:24     ` Jan Beulich
2016-02-24 15:48       ` Haozhong Zhang
2016-02-24 16:54         ` Jan Beulich
2016-02-28 14:48           ` Haozhong Zhang
2016-02-29  9:01             ` Jan Beulich
2016-02-29  9:45               ` Haozhong Zhang
2016-02-29 10:12                 ` Jan Beulich
2016-02-29 11:52                   ` Haozhong Zhang
2016-02-29 12:04                     ` Jan Beulich
2016-02-29 12:22                       ` Haozhong Zhang
2016-03-01 13:51                         ` Ian Jackson
2016-03-01 15:04                           ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56FA79E502000078000E0D36@prv-mh.provo.novell.com \
    --to=jbeulich@suse.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=JGross@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=george.dunlap@citrix.com \
    --cc=guangrong.xiao@linux.intel.com \
    --cc=haozhong.zhang@intel.com \
    --cc=jun.nakajima@intel.com \
    --cc=keir@xen.org \
    --cc=kevin.tian@intel.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.