From: "Jan Beulich" <JBeulich@suse.com>
To: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: Juergen Gross <JGross@suse.com>,
	Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Keir Fraser <keir@xen.org>
Subject: Re: [RFC Design Doc] Add vNVDIMM support for Xen
Date: Thu, 21 Apr 2016 01:04:54 -0600
Message-ID: <571897B602000078000E43CB@prv-mh.provo.novell.com>
In-Reply-To: <20160421050933.GA7471@hz-desktop.sh.intel.com>

>>> On 21.04.16 at 07:09, <haozhong.zhang@intel.com> wrote:
> On 04/12/16 16:45, Haozhong Zhang wrote:
>> On 04/08/16 09:52, Jan Beulich wrote:
>> > >>> On 08.04.16 at 07:02, <haozhong.zhang@intel.com> wrote:
>> > > On 03/29/16 04:49, Jan Beulich wrote:
>> > >> >>> On 29.03.16 at 12:10, <haozhong.zhang@intel.com> wrote:
>> > >> > On 03/29/16 03:11, Jan Beulich wrote:
>> > >> >> >>> On 29.03.16 at 10:47, <haozhong.zhang@intel.com> wrote:
>> > > [..]
>> > >> >> > I still cannot find a neat approach to manage guest permissions for
>> > >> >> > nvdimm pages. A possible one is to use a per-domain bitmap to track
>> > >> >> > permissions: each bit corresponding to an nvdimm page. The bitmap can
>> > >> >> > save a lot of space and can even be stored in normal RAM, but
>> > >> >> > operating on it for a large nvdimm range, especially a contiguous
>> > >> >> > one, is slower than with a rangeset.
>> > >> >> 
>> > >> >> I don't follow: What would a single bit in that bitmap mean? Any
>> > >> >> guest may access the page? That surely wouldn't be what we
>> > >> >> need.
>> > >> >>
>> > >> > 
>> > >> > For a host having N pages of nvdimm, each domain will have an N-bit
>> > >> > bitmap. If the m'th bit of a domain's bitmap is set, then that domain
>> > >> > has the permission to access the m'th host nvdimm page.
>> > >> 
>> > >> Which will be more overhead as soon as there are enough such
>> > >> domains in a system.
>> > >>
>> > > 
>> > > Sorry for the late reply.
>> > > 
>> > > I think we can make some optimization to reduce the space consumed by
>> > > the bitmap.
>> > > 
>> > > A per-domain bitmap covering the entire host NVDIMM address range is
>> > > wasteful, especially if the ranges actually used are congregated. We may
>> > > take the following approaches to reduce its space.
>> > > 
>> > > 1) Split the per-domain bitmap into multiple sub-bitmaps, where each
>> > >    sub-bitmap covers a smaller and contiguous sub host NVDIMM address
>> > >    range. In the beginning, no sub-bitmap is allocated for the
>> > >    domain. If the access permission to a host NVDIMM page in a sub
>> > >    host address range is added to a domain, only the sub-bitmap for
>> > >    that address range is allocated for the domain. If access
>> > >    permissions to all host NVDIMM pages in a sub range are removed
>> > >    from a domain, the corresponding sub-bitmap can be freed.
>> > > 
>> > > 2) If a domain has access permissions to all host NVDIMM pages in a
>> > >    sub range, the corresponding sub-bitmap will be replaced by a range
>> > >    struct. If range structs are used to track adjacent ranges, they
>> > >    will be merged into one range struct. If access permissions to some
>> > >    pages in that sub range are removed from a domain, the range struct
>> > >    should be converted back to bitmap segment(s).
>> > > 
>> > > 3) Because there might be lots of above bitmap segments and range
>> > >    structs per-domain, we can organize them in a balanced interval
>> > >    tree to quickly search/add/remove an individual structure.
>> > > 
>> > > In the worst case, where each sub range has non-contiguous pages
>> > > assigned to a domain, the above solution will use all sub-bitmaps and
>> > > consume more space than a single bitmap because of the extra space for
>> > > organization. I assume that the sysadmin should be responsible for
>> > > ensuring that the host nvdimm ranges assigned to each domain are as
>> > > contiguous and congregated as possible in order to avoid the worst case.
>> > > However, if the worst case does happen, the xen hypervisor should refuse
>> > > to assign nvdimm to a guest when it runs out of memory.
>> > 
>> > To be honest, this all sounds pretty unconvincing wrt not using
>> > existing code paths - a lot of special treatment, and hence a lot
>> > of things that can go (slightly) wrong.
>> > 
>> 
>> Well, using the existing range struct to manage guest access permissions
>> to nvdimm could consume too much space which could not fit in either
>> memory or nvdimm. If the above solution looks really error-prone,
>> perhaps we can still come back to the existing one and restrict the
>> number of range structs each domain could have for nvdimm
>> (e.g. reserve one 4K-page per-domain for them) to make it work for
>> nvdimm, though it may reject nvdimm mapping that is terribly
>> fragmented.
> 
> Hi Jan,
> 
> Any comments on this?

Well, nothing new, i.e. my previous opinion on the old proposal didn't
change. I'm really opposed to any artificial limitations here, as I am to
any secondary (and hence error-prone) code paths. IOW I continue
to think that there's no reasonable alternative to re-using the existing
memory management infrastructure for at least the PMEM case. The
only remaining open question is where to place the control structures,
and I think the thresholding proposal of yours was quite sensible.

Jan
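
For reference, item 1) of the quoted proposal (sub-bitmaps that are
allocated on first grant and freed once empty) can be sketched in C as
below. This is only a minimal sketch under stated assumptions, not code
from this thread or from Xen: every name (struct nvdimm_perm,
perm_grant(), perm_revoke(), ...) and the 32768-page chunk size are
hypothetical, and a flat pointer array stands in for the balanced
interval tree of item 3). The range-struct replacement of item 2) is
left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* All names and sizes here are hypothetical, chosen for illustration. */
#define PAGES_PER_CHUNK 32768u            /* 4 KiB worth of bits */
#define BITS_PER_WORD   64u
#define WORDS_PER_CHUNK (PAGES_PER_CHUNK / BITS_PER_WORD)

struct chunk {                            /* one sub-bitmap */
    uint64_t bits[WORDS_PER_CHUNK];
    unsigned int nr_set;                  /* number of bits currently set */
};

struct nvdimm_perm {                      /* per-domain permission state */
    uint64_t nr_pages;                    /* host NVDIMM pages covered */
    size_t nr_chunks;
    struct chunk **chunks;                /* NULL: nothing granted in chunk */
};

static struct nvdimm_perm *perm_create(uint64_t nr_pages)
{
    struct nvdimm_perm *p;

    if (nr_pages == 0 || !(p = calloc(1, sizeof(*p))))
        return NULL;
    p->nr_pages = nr_pages;
    p->nr_chunks = (nr_pages + PAGES_PER_CHUNK - 1) / PAGES_PER_CHUNK;
    p->chunks = calloc(p->nr_chunks, sizeof(*p->chunks));
    if (!p->chunks) {
        free(p);
        return NULL;
    }
    return p;
}

/* Grant access to one page; the sub-bitmap is allocated on demand. */
static int perm_grant(struct nvdimm_perm *p, uint64_t pfn)
{
    uint64_t idx = pfn / PAGES_PER_CHUNK;
    unsigned int bit = pfn % PAGES_PER_CHUNK;
    uint64_t mask = (uint64_t)1 << (bit % BITS_PER_WORD);

    if (pfn >= p->nr_pages)
        return -1;                        /* outside the tracked range */
    if (!p->chunks[idx] && !(p->chunks[idx] = calloc(1, sizeof(struct chunk))))
        return -1;                        /* out of memory: refuse the grant */
    if (!(p->chunks[idx]->bits[bit / BITS_PER_WORD] & mask)) {
        p->chunks[idx]->bits[bit / BITS_PER_WORD] |= mask;
        p->chunks[idx]->nr_set++;
    }
    return 0;
}

static bool perm_test(const struct nvdimm_perm *p, uint64_t pfn)
{
    uint64_t idx = pfn / PAGES_PER_CHUNK;
    unsigned int bit = pfn % PAGES_PER_CHUNK;

    if (pfn >= p->nr_pages || !p->chunks[idx])
        return false;
    return (p->chunks[idx]->bits[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1;
}

/* Revoke access to one page; an empty sub-bitmap is freed again. */
static void perm_revoke(struct nvdimm_perm *p, uint64_t pfn)
{
    uint64_t idx = pfn / PAGES_PER_CHUNK;
    unsigned int bit = pfn % PAGES_PER_CHUNK;

    if (!perm_test(p, pfn))
        return;
    p->chunks[idx]->bits[bit / BITS_PER_WORD] &= ~((uint64_t)1 << (bit % BITS_PER_WORD));
    if (--p->chunks[idx]->nr_set == 0) {
        free(p->chunks[idx]);
        p->chunks[idx] = NULL;
    }
}

/* Count the sub-bitmaps currently allocated for this domain. */
static size_t perm_allocated_chunks(const struct nvdimm_perm *p)
{
    size_t i, n = 0;

    for (i = 0; i < p->nr_chunks; i++)
        n += p->chunks[i] != NULL;
    return n;
}

static void perm_destroy(struct nvdimm_perm *p)
{
    size_t i;

    for (i = 0; i < p->nr_chunks; i++)
        free(p->chunks[i]);
    free(p->chunks);
    free(p);
}
```

In this sketch a domain that touches only a few clustered ranges pays
for the pointer array plus the sub-bitmaps it actually uses. If pages
are granted in every chunk, it degenerates to the full bitmap plus the
bookkeeping overhead, which is the worst case the quoted proposal
already acknowledges.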

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
