From: George Dunlap
Subject: Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
Date: Tue, 26 Jan 2016 11:44:29 +0000
To: Jan Beulich
Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
    Andrew Cooper, Ian Jackson, xen-devel@lists.xen.org, Jun Nakajima,
    Xiao Guangrong, Keir Fraser

On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich wrote:
>>>> On 21.01.16 at 15:01, wrote:
>> On 01/21/16 03:25, Jan Beulich wrote:
>>> >>> On 21.01.16 at 10:10, wrote:
>>> > b) some _DSMs control PMEM so you should filter out these kind of _DSMs and
>>> > handle them in hypervisor.
>>>
>>> Not if (see above) following the model we currently have in place.
>>>
>>
>> You mean let dom0 linux evaluates those _DSMs and interact with
>> hypervisor if necessary (e.g. XENPF_mem_hotadd for memory hotplug)?
>
> Yes.
>
>>> > c) hypervisor should mange PMEM resource pool and partition it to multiple
>>> > VMs.
>>>
>>> Yes.
>>>
>>
>> But I Still do not quite understand this part: why must pmem resource
>> management and partition be done in hypervisor?
>
> Because that's where memory management belongs. And PMEM,
> other than PBLK, is just another form of RAM.

I haven't looked more deeply into the details of this, but this
argument doesn't seem right to me.

Normal RAM in Xen is what might be called "fungible" -- at boot, all
RAM is zeroed, and it basically doesn't matter at all what RAM is
given to what guest.  (There are restrictions of course: lowmem for
DMA, contiguous superpages, &c; but within those groups, it doesn't
matter *which* bit of lowmem you get, as long as you get enough to do
your job.)  If you reboot your guest or hand RAM back to the
hypervisor, you assume that everything in it will disappear.  When you
ask for RAM, you can request some parameters that it will have
(lowmem, on a specific node, &c), but you can't request a specific
page that you had before.

This is not the case for PMEM.  The whole point of PMEM (correct me if
I'm wrong) is to be used for long-term storage that survives over
reboot.  It matters very much that a guest be given the same PMEM
after the host is rebooted as it was given before.  It doesn't make
any sense to manage it the way Xen currently manages RAM (i.e., you
request a page and get whatever Xen happens to give you).

So if Xen is going to use PMEM, it will have to invent an entirely new
interface for guests, and it will have to keep track of those
resources across host reboots.  In other words, it will have to
duplicate all the work that Linux already does.  What do we gain from
that duplication?  Why not just leverage what's already implemented in
dom0?
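To make the contrast concrete, here is a rough sketch of the two
allocation models I'm describing.  It is purely illustrative: neither
structure exists in Xen, and all of the names are made up.

    #include <stdint.h>
    #include <stdbool.h>

    /*
     * Normal RAM: the caller describes how much memory it wants and
     * under what constraints; Xen decides which machine frames back it.
     */
    struct ram_request {
        unsigned long nr_frames;  /* how many frames the guest needs      */
        unsigned int  node;       /* optional NUMA node preference        */
        bool          lowmem;     /* e.g. must be usable for 32-bit DMA   */
        /* There is no way (and no need) to name specific machine frames. */
    };

    /*
     * PMEM: the caller has to name the exact persistent region, because
     * the data in it must reappear in the same place across host reboots.
     */
    struct pmem_request {
        uint64_t spa_start;       /* host physical start of the region    */
        uint64_t size;            /* region size in bytes                 */
        uint64_t gpfn;            /* guest frame where it should appear   */
    };

The second kind of request is what Xen has no concept of today; keeping
track of which guest owns which persistent region across host reboots is
exactly the duplication of dom0's work that I'm worried about.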
>> I mean if we allow the following steps of operations (for example)
>> (1) partition pmem in dom 0
>> (2) get address and size of each partition (part_addr, part_size)
>> (3) call a hypercall like nvdimm_memory_mapping(d, part_addr, part_size,
>> gpfn) to
>> map a partition to the address gpfn in dom d.
>> Only the last step requires hypervisor. Would anything be wrong if we
>> allow above operations?
>
> The main issue is that this would imo be a layering violation. I'm
> sure it can be made work, but that doesn't mean that's the way
> it ought to work.

Jan, from a toolstack <-> Xen perspective, I'm not sure what
alternative there is to the interface above.  Won't the toolstack have
to 1) figure out what nvdimm regions there are and 2) tell Xen how and
where to assign them to the guest, no matter what we do?

And if we want to assign arbitrary regions to arbitrary guests, then
(part_addr, part_size) and (gpfn) are going to be necessary bits of
information.  The only difference would be whether part_addr is a
machine address or an address in some abstracted address space
(possibly starting at 0).

What does your ideal toolstack <-> Xen interface look like?

 -George
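P.S. To make the interface question concrete, here is the sort of thing
I have in mind when talking about (part_addr, part_size) and gpfn --
again purely a sketch: the structure, the field names and the
libxc-style wrapper below are all invented, and nothing like this
exists in the tree.

    #include <stdint.h>

    typedef uint16_t domid_t;    /* as in Xen's public headers */

    /* Toolstack -> Xen: map one host PMEM partition into a guest. */
    struct xen_nvdimm_map {
        domid_t  domid;      /* target domain                             */
        uint64_t part_addr;  /* partition start: a machine address, or an
                              * index into an abstracted space if Xen
                              * owns the pool                             */
        uint64_t part_size;  /* partition size in bytes                   */
        uint64_t gpfn;       /* guest frame where the partition appears   */
    };

    /*
     * Hypothetical toolstack flow, matching steps (1)-(3) quoted above:
     *   1. dom0 partitions the pmem (e.g. via the Linux NVDIMM stack);
     *   2. it reads back (part_addr, part_size) for each partition;
     *   3. it asks Xen to map the partition at gpfn in domain d, e.g.:
     *
     *        struct xen_nvdimm_map m = {
     *            .domid = d, .part_addr = part_addr,
     *            .part_size = part_size, .gpfn = gpfn,
     *        };
     *        rc = xc_nvdimm_memory_mapping(xch, &m);  /* invented wrapper */
     */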