From: George Dunlap
Subject: Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
Date: Tue, 26 Jan 2016 11:44:29 +0000
To: Jan Beulich
Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
    Andrew Cooper, Ian Jackson, xen-devel@lists.xen.org, Jun Nakajima,
    Xiao Guangrong, Keir Fraser

On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich wrote:
>>>> On 21.01.16 at 15:01, wrote:
>> On 01/21/16 03:25, Jan Beulich wrote:
>>> >>> On 21.01.16 at 10:10, wrote:
>>> > b) some _DSMs control PMEM so you should filter out these kind of _DSMs and
>>> > handle them in hypervisor.
>>>
>>> Not if (see above) following the model we currently have in place.
>>>
>>
>> You mean let dom0 linux evaluates those _DSMs and interact with
>> hypervisor if necessary (e.g. XENPF_mem_hotadd for memory hotplug)?
>
> Yes.
>
>>> > c) hypervisor should mange PMEM resource pool and partition it to multiple
>>> > VMs.
>>>
>>> Yes.
>>>
>>
>> But I Still do not quite understand this part: why must pmem resource
>> management and partition be done in hypervisor?
>
> Because that's where memory management belongs. And PMEM,
> other than PBLK, is just another form of RAM.

I haven't looked more deeply into the details of this, but this
argument doesn't seem right to me.

Normal RAM in Xen is what might be called "fungible" -- at boot, all
RAM is zeroed, and it basically doesn't matter at all what RAM is
given to what guest.  (There are restrictions of course: lowmem for
DMA, contiguous superpages, &c; but within those groups, it doesn't
matter *which* bit of lowmem you get, as long as you get enough to do
your job.)  If you reboot your guest or hand RAM back to the
hypervisor, you assume that everything in it will disappear.  When you
ask for RAM, you can request some parameters that it will have
(lowmem, on a specific node, &c), but you can't request a specific
page that you had before.

This is not the case for PMEM.  The whole point of PMEM (correct me if
I'm wrong) is to be used for long-term storage that survives over
reboot.  It matters very much that a guest be given the same PMEM
after the host is rebooted as it was given before.  It doesn't make
any sense to manage it the way Xen currently manages RAM (i.e., you
request a page and get whatever Xen happens to give you).

So if Xen is going to use PMEM, it will have to invent an entirely new
interface for guests, and it will have to keep track of those
resources across host reboots.  In other words, it will have to
duplicate all the work that Linux already does.  What do we gain from
that duplication?  Why not just leverage what's already implemented in
dom0?
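To make the contrast concrete, here is a rough sketch of the two
allocation models I'm describing.  It is purely illustrative: neither
structure exists in Xen, and all of the names are made up.

    #include <stdint.h>
    #include <stdbool.h>

    /*
     * Normal RAM: the caller describes how much memory it wants and
     * under what constraints; Xen decides which machine frames back it.
     */
    struct ram_request {
        unsigned long nr_frames;  /* how many frames the guest needs      */
        unsigned int  node;       /* optional NUMA node preference        */
        bool          lowmem;     /* e.g. must be usable for 32-bit DMA   */
        /* There is no way (and no need) to name specific machine frames. */
    };

    /*
     * PMEM: the caller has to name the exact persistent region, because
     * the data in it must reappear in the same place across host reboots.
     */
    struct pmem_request {
        uint64_t spa_start;       /* host physical start of the region    */
        uint64_t size;            /* region size in bytes                 */
        uint64_t gpfn;            /* guest frame where it should appear   */
    };

The second kind of request is what Xen has no concept of today; keeping
track of which guest owns which persistent region across host reboots is
exactly the duplication of dom0's work that I'm worried about.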
>> I mean if we allow the following steps of operations (for example)
>> (1) partition pmem in dom 0
>> (2) get address and size of each partition (part_addr, part_size)
>> (3) call a hypercall like nvdimm_memory_mapping(d, part_addr, part_size,
>> gpfn) to
>> map a partition to the address gpfn in dom d.
>> Only the last step requires hypervisor. Would anything be wrong if we
>> allow above operations?
>
> The main issue is that this would imo be a layering violation. I'm
> sure it can be made work, but that doesn't mean that's the way
> it ought to work.

Jan, from a toolstack <-> Xen perspective, I'm not sure what
alternative there is to the interface above.  Won't the toolstack have
to 1) figure out what nvdimm regions there are and 2) tell Xen how and
where to assign them to the guest, no matter what we do?

And if we want to assign arbitrary regions to arbitrary guests, then
(part_addr, part_size) and (gpfn) are going to be necessary bits of
information.  The only difference would be whether part_addr is a
machine address or an address in some abstracted address space
(possibly starting at 0).

What does your ideal toolstack <-> Xen interface look like?

 -George
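P.S. To make the interface question concrete, here is the sort of thing
I have in mind when talking about (part_addr, part_size) and gpfn --
again purely a sketch: the structure, the field names and the
libxc-style wrapper below are all invented, and nothing like this
exists in the tree.

    #include <stdint.h>

    typedef uint16_t domid_t;    /* as in Xen's public headers */

    /* Toolstack -> Xen: map one host PMEM partition into a guest. */
    struct xen_nvdimm_map {
        domid_t  domid;      /* target domain                             */
        uint64_t part_addr;  /* partition start: a machine address, or an
                              * index into an abstracted space if Xen
                              * owns the pool                             */
        uint64_t part_size;  /* partition size in bytes                   */
        uint64_t gpfn;       /* guest frame where the partition appears   */
    };

    /*
     * Hypothetical toolstack flow, matching steps (1)-(3) quoted above:
     *   1. dom0 partitions the pmem (e.g. via the Linux NVDIMM stack);
     *   2. it reads back (part_addr, part_size) for each partition;
     *   3. it asks Xen to map the partition at gpfn in domain d, e.g.:
     *
     *        struct xen_nvdimm_map m = {
     *            .domid = d, .part_addr = part_addr,
     *            .part_size = part_size, .gpfn = gpfn,
     *        };
     *        rc = xc_nvdimm_memory_mapping(xch, &m);  /* invented wrapper */
     */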