From: Stefano Stabellini
Subject: Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
Date: Wed, 20 Jan 2016 15:05:08 +0000
References: <1451388711-18646-1-git-send-email-haozhong.zhang@intel.com>
 <1451388711-18646-5-git-send-email-haozhong.zhang@intel.com>
 <5699362402000078000C7803@prv-mh.provo.novell.com>
 <20160118005255.GC3528@hz-desktop.sh.intel.com>
 <569CB47502000078000C7CFB@prv-mh.provo.novell.com>
 <20160120053132.GA5005@hz-desktop.sh.intel.com>
 <569F575902000078000C8EDC@prv-mh.provo.novell.com>
 <569F4C2C.7040300@citrix.com>
 <20160120101526.GC4939@hz-desktop.sh.intel.com>
 <569F6336.2040104@linux.intel.com>
 <569F88AC.40100@citrix.com>
 <569F9DA5.7010903@citrix.com>
In-Reply-To: <569F9DA5.7010903@citrix.com>
To: Andrew Cooper
Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini, Jun Nakajima,
 Ian Jackson, xen-devel@lists.xen.org, Jan Beulich, Xiao Guangrong,
 Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On Wed, 20 Jan 2016, Andrew Cooper wrote:
> On 20/01/16 14:29, Stefano Stabellini wrote:
> > On Wed, 20 Jan 2016, Andrew Cooper wrote:
> >> On 20/01/16 10:36, Xiao Guangrong wrote:
> >>> Hi,
> >>>
> >>> On 01/20/2016 06:15 PM, Haozhong Zhang wrote:
> >>>
> >>>> CCing QEMU vNVDIMM maintainer: Xiao Guangrong
> >>>>
> >>>>> Conceptually, an NVDIMM is just like a fast SSD which is linearly
> >>>>> mapped into memory. I am still on the dom0 side of this fence.
> >>>>>
> >>>>> The real question is whether it is possible to take an NVDIMM,
> >>>>> split it in half, give each half to two different guests (with
> >>>>> appropriate NFIT tables) and have that be sufficient for the
> >>>>> guests to just work.
> >>>>>
> >>>> Yes, one NVDIMM device can be split into multiple parts and
> >>>> assigned to different guests, and QEMU is responsible for
> >>>> maintaining virtual NFIT tables for each part.
> >>>>
> >>>>> Either way, it needs to be a toolstack policy decision as to how
> >>>>> to split the resource.
> >>>>
> >>> Currently, we use the NVDIMM as a block device and create a
> >>> DAX-based filesystem on it in Linux, so that file accesses directly
> >>> reach the NVDIMM device.
> >>>
> >>> In KVM, if the NVDIMM device needs to be shared by different VMs, we
> >>> can create multiple files on the DAX-based filesystem and assign a
> >>> file to each VM. In the future, we can enable namespaces
> >>> (partition-like) for PMEM memory and assign a namespace to each VM
> >>> (the current Linux driver uses the whole PMEM as a single
> >>> namespace).
> >>>
> >>> I think it is not easy to make the Xen hypervisor recognize NVDIMM
> >>> devices and manage NVDIMM resources.
> >>>
> >>> Thanks!
> >>>
> >> The more I see about this, the more sure I am that we want to keep it
> >> as a block device managed by dom0.
> >>
> >> In the case of the DAX-based filesystem, I presume files are not
> >> necessarily contiguous. I also presume that this is worked around by
> >> permuting the mapping of the virtual NVDIMM such that it appears as a
> >> contiguous block of addresses to the guest?
> >>
> >> Today in Xen, Qemu already has the ability to create mappings in the
> >> guest's address space, e.g. to map PCI device BARs. I don't see a
> >> conceptual difference here, although the security/permission model
> >> certainly is more complicated.
> >
> > I imagine that mmap'ing these /dev/pmemXX devices requires root
> > privileges, does it not?
>
> I presume it does, although mmap()ing a file on a DAX filesystem will
> work in the standard POSIX way.
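For reference, the dom0 side of this really is just standard POSIX,
nothing Xen-specific. Something along these lines should be all that is
needed to get the mapping into Qemu (untested sketch; the device path
and the 1GiB length are made-up examples):

    /* Untested sketch: give Qemu a linear view of (part of) an NVDIMM.
     * "/dev/pmem0" and the length are illustrative only. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/pmem0", O_RDWR);    /* likely root-only */
        if (fd < 0) { perror("open"); return 1; }

        size_t len = 1UL << 30;                 /* e.g. a 1GiB slice */
        void *va = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
        if (va == MAP_FAILED) { perror("mmap"); return 1; }

        /* Qemu can now access the NVDIMM contents through va, but the
         * guest still sees nothing: the guest physmap has to change
         * too, which is the interesting part. */

        munmap(va, len);
        close(fd);
        return 0;
    }

So the question is really about what happens after the mmap, as you say
below.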
> Neither of these is sufficient however. That gets Qemu a mapping of the
> NVDIMM, not the guest. Something, one way or another, has to turn this
> into appropriate add-to-physmap hypercalls.
>
> > I wouldn't encourage the introduction of anything else that requires
> > root privileges in QEMU. With QEMU running as non-root by default in
> > 4.7, the feature will not be available unless users explicitly ask to
> > run QEMU as root (which they shouldn't, really).
>
> This isn't how design works.
>
> First, design a feature in an architecturally correct way, and then
> design a security policy to fit.
>
> We should not stunt a design based on an existing implementation. In
> particular, if the design shows that a root-only feature is the only
> sane way of doing this, it should be a root-only feature. (I hope this
> is not the case, but it shouldn't cloud the judgement of a design.)

I would argue that security is an integral part of the architecture and
should not be retrofitted into it. Is it really a good design if the
only sane way to implement it is to make it a root-only feature? I
think not. Designing security policies for pieces of software that lack
the infrastructure for them is costly, and that cost should be
accounted for as part of the overall cost of the solution, rather than
added to it at a second stage.

> (note, both before implementation happens).

That is the ideal, but realistically in many cases nobody is able to
produce a design before the implementation happens. Plenty of articles
have been written about this since the 90s / early 00s.
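Coming back to the mechanics for a moment, to make the add-to-physmap
point above concrete: the toolstack-facing call that maps passthrough
PCI BARs today is, if I remember correctly, xc_domain_memory_mapping(),
and in principle the same interface could cover an NVDIMM range. A
rough, untested sketch, assuming dom0 already knows the machine address
range backing the NVDIMM slice (the domid, gfn and mfn values below are
all made up for illustration):

    /* Untested sketch: map a machine address range into a guest's
     * physmap, the way passthrough PCI BARs are mapped today. Whether
     * this is the right mechanism (and p2m type) for NVDIMMs is
     * exactly the open question in this thread. */
    #include <stdio.h>
    #include <xenctrl.h>

    int main(void)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        if (!xch) { perror("xc_interface_open"); return 1; }

        uint32_t domid = 1;            /* target guest (made up) */
        unsigned long gfn = 0x100000;  /* guest frame to map at (made up) */
        unsigned long mfn = 0x800000;  /* first machine frame of the NVDIMM
                                          slice, e.g. from the NFIT (made up) */
        unsigned long nr  = (1UL << 30) >> XC_PAGE_SHIFT;  /* 1GiB worth */

        /* last argument: 1 == DPCI_ADD_MAPPING, i.e. add the mapping */
        if (xc_domain_memory_mapping(xch, domid, gfn, mfn, nr, 1))
            perror("xc_domain_memory_mapping");

        xc_interface_close(xch);
        return 0;
    }

Note that whoever ends up issuing it, this is a privileged operation,
which is the crux of the permission question above.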