From: "Zhang, Haozhong"
Subject: Re: [RFC Design Doc] Add vNVDIMM support for Xen
Date: Mon, 15 Feb 2016 17:04:55 +0800
Message-ID: <20160215090455.GC8938@hz-desktop.sh.intel.com>
References: <20160201054414.GA25211@hz-desktop.sh.intel.com>
 <20160203070052.GA4248@hz-desktop.sh.intel.com>
 <20160203154744.GG20732@char.us.oracle.com>
In-Reply-To: <20160203154744.GG20732@char.us.oracle.com>
To: Konrad Rzeszutek Wilk
Cc: Juergen Gross, Wei Liu, "Tian, Kevin", Keir Fraser, Ian Campbell,
 Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
 "xen-devel@lists.xen.org", Jan Beulich, "Nakajima, Jun", Xiao Guangrong
List-Id: xen-devel@lists.xenproject.org

On 02/03/16 23:47, Konrad Rzeszutek Wilk wrote:
> > > > > Open: It seems no system call/ioctl is provided by the Linux kernel to
> > > > > get the physical address from a virtual address.
> > > > > /proc//pagemap provides information of mapping from
> > > > > VA to PA. Is it an acceptable solution to let QEMU parse this
> > > > > file to get the physical address?
> > > >
> > > > Does it work in a non-root scenario?
> > > >
> > >
> > > Seemingly no, according to Documentation/vm/pagemap.txt in the Linux kernel:
> > > | Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
> > > | In 4.0 and 4.1 opens by unprivileged fail with -EPERM. Starting from
> > > | 4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
> > > | Reason: information about PFNs helps in exploiting Rowhammer vulnerability.
> > Ah right.
> > >
> > > A possible alternative is to add a new hypercall similar to
> > > XEN_DOMCTL_memory_mapping but receiving virtual address as the address
> > > parameter and translating to machine address in the hypervisor.
> >
> > That might work.
>
> That won't work.
>
> This is a userspace VMA - which means that once the ioctl is done we swap
> to kernel virtual addresses. Now we may know that the prior cr3 has the
> userspace virtual address and walk it down - but what if the domain
> that is doing this is PVH? (or HVM) - the cr3 of userspace is tucked somewhere
> inside the kernel.
>
> Which means this hypercall would need to know the Linux kernel task structure
> to find this.
>
> May I propose another solution - a stacking driver (similar to loop). You
> set it up (ioctl /dev/pmem0/guest.img, get some /dev/mapper/guest.img created).
> Then mmap the /dev/mapper/guest.img - all of the operations are the same - except
> it may have an extra ioctl - get_pfns - which would provide the data in similar
> form to pagemap.txt.
>

This stacking driver approach still seems to need privileged permission and
more modifications in the kernel, so ...

> But folks will then ask - why don't you just use pagemap? Could the pagemap
> have an extra security capability check? One that can be set for
> QEMU?
>

... I would like to use pagemap and mlock.

Haozhong

> >
> > >
> > > > > Open: For a large pmem, mmap(2) is quite likely not to map all SPA
> > > > > occupied by pmem at the beginning, i.e. QEMU may not be able to
> > > > > get all SPA of pmem from buf (in virtual address space) when
> > > > > calling XEN_DOMCTL_memory_mapping.
> > > > > Can the mmap flag MAP_LOCKED or mlock(2) be used to enforce the
> > > > > entire pmem being mmapped?
> > > >
> > > > Ditto
> > > >
> > >
> > > No.
> > > If I take the above alternative for the first open, maybe the new
> > > hypercall above can inject page faults into dom0 for the unmapped
> > > virtual address so as to force dom0 Linux to create the page
> > > mapping.
>
> Ugh. That sounds hacky. And you wouldn't necessarily be safe.
> Imagine that the system admin decides to defrag the /dev/pmem filesystem.
> Or move the files (disk images) around. If they do that - we may
> still have the guest mapped to system addresses which may contain filesystem
> metadata now, or a different guest image. We MUST mlock or lock the file
> for the duration of the guest.