From: Andrew Cooper
Subject: Re: [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
Date: Fri, 7 Aug 2015 13:44:41 +0100
Message-ID: <55C4A839.8090307@citrix.com>
References: <1438739842-31658-1-git-send-email-shuai.ruan@linux.intel.com>
 <1438739842-31658-2-git-send-email-shuai.ruan@linux.intel.com>
 <55C24D21.50209@citrix.com>
 <20150807080008.GA2976@shuai.ruan@linux.intel.com>
In-Reply-To: <20150807080008.GA2976@shuai.ruan@linux.intel.com>
To: Shuai Ruan
Cc: kevin.tian@intel.com, wei.liu2@citrix.com, Ian.Campbell@citrix.com,
 stefano.stabellini@eu.citrix.com, jun.nakajima@intel.com,
 ian.jackson@eu.citrix.com, eddie.dong@intel.com,
 xen-devel@lists.xen.org, jbeulich@suse.com, keir@xen.org
List-Id: xen-devel@lists.xenproject.org

On 07/08/15 09:00, Shuai Ruan wrote:
>
>>> +        goto skip;
>>> +    }
>>> +
>>> +    if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
>> What does edi have to do with xsaves?  only edx:eax are special
>> according to the manual.
>>
> regs->edi is the guest_linear_address

Why so?  xsaves takes an unconditional memory parameter, not a pointer
in %rdi.  (regs->edi is only correct for ins/outs because the pointer
is architecturally required to be in %rdi.)

There is nothing currently in emulate_privileged_op() which does ModRM
decoding for memory references, nor SIB decoding.  xsaves/xrstors would
be the first such operations.  I am also not sure that adding arbitrary
memory decode here is sensible.

In an ideal world, we would have what is currently x86_emulate() split
into 3 stages.  Stage 1 does straight instruction decode to some
internal representation.
Stage 2 does an audit to see whether the decoded instruction is
plausible for the reason why an emulation was needed.  We have had a
number of security issues with emulation in the past where guests cause
one instruction to trap for emulation, then rewrite the instruction to
be something else, and exploit a bug in the emulator.  Stage 3 performs
the actions required for emulation.

Currently, x86_emulate() is limited to instructions which might
legitimately fault for emulation, but with the advent of VM
introspection, this is proving to be insufficient.

With my x86 maintainer's hat on, I would like to avoid the current
situation we have with multiple bits of code doing x86 instruction
decode and emulation (which are all different).  I think the 3-step
approach above caters suitably to all use cases, but it is a large
project in itself.  It allows the introspection people to have a full
and complete x86 emulation infrastructure, while also preventing areas
like shadow paging from being opened up to potential vulnerabilities in
unrelated areas of the x86 architecture.

I would even go so far as to say that it is probably OK not to support
xsaves/xrstors in PV guests until something along the above lines is
sorted.  The first feature in XSS is Processor Trace, which a PV guest
couldn't use anyway.  I suspect the same applies to most of the other
XSS features, or they wouldn't need to be privileged in the first
place.
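(To illustrate the missing-decode point above: handling xsaves' memory
operand would mean doing at least ModRM decoding.  The sketch below is
purely illustrative — `decode_modrm_ea`, `struct regfile` and the
restriction to the no-REX/no-SIB/no-RIP-relative forms are my own
simplifications, not anything that exists in the tree.)

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register file, indexed in ModRM encoding order:
 * rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi. */
struct regfile { uint64_t r[8]; };

/* Minimal sketch: decode one 64-bit-mode ModRM byte into an effective
 * address.  Returns 0 on success, -1 for forms this sketch does not
 * handle (register operands, SIB, RIP-relative).  *len is the number
 * of bytes consumed (ModRM + displacement). */
static int decode_modrm_ea(const uint8_t *insn, const struct regfile *regs,
                           uint64_t *ea, unsigned int *len)
{
    uint8_t modrm = insn[0];
    unsigned int mod = modrm >> 6, rm = modrm & 7;

    if ( mod == 3 )            /* register operand: no memory access */
        return -1;
    if ( rm == 4 )             /* SIB byte follows: not handled here */
        return -1;
    if ( mod == 0 && rm == 5 ) /* RIP-relative: not handled here */
        return -1;

    *ea = regs->r[rm];
    *len = 1;

    if ( mod == 1 )            /* disp8, sign-extended */
    {
        *ea += (int8_t)insn[1];
        *len = 2;
    }
    else if ( mod == 2 )       /* disp32, sign-extended */
    {
        int32_t disp;
        memcpy(&disp, &insn[1], sizeof(disp));
        *ea += disp;
        *len = 5;
    }

    return 0;
}
```

Even this stripped-down version shows why "regs->edi" is not a
substitute: the base register comes out of the instruction encoding,
not from a fixed architectural register.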
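(For concreteness, the three-stage split described above might look
something like the sketch below.  All of the names — `decoded_insn`,
`emul_reason`, `insn_audit` — are hypothetical; no such interface
exists today.  Only the stage-2 audit is fleshed out, since that is the
part which closes the rewrite-after-trap attacks mentioned above.)

```c
/* Hypothetical internal representation produced by stage 1
 * (straight instruction decode). */
struct decoded_insn {
    unsigned int opcode;
    int is_priv;        /* privileged instruction? */
    int touches_mem;    /* has a memory operand? */
};

/* Why emulation was entered in the first place. */
enum emul_reason { EMUL_PRIV_OP, EMUL_MEM_EVENT };

/* Stage 2: reject instructions which are implausible for the trap
 * reason.  A guest which arranges for one instruction to trap and
 * then rewrites it to something else gets refused here, instead of
 * reaching unrelated emulator code. */
static int insn_audit(const struct decoded_insn *i, enum emul_reason why)
{
    switch ( why )
    {
    case EMUL_PRIV_OP:
        return i->is_priv ? 0 : -1;
    case EMUL_MEM_EVENT:
        return i->touches_mem ? 0 : -1;
    }
    return -1;
}

/* Stage 3 (not shown) would then perform the actions required for
 * emulation, knowing the instruction has passed the audit. */
```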
>
>>> +
>>> +        if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
>>> +                                  X86_CR4_OSXSAVE))
>>> +        {
>>> +            do_guest_trap(TRAP_invalid_op, regs, 0);
>>> +            goto skip;
>>> +        }
>>> +
>>> +        if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
>>> +        {
>>> +            do_guest_trap(TRAP_nmi, regs, 0);
>>> +            goto skip;
>>> +        }
>>> +
>>> +        if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
>>> +            goto fail;
>>> +
>>> +        if ( (rc = copy_from_user(&guest_xsave_area, (void *) regs->edi,
>>> +                                  sizeof(struct xsave_struct))) != 0 )
>>> +        {
>>> +            propagate_page_fault(regs->edi +
>>> +                                 sizeof(struct xsave_struct) - rc, 0);
>>> +            goto skip;
>> Surely you just need the xstate_bv and xcomp_bv ?
>>
> I will dig into the SDM to see whether I am missing some checks.

What I mean by this is that xstate_bv and xcomp_bv are all that you are
checking, so you just need two uint64_t's, rather than a full
xsave_struct.

>
>>>
>>>          default:
>>> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
>>> index 98310f3..de94ac1 100644
>>> --- a/xen/arch/x86/x86_64/mm.c
>>> +++ b/xen/arch/x86/x86_64/mm.c
>>> @@ -48,6 +48,58 @@ l2_pgentry_t __section(".bss.page_aligned") l2_bootmap[L2_PAGETABLE_ENTRIES];
>>>
>>>  l2_pgentry_t *compat_idle_pg_table_l2;
>>>
>>> +unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr)
>> What is this function?  Why is it useful?  Something like this belongs
>> in its own patch along with a description of why it is being
>> introduced.
>>
> The function is used for getting the mfn related to a guest linear
> address.  Is there another existing function I can use that can do the
> same thing?  Can you give me a suggestion.

do_page_walk() and use virt_to_mfn() on the result?
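(As a sketch of the "two uint64_t's" suggestion above: the validity
check can be expressed purely on the two header words.  The function
name and the `xcr0_accum` parameter are illustrative, not the patch's
interface; the bit-63 "compaction enabled" semantics of xcomp_bv are
per the SDM.)

```c
#include <stdint.h>

/* Bit 63 of xcomp_bv marks a compacted-format save image, which is
 * the only format xrstors accepts. */
#define XSTATE_COMPACTION_ENABLED (1ULL << 63)

/* Sketch: validate only xstate_bv/xcomp_bv rather than copying in a
 * whole struct xsave_struct.  xcr0_accum stands for the set of state
 * components the guest may legitimately use.  Returns 1 if the header
 * words are acceptable, 0 otherwise. */
static int xsave_hdr_valid(uint64_t xstate_bv, uint64_t xcomp_bv,
                           uint64_t xcr0_accum)
{
    if ( xstate_bv & ~xcr0_accum )                 /* forbidden state set */
        return 0;
    if ( !(xcomp_bv & XSTATE_COMPACTION_ENABLED) ) /* not compacted format */
        return 0;
    if ( xcomp_bv & ~(xcr0_accum | XSTATE_COMPACTION_ENABLED) )
        return 0;                                  /* bogus component bits */
    return 1;
}
```

Copying just these 16 bytes also shrinks the window the guest-accessible
memory is read, compared with pulling in the whole save area.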
(I am just guessing, but

>
>>> +{
>>> +    asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
>>> +                   : "=m" (*ptr)
>>> +                   : "a" (lmask), "d" (hmask), "D" (ptr) );
>>> +}
>>> +
>>> +void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
>>> +{
>>> +    asm volatile ( "1: .byte 0x48,0x0f,0xc7,0x1f\n"
>>> +                   ".section .fixup,\"ax\"      \n"
>>> +                   "2: mov %5,%%ecx             \n"
>>> +                   "   xor %1,%1                \n"
>>> +                   "   rep stosb                \n"
>>> +                   "   lea %2,%0                \n"
>>> +                   "   mov %3,%1                \n"
>>> +                   "   jmp 1b                   \n"
>>> +                   ".previous                   \n"
>>> +                   _ASM_EXTABLE(1b, 2b)
>>> +                   : "+&D" (ptr), "+&a" (lmask)
>>> +                   : "m" (*ptr), "g" (lmask), "d" (hmask),
>>> +                     "m" (xsave_cntxt_size)
>>> +                   : "ecx" );
>>> +}
>>> +
>> Neither of these two helpers have anything like sufficient checking to
>> be safely used on guest state.
>>
> Basic checking is done before these two helpers.

But this isn't the only place where they are used.

~Andrew
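(One way to avoid relying on each call site remembering its checks
would be to fold the preconditions into a single helper that every user
of the raw xsaves/xrstors primitives goes through.  The sketch below is
mine, not the patch's: the function name is made up, though the CR bit
positions and the 64-byte alignment requirement are the architectural
ones.)

```c
#include <errno.h>

/* Architectural control-register bits. */
#define X86_CR0_TS      (1UL << 3)   /* Task Switched: xsaves would #NM */
#define X86_CR4_OSXSAVE (1UL << 18)  /* OS has enabled XSAVE */

/* Sketch: preconditions common to every xsaves/xrstors invocation on
 * guest state, checked in one place so no call site can skip them.
 * Returns 0 if the operation may proceed. */
static int xstate_op_allowed(unsigned long cr0, unsigned long cr4,
                             unsigned long area)
{
    if ( !(cr4 & X86_CR4_OSXSAVE) )  /* XSAVE not enabled: #UD */
        return -EINVAL;
    if ( cr0 & X86_CR0_TS )          /* would raise #NM */
        return -EBUSY;
    if ( area & 0x3f )               /* save area must be 64-byte aligned */
        return -EINVAL;
    return 0;
}
```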