From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH 13/30] KVM: PPC: booke: category E.HV (GS-mode) support
From: Alexander Graf
In-Reply-To: <4F3EC2A1.7080702@freescale.com>
Date: Mon, 20 Feb 2012 12:40:27 +0100
Message-Id: <37B8BD82-D665-4A1F-B851-13C846BFD799@suse.de>
References: <1329498837-11717-1-git-send-email-agraf@suse.de>
 <1329498837-11717-14-git-send-email-agraf@suse.de>
 <4F3EC2A1.7080702@freescale.com>
To: Scott Wood
Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org, kvm-ppc@vger.kernel.org
List-Id: Linux on PowerPC Developers Mail List

On 17.02.2012, at 22:12, Scott Wood wrote:

> On 02/17/2012 11:13 AM, Alexander Graf wrote:
>> From: Scott Wood
>>
>> Chips such as e500mc that implement category E.HV in Power ISA 2.06
>> provide hardware virtualization features, including a new MSR mode for
>> guest state. The guest OS can perform many operations without trapping
>> into the hypervisor, including transitions to and from guest userspace.
>>
>> Since we can use SRR1[GS] to reliably tell whether an exception came from
>> guest state, instead of messing around with IVPR, we use DO_KVM similarly
>> to book3s.
>>
>> Current issues include:
>>  - Machine checks from guest state are not routed to the host handler.
>>  - The guest can cause a host oops by executing an emulated instruction
>>    in a page that lacks read permission. Existing e500/4xx support has
>>    the same problem.
>>
>> Includes work by Ashish Kalra,
>> Varun Sethi, and
>> Liu Yu.
>>
>> Signed-off-by: Scott Wood
>> [agraf: remove pt_regs usage]
>> Signed-off-by: Alexander Graf
>> ---
>
> Thanks for picking this up!
>
>> +static unsigned long get_guest_esr(struct kvm_vcpu *vcpu)
>> +{
>> +#ifdef CONFIG_KVM_BOOKE_HV
>> +	return mfspr(SPRN_ESR);
>> +#else
>> +	return vcpu->arch.shared->esr;
>> +#endif
>> +}
>
> s/SPRN_ESR/SPRN_GESR/

Ouch :)

>
>> int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>                        unsigned int exit_nr)
>> {
>> -	enum emulation_result er;
>> 	int r = RESUME_HOST;
>>
>> 	/* update before a new last_exit_type is rewritten */
>> 	kvmppc_update_timing_stats(vcpu);
>>
>> +	switch (exit_nr) {
>> +	case BOOKE_INTERRUPT_EXTERNAL:
>> +		do_IRQ(current->thread.regs);
>> +		break;
>
> What will current->thread.regs point to here?  Something on the stack
> from the last normal host exception entry?

Yup. Regs only contains a few register values of volatile state that we
shouldn't care about at this point anyways, right?

> We probably want to create a pt_regs on the stack and at least provide
> PC, LR, and r1 for perfmon interrupts and such.

We don't want guest information to leak into the host kernel, do we?
From the host's POV, any exit inside the guest happens right at the
guest exit path of KVM.

The way we handle this on book3s is that we actually jump right into the
real handler from asm code, right when we recovered all the MMU state.
We could do the same here - basically move this whole thing off to asm
code and jump right to the actual interrupt handler, not the above
do_IRQ implementation.

Then register state in regs would also be guaranteed to be sane, since
it's created by the interrupt handler itself.
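[Editorial illustration of the point under discussion: building a pt_regs-like
frame from the exit path's own state, so a perfmon sample taken inside the host
IRQ handler sees a sane PC/LR/SP without leaking guest register values. This is
a hedged sketch, not kernel code; `mock_pt_regs` and `make_exit_regs` are
hypothetical stand-ins, not the real `struct pt_regs` or any kernel API.]

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct pt_regs; only the
 * fields mentioned in the discussion (PC, LR, r1) are modeled. */
struct mock_pt_regs {
    uintptr_t nip;   /* program counter at exit */
    uintptr_t link;  /* link register at exit */
    uintptr_t gpr1;  /* stack pointer (r1) */
};

/* Hypothetical helper: populate a frame from the KVM exit path's own
 * state before handing it to the host IRQ code. The values point at
 * the exit path, not into the guest, so no guest state is exposed. */
static struct mock_pt_regs make_exit_regs(uintptr_t exit_pc,
                                          uintptr_t exit_lr,
                                          uintptr_t host_sp)
{
    struct mock_pt_regs regs;
    regs.nip = exit_pc;
    regs.link = exit_lr;
    regs.gpr1 = host_sp;
    return regs;
}
```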
>
>> @@ -384,30 +558,56 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>
>> 	switch (exit_nr) {
>> 	case BOOKE_INTERRUPT_MACHINE_CHECK:
>> -		printk("MACHINE CHECK: %lx\n", mfspr(SPRN_MCSR));
>> -		kvmppc_dump_vcpu(vcpu);
>> -		r = RESUME_HOST;
>> +		kvm_resched(vcpu);
>> +		r = RESUME_GUEST;
>> 		break;
>
> Leave this bit out (proper machine check handling will come later).

Ok, I'll readd it with a later patch on top that also sets
run->hw.hardware_exit_reason to something debugg'y, so user space can
abort.

>
>> 	case BOOKE_INTERRUPT_PROGRAM:
>> -		if (vcpu->arch.shared->msr & MSR_PR) {
>> +		if (vcpu->arch.shared->msr & (MSR_PR | MSR_GS)) {
>> 			/* Program traps generated by user-level software must be handled
>> 			 * by the guest kernel. */
>> 			kvmppc_core_queue_program(vcpu, vcpu->arch.fault_esr);
>
> Should update the comment for why we're checking GS (i.e. we get a
> different trap for emulation with GS-mode).

k

>
>> +#define SET_VCPU(vcpu)	\
>> +        PPC_STL	vcpu, (THREAD + THREAD_KVM_VCPU)(r2)
>
> Change spaces to tab before PPC_STL

The whole thing gets removed a few patches later.

>
>> +#define LONGBYTES		(BITS_PER_LONG / 8)
>> +
>> +#define VCPU_GPR(n)		(VCPU_GPRS + (n * LONGBYTES))
>> +#define VCPU_GUEST_SPRG(n)	(VCPU_GUEST_SPRGS + (n * LONGBYTES))
>> +
>> +/* The host stack layout: */
>> +#define HOST_R1		(0 * LONGBYTES) /* Implied by stwu. */
>> +#define HOST_CALLEE_LR	(1 * LONGBYTES)
>> +#define HOST_RUN	(2 * LONGBYTES) /* struct kvm_run */
>> +/*
>> + * r2 is special: it holds 'current', and it made nonvolatile in the
>> + * kernel with the -ffixed-r2 gcc option.
>> + */
>> +#define HOST_R2		(3 * LONGBYTES)
>> +#define HOST_NV_GPRS	(4 * LONGBYTES)
>> +#define HOST_NV_GPR(n)	(HOST_NV_GPRS + ((n - 14) * LONGBYTES))
>> +#define HOST_MIN_STACK_SIZE	(HOST_NV_GPR(31) + LONGBYTES)
>> +#define HOST_STACK_SIZE	((HOST_MIN_STACK_SIZE + 15) & ~15) /* Align. */
>> +#define HOST_STACK_LR	(HOST_STACK_SIZE + LONGBYTES) /* In caller stack frame. */
>> +
>> +#define NEED_EMU	0x00000001 /* emulation -- save nv regs */
>> +#define NEED_DEAR	0x00000002 /* save faulting DEAR */
>> +#define NEED_ESR	0x00000004 /* save faulting ESR */
>> +
>> +/*
>> + * On entry:
>> + * r4 = vcpu, r5 = srr0, r6 = srr1
>> + * saved in vcpu: cr, ctr, r3-r13
>> + */
>> +.macro kvm_handler_common intno, srr0, flags
>> +	mfspr	r10, SPRN_PID
>> +	lwz	r8, VCPU_HOST_PID(r4)
>> +	PPC_LL	r11, VCPU_SHARED(r4)
>> +	PPC_STL	r14, VCPU_GPR(r14)(r4) /* We need a non-volatile GPR. */
>> +	li	r14, \intno
>> +
>> +	stw	r10, VCPU_GUEST_PID(r4)
>> +	mtspr	SPRN_PID, r8
>> +
>> +	.if	\flags & NEED_EMU
>> +	lwz	r9, VCPU_KVM(r4)
>> +	.endif
>> +
>> +#ifdef CONFIG_KVM_EXIT_TIMING
>> +	/* save exit time */
>> +1:	mfspr	r7, SPRN_TBRU
>> +	mfspr	r8, SPRN_TBRL
>> +	mfspr	r9, SPRN_TBRU
>> +	cmpw	r9, r7
>> +	PPC_STL	r8, VCPU_TIMING_EXIT_TBL(r4)
>> +	bne-	1b
>> +	PPC_STL	r9, VCPU_TIMING_EXIT_TBU(r4)
>> +#endif
>
> As you pointed out to me last time, r9 is clobbered if exit timing is
> enabled (but see below, the load of VCPU_KVM can be removed along with
> the subsequent load of KVM_LPID(r9)).

Yes, fixed a few patches later. I went with the "put patches on top of
your patches" style instead of rebasing and modifying your patches too
much, so that it's easier for you guys to integrate this downstream.

>
>> +	oris	r8, r6, MSR_CE@h
>> +#ifndef CONFIG_64BIT
>> +	stw	r6, (VCPU_SHARED_MSR + 4)(r11)
>> +#else
>> +	std	r6, (VCPU_SHARED_MSR)(r11)
>> +#endif
>> +	ori	r8, r8, MSR_ME | MSR_RI
>> +	PPC_STL	r5, VCPU_PC(r4)
>> +
>> +	/*
>> +	 * Make sure CE/ME/RI are set (if appropriate for exception type)
>> +	 * whether or not the guest had it set.  Since mfmsr/mtmsr are
>> +	 * somewhat expensive, skip in the common case where the guest
>> +	 * had all these bits set (and thus they're still set if
>> +	 * appropriate for the exception type).
>> +	 */
>> +	cmpw	r6, r8
>> +	.if	\flags & NEED_EMU
>> +	lwz	r9, KVM_LPID(r9)
>> +	.endif
>
> Where do we use r9?  This is probably left over from something old.

Looks like it, yup.


Alex
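[Editorial note on the exit-timing snippet quoted above: the
TBU/TBL/TBU sequence is the standard way to get a consistent 64-bit
timebase value on 32-bit PowerPC, where the upper and lower halves
must be read with two separate mfspr instructions: re-read the upper
half and retry if it changed, which catches a TBL wrap between the
reads. A minimal C sketch with a mocked timebase; the mock values and
helper names are illustrative only.]

```c
#include <stdint.h>

/* Mock 64-bit timebase. On real hardware these would be mfspr reads
 * of SPRN_TBRU/SPRN_TBRL; here the lower-half read deliberately
 * advances the counter past a 32-bit wrap once to force one retry. */
static uint64_t fake_tb = 0xFFFFFFF0ull;
static int tb_reads;

static uint32_t read_tbu(void)
{
    return (uint32_t)(fake_tb >> 32);
}

static uint32_t read_tbl(void)
{
    uint32_t lo = (uint32_t)fake_tb;
    /* First read simulates a carry into the upper half. */
    fake_tb += (tb_reads++ == 0) ? 0x100000000ull : 1;
    return lo;
}

/* The TBU/TBL/TBU pattern: keep re-reading until the upper half is
 * stable across the lower-half read, so hi:lo form one consistent
 * 64-bit sample even if TBL wrapped mid-sequence. */
static uint64_t read_timebase64(void)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_tbu();
        lo  = read_tbl();
        hi2 = read_tbu();
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}
```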