On 7/18/19 1:36 PM, Alexandru Elisei wrote:
> On 7/18/19 1:13 PM, Tomasz Nowicki wrote:
>
>> Hello Alex,
>>
>> On 09.07.2019 15:20, Alexandru Elisei wrote:
>>> On 6/21/19 10:38 AM, Marc Zyngier wrote:
>>>> From: Jintack Lim
>>>>
>>>> When supporting nested virtualization a guest hypervisor executing AT
>>>> instructions must be trapped and emulated by the host hypervisor,
>>>> because untrapped AT instructions operating on S1E1 will use the wrong
>>>> translation regieme (the one used to emulate virtual EL2 in EL1 instead
>>> I think that should be "regime".
>>>
>>>> of virtual EL1) and AT instructions operating on S12 will not work from
>>>> EL1.
>>>>
>>>> This patch does several things.
>>>>
>>>> 1. List and define all AT system instructions to emulate and document
>>>> the emulation design.
>>>>
>>>> 2. Implement AT instruction handling logic in EL2. This will be used to
>>>> emulate AT instructions executed in the virtual EL2.
>>>>
>>>> AT instruction emulation works by loading the proper processor
>>>> context, which depends on the trapped instruction and the virtual
>>>> HCR_EL2, to the EL1 virtual memory control registers and executing AT
>>>> instructions. Note that ctxt->hw_sys_regs is expected to have the
>>>> proper processor context before calling the handling
>>>> function(__kvm_at_insn) implemented in this patch.
>>>>
>>>> 4. Emulate AT S1E[01] instructions by issuing the same instructions in
>>>> EL2. We set the physical EL1 registers, NV and NV1 bits as described in
>>>> the AT instruction emulation overview.
>>> Is item number 3 missing, or is that the result of an unfortunate typo?
>>>
>>>> 5. Emulate AT A12E[01] instructions in two steps: First, do the stage-1
>>>> translation by reusing the existing AT emulation functions. Second, do
>>>> the stage-2 translation by walking the guest hypervisor's stage-2 page
>>>> table in software. Record the translation result to PAR_EL1.
>>>>
>>>> 6. Emulate AT S1E2 instructions by issuing the corresponding S1E1
>>>> instructions in EL2. We set the physical EL1 registers and the HCR_EL2
>>>> register as described in the AT instruction emulation overview.
>>>>
>>>> 7. Forward system instruction traps to the virtual EL2 if the corresponding
>>>> virtual AT bit is set in the virtual HCR_EL2.
>>>>
>>>> [ Much logic above has been reworked by Marc Zyngier ]
>>>>
>>>> Signed-off-by: Jintack Lim
>>>> Signed-off-by: Marc Zyngier
>>>> Signed-off-by: Christoffer Dall
>>>> ---
>>>>  arch/arm64/include/asm/kvm_arm.h |   2 +
>>>>  arch/arm64/include/asm/kvm_asm.h |   2 +
>>>>  arch/arm64/include/asm/sysreg.h  |  17 +++
>>>>  arch/arm64/kvm/hyp/Makefile      |   1 +
>>>>  arch/arm64/kvm/hyp/at.c          | 217 +++++++++++++++++++++++++++++++
>>>>  arch/arm64/kvm/hyp/switch.c      |  13 +-
>>>>  arch/arm64/kvm/sys_regs.c        | 202 +++++++++++++++++++++++++++-
>>>>  7 files changed, 450 insertions(+), 4 deletions(-)
>>>>  create mode 100644 arch/arm64/kvm/hyp/at.c
>>>>
>> [...]
>>
>>>> +
>>>> +void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
>>>> +{
>>>> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>>>> +	struct mmu_config config;
>>>> +	struct kvm_s2_mmu *mmu;
>>>> +
>>>> +	/*
>>>> +	 * We can only get here when trapping from vEL2, so we're
>>>> +	 * translating a guest guest VA.
>>>> +	 *
>>>> +	 * FIXME: Obtaining the S2 MMU for a a guest guest is horribly
>>>> +	 * racy, and we may not find it.
>>>> +	 */
>>>> +	spin_lock(&vcpu->kvm->mmu_lock);
>>>> +
>>>> +	mmu = lookup_s2_mmu(vcpu->kvm,
>>>> +			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
>>>> +			    vcpu_read_sys_reg(vcpu, HCR_EL2));
>>> From ARM DDI 0487D.b, the description for AT S1E1R (page C5-467, it's the same
>>> for the other at s1e{0,1}* instructions):
>>>
>>> [..] Performs stage 1 address translation, with permissions as if reading from
>>> the given virtual address from EL1, or from EL2 [..], using the following
>>> translation regime:
>>> - If HCR_EL2.{E2H,TGE} is {1, 1}, the EL2&0 translation regime, accessed from EL2.
>>>
>>> If the guest is VHE, I don't think there's any need to switch mmus. The AT
>>> instruction will use the physical EL1&0 translation regime already on the
>>> hardware (assuming host HCR_EL2.TGE == 0), which is the vEL2&0 regime for the
>>> guest hypervisor.
>> Here we want to run AT for L2 (guest guest) EL1&0 regime and not the L1
>> (guest hypervisor) so we have to lookup and switch to nested VM MMU
>> context. Or did I miss your point?
>>
>> Thanks,
>> Tomasz
> What I mean to say is that if the L1 guest has set HCR_EL2.{E2H, TGE} = {1, 1}, then the instruction affects the vEL2&0 translation regime (as per the instruction description in the architecture), which is already loaded. The AT instruction will affect the L1 guest hypervisor, not the L2 guest.
>
> In other words:
>
> if (!vcpu_el2_e2h_is_set(vcpu) || !vcpu_el2_tge_is_set(vcpu))
> 	/* switch mmus, the instruction affects the L2 guest (the guest guest) */
> else
> 	/* do not switch mmus, the instruction affects the L1 guest hypervisor which is loaded */
>
> I hope this makes things clearer.
>

I realized where the confusion comes from (nested virtualization is hard). Let me rephrase it again; maybe this time I will get it right.

What I mean to say is that if the L1 guest has set HCR_EL2.{E2H, TGE} = {1, 1}, then the instruction uses the vEL2&0 translation regime (as per the instruction description in the architecture), meaning it uses the translation regime for the L1 guest hypervisor, and the stage 2 for that regime is already loaded.
In other words:

if (!vcpu_el2_e2h_is_set(vcpu) || !vcpu_el2_tge_is_set(vcpu))
	/* the instruction affects the L2 guest (the guest guest), find the stage 2 mmu associated with that guest */
else
	/* the instruction affects the L1 guest hypervisor, the stage 2 mmu for it is already loaded */

Thanks,
Alex