kvmarm.lists.cs.columbia.edu archive mirror
 help / color / mirror / Atom feed
From: Alexandru Elisei <alexandru.elisei@arm.com>
To: James Morse <james.morse@arm.com>, Marc Zyngier <maz@kernel.org>
Cc: kvm@vger.kernel.org, Andre Przywara <andre.przywara@arm.com>,
	kvmarm@lists.cs.columbia.edu,
	George Cherian <gcherian@marvell.com>,
	"Zengtao \(B\)" <prime.zeng@hisilicon.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Dave Martin <Dave.Martin@arm.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
Date: Tue, 12 May 2020 16:47:55 +0100	[thread overview]
Message-ID: <5134e123-18ec-9b69-2e0a-b83798e01507@arm.com> (raw)
In-Reply-To: <76d811eb-b304-c49f-1f21-fe9d95112a28@arm.com>

Hi,

On 5/12/20 12:17 PM, James Morse wrote:
> Hi Alex, Marc,
>
> (just on this last_vcpu_ran thing...)
>
> On 11/05/2020 17:38, Alexandru Elisei wrote:
>> On 4/22/20 1:00 PM, Marc Zyngier wrote:
>>> From: Christoffer Dall <christoffer.dall@arm.com>
>>>
>>> As we are about to reuse our stage 2 page table manipulation code for
>>> shadow stage 2 page tables in the context of nested virtualization, we
>>> are going to manage multiple stage 2 page tables for a single VM.
>>>
>>> This requires some pretty invasive changes to our data structures,
>>> which moves the vmid and pgd pointers into a separate structure and
>>> change pretty much all of our mmu code to operate on this structure
>>> instead.
>>>
>>> The new structure is called struct kvm_s2_mmu.
>>>
>>> There is no intended functional change by this patch alone.
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 7dd8fefa6aecd..664a5d92ae9b8 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -63,19 +63,32 @@ struct kvm_vmid {
>>>  	u32    vmid;
>>>  };
>>>  
>>> -struct kvm_arch {
>>> +struct kvm_s2_mmu {
>>>  	struct kvm_vmid vmid;
>>>  
>>> -	/* stage2 entry level table */
>>> -	pgd_t *pgd;
>>> -	phys_addr_t pgd_phys;
>>> -
>>> -	/* VTCR_EL2 value for this VM */
>>> -	u64    vtcr;
>>> +	/*
>>> +	 * stage2 entry level table
>>> +	 *
>>> +	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
>>> +	 * here.  This happens when running a non-VHE guest hypervisor which
>>> +	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
>>> +	 * with vHCR_EL2.VM == 0.
>> It makes more sense to me to say that a non-VHE guest hypervisor will use the
>> canonical stage *1* page table when running at EL2
> Can KVM say anything about stage1? Its totally under the the guests control even at vEL2...

It is. My interpretation of the comment was that if the guest doesn't have virtual
stage 2 enabled (we're not running a guest of the L1 hypervisor), then the L0 host
can use the same L0 stage 2 tables because we're running the same guest (the L1
VM), regardless of the actual exception level for the guest. If I remember
correctly, KVM assigns different vmids for guests running at vEL1/0 and vEL2 with
vHCR_EL2.VM == 0 because the translation regimes are different, but keeps the same
translation tables.

>
>
>> (the "Non-secure EL2 translation regime" as ARM DDI 0487F.b calls it on page D5-2543).
>> I think that's
>> the only situation where vEL2 and vEL1&0 will use the same L0 stage 2 tables. It's
>> been quite some time since I reviewed the initial version of the NV patches, did I
>> get that wrong?
>
>>> +	 */
>>> +	pgd_t		*pgd;
>>> +	phys_addr_t	pgd_phys;
>>>  
>>>  	/* The last vcpu id that ran on each physical CPU */
>>>  	int __percpu *last_vcpu_ran;
>> It makes sense for the other fields to be part of kvm_s2_mmu, but I'm struggling
>> to figure out why last_vcpu_ran is here. Would you mind sharing the rationale? I
>> don't see this change in v1 or v2 of the NV series.
> Marc may have a better rationale. My thinking was because kvm_vmid is in here too.
>
> last_vcpu_ran exists to prevent KVM accidentally emulating CNP without the opt-in. (we
> call it defacto CNP).
>
> The guest may expect to be able to use asid-4 with different page tables on different

I'm afraid I don't know what asid-4 is.

> vCPUs, assuming the TLB isn't shared. But if KVM is switching between those vCPU on one
> physical CPU, the TLB is shared, ... the VMID and ASID are the same, but the page tables
> are not. Not fun to debug!
>
>
> NV makes this problem per-stage2, because each stage2 has its own VMID, we need to track
> the vcpu_id that last ran this stage2 on this physical CPU. If its not the same, we need
> to blow away this VMIDs TLB entries.
>
> The workaround lives in virt/kvm/arm/arm.c::kvm_arch_vcpu_load()

Makes sense, thank you for explaining that.

Thanks,
Alex
>
>
>> More below.
> (lightly trimmed!)
>
> Thanks,
>
> James
>
>
>>>  
>>> +	struct kvm *kvm;
>>> +};
> [...]
>
>>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>>> index 53b3ba9173ba7..03f01fcfa2bd5 100644
>>> --- a/virt/kvm/arm/arm.c
>>> +++ b/virt/kvm/arm/arm.c
>> There's a comment that still mentions arch.vmid that you missed in this file:
>>
>> static bool need_new_vmid_gen(struct kvm_vmid *vmid)
>> {
>>     u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
>>     smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
>>
> [..]
>
>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>> index e3b9ee268823b..2f99749048285 100644
>>> --- a/virt/kvm/arm/mmu.c
>>> +++ b/virt/kvm/arm/mmu.c
>>> @@ -886,21 +898,23 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>>>  }
>>>  
>>>  /**
>>> - * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
>>> - * @kvm:	The KVM struct pointer for the VM.
>>> + * kvm_init_stage2_mmu - Initialise a S2 MMU strucrure
>>> + * @kvm:	The pointer to the KVM structure
>>> + * @mmu:	The pointer to the s2 MMU structure
>>>   *
>>>   * Allocates only the stage-2 HW PGD level table(s) of size defined by
>>> - * stage2_pgd_size(kvm).
>>> + * stage2_pgd_size(mmu->kvm).
>>>   *
>>>   * Note we don't need locking here as this is only called when the VM is
>>>   * created, which can only be done once.
>>>   */
>>> -int kvm_alloc_stage2_pgd(struct kvm *kvm)
>>> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>>>  {
>>>  	phys_addr_t pgd_phys;
>>>  	pgd_t *pgd;
>>> +	int cpu;
>>>  
>>> -	if (kvm->arch.pgd != NULL) {
>>> +	if (mmu->pgd != NULL) {
>>>  		kvm_err("kvm_arch already initialized?\n");
>>>  		return -EINVAL;
>>>  	}
>>> @@ -914,8 +928,20 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>>>  	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))
>> We don't free the pgd here, but we do free it if alloc_percpu fails. Is that
>> intentional?
>
>>>  		return -EINVAL;
>>>  
>>> -	kvm->arch.pgd = pgd;
>>> -	kvm->arch.pgd_phys = pgd_phys;
>>> +	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
>>> +	if (!mmu->last_vcpu_ran) {
>>> +		free_pages_exact(pgd, stage2_pgd_size(kvm));
>>> +		return -ENOMEM;
>>> +	}
>>> +
>>> +	for_each_possible_cpu(cpu)
>>> +		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
>>> +
>>> +	mmu->kvm = kvm;
>>> +	mmu->pgd = pgd;
>>> +	mmu->pgd_phys = pgd_phys;
>>> +	mmu->vmid.vmid_gen = 0;
>>> +
>>>  	return 0;
>>>  }
>>>  
>>> @@ -986,39 +1012,34 @@ void stage2_unmap_vm(struct kvm *kvm)
>>>  	srcu_read_unlock(&kvm->srcu, idx);
>>>  }
>>>  
>>> -/**
>>> - * kvm_free_stage2_pgd - free all stage-2 tables
>>> - * @kvm:	The KVM struct pointer for the VM.
>>> - *
>>> - * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
>>> - * underlying level-2 and level-3 tables before freeing the actual level-1 table
>>> - * and setting the struct pointer to NULL.
>>> - */
>>> -void kvm_free_stage2_pgd(struct kvm *kvm)
>>> +void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>>>  {
>>> +	struct kvm *kvm = mmu->kvm;
>>>  	void *pgd = NULL;
>>>  
>>>  	spin_lock(&kvm->mmu_lock);
>>> -	if (kvm->arch.pgd) {
>>> -		unmap_stage2_range(kvm, 0, kvm_phys_size(kvm));
>>> -		pgd = READ_ONCE(kvm->arch.pgd);
>>> -		kvm->arch.pgd = NULL;
>>> -		kvm->arch.pgd_phys = 0;
>>> +	if (mmu->pgd) {
>>> +		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
>>> +		pgd = READ_ONCE(mmu->pgd);
>>> +		mmu->pgd = NULL;
>> The kvm->arch.pgd_phys = 0 instruction seems to have been dropped here. Is that
>> intentional?
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

  reply	other threads:[~2020-05-12 15:47 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
2020-04-22 12:00 ` [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability Marc Zyngier
2020-04-22 13:40   ` Suzuki K Poulose
2020-04-22 14:07     ` Marc Zyngier
2020-04-22 14:14       ` Suzuki K Poulose
2020-05-07 11:42   ` Alexandru Elisei
2020-04-22 12:00 ` [PATCH 02/26] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h Marc Zyngier
2020-04-22 13:51   ` Suzuki K Poulose
2020-04-22 13:59     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm Marc Zyngier
2020-05-05 15:26   ` Andrew Scull
2020-05-05 16:32     ` Marc Zyngier
2020-05-05 17:23       ` Andrew Scull
2020-05-05 18:10         ` Marc Zyngier
2020-05-05 16:03   ` James Morse
2020-05-05 17:59     ` Marc Zyngier
2020-05-06  9:30       ` Marc Zyngier
2020-05-11 16:38   ` Alexandru Elisei
2020-05-12 11:17     ` James Morse
2020-05-12 15:47       ` Alexandru Elisei [this message]
2020-05-12 16:13         ` James Morse
2020-05-12 16:53       ` Alexandru Elisei
2020-05-27  8:41         ` Marc Zyngier
2020-05-27  8:45           ` Alexandru Elisei
2020-04-22 12:00 ` [PATCH 04/26] arm64: Detect the ARMv8.4 TTL feature Marc Zyngier
2020-04-27 15:55   ` Suzuki K Poulose
2020-04-22 12:00 ` [PATCH 05/26] arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors Marc Zyngier
2020-05-05 15:59   ` Andrew Scull
2020-05-06  9:39     ` Marc Zyngier
2020-05-06 10:11       ` Andrew Scull
2020-04-22 12:00 ` [PATCH 06/26] arm64: Add level-hinted TLB invalidation helper Marc Zyngier
2020-05-05 17:16   ` Andrew Scull
2020-05-06  8:05     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 07/26] KVM: arm64: Add a level hint to __kvm_tlb_flush_vmid_ipa Marc Zyngier
2020-05-07 15:08   ` Andrew Scull
2020-05-07 15:13     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations Marc Zyngier
2020-05-07 15:13   ` Andrew Scull
2020-05-12 12:04     ` James Morse
2020-05-13  9:06       ` Andrew Scull
2020-05-27  8:59         ` Marc Zyngier
2020-05-12 17:26   ` James Morse
2020-04-22 12:00 ` [PATCH 09/26] KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu Marc Zyngier
2020-05-07 16:26   ` James Morse
2020-05-08 12:20     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 10/26] KVM: arm64: Refactor vcpu_{read,write}_sys_reg Marc Zyngier
2020-05-26 16:28   ` James Morse
2020-05-27 10:04     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 11/26] KVM: arm64: Add missing reset handlers for PMU emulation Marc Zyngier
2020-05-26 16:29   ` James Morse
2020-04-22 12:00 ` [PATCH 12/26] KVM: arm64: Move sysreg reset check to boot time Marc Zyngier
2020-04-22 12:00 ` [PATCH 13/26] KVM: arm64: Introduce accessor for ctxt->sys_reg Marc Zyngier
2020-04-22 12:00 ` [PATCH 14/26] KVM: arm64: hyp: Use ctxt_sys_reg/__vcpu_sys_reg instead of raw sys_regs access Marc Zyngier
2020-04-22 12:00 ` [PATCH 15/26] KVM: arm64: sve: Use __vcpu_sys_reg() " Marc Zyngier
2020-04-22 12:00 ` [PATCH 16/26] KVM: arm64: pauth: Use ctxt_sys_reg() " Marc Zyngier
2020-04-22 12:00 ` [PATCH 17/26] KVM: arm64: debug: " Marc Zyngier
2020-04-22 12:00 ` [PATCH 18/26] KVM: arm64: Don't use empty structures as CPU reset state Marc Zyngier
2020-04-24  4:07   ` Zenghui Yu
2020-04-24  7:45     ` Marc Zyngier
2020-04-28  1:34       ` Zengtao (B)
2020-04-22 12:00 ` [PATCH 19/26] KVM: arm64: Make struct kvm_regs userspace-only Marc Zyngier
2020-05-26 16:29   ` James Morse
2020-05-27 10:22     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 20/26] KVM: arm64: Move ELR_EL1 to the system register array Marc Zyngier
2020-05-26 16:29   ` James Morse
2020-05-27 10:36     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 21/26] KVM: arm64: Move SP_EL1 " Marc Zyngier
2020-05-26 16:29   ` James Morse
2020-04-22 12:00 ` [PATCH 22/26] KVM: arm64: Disintegrate SPSR array Marc Zyngier
2020-05-26 16:30   ` James Morse
2020-04-22 12:00 ` [PATCH 23/26] KVM: arm64: Move SPSR_EL1 to the system register array Marc Zyngier
2020-05-26 16:30   ` James Morse
2020-04-22 12:00 ` [PATCH 24/26] KVM: arm64: timers: Rename kvm_timer_sync_hwstate to kvm_timer_sync_user Marc Zyngier
2020-04-22 12:00 ` [PATCH 25/26] KVM: arm64: timers: Move timer registers to the sys_regs file Marc Zyngier
2020-04-22 12:00 ` [PATCH 26/26] KVM: arm64: Parametrize exception entry with a target EL Marc Zyngier
2020-05-19 10:44   ` Mark Rutland
2020-05-27  9:34     ` Marc Zyngier
2020-05-27 14:41       ` Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5134e123-18ec-9b69-2e0a-b83798e01507@arm.com \
    --to=alexandru.elisei@arm.com \
    --cc=Dave.Martin@arm.com \
    --cc=andre.przywara@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=gcherian@marvell.com \
    --cc=james.morse@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=maz@kernel.org \
    --cc=prime.zeng@hisilicon.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).