linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC] KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported
@ 2020-04-15  7:28 Zenghui Yu
  2020-04-20 16:10 ` Alexandru Elisei
  0 siblings, 1 reply; 7+ messages in thread
From: Zenghui Yu @ 2020-04-15  7:28 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, linux-kernel
  Cc: wanghaibin.wang, Zenghui Yu, Marc Zyngier, Christoffer Dall,
	James Morse, Julien Thierry, Suzuki K Poulose

stage2_unmap_vm() was introduced to unmap the user RAM regions in the stage2
page table to make the caches coherent. E.g., a guest rebooting with the stage1
MMU disabled will access memory using non-cacheable attributes. If the
RAM and caches are not coherent at this stage, an evicted dirty cache
line may corrupt guest data in RAM.

Since ARMv8.4, the S2FWB feature is mandatory, and KVM takes advantage
of it to configure the stage2 page table and the attributes of memory
accesses. This ensures that guests always access memory using cacheable
attributes, and thus the caches are always coherent.

So on CPUs that support S2FWB, we can safely reset the vcpu without a
heavy stage2 unmapping.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
---

If this is correct, there should be a great performance improvement on
a guest reboot (or reset) on systems that support S2FWB. But I'm afraid that
I've missed some points here, so please comment!

The commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm")
was merged about six years ago and I failed to track down its history and
intention. Instead of a whole stage2 unmapping, something like
stage2_flush_vm() looks sufficient to me. But again, I'm unsure...

Thanks for having a look!

 virt/kvm/arm/arm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..e6378162cdef 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -983,8 +983,11 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 	/*
 	 * Ensure a rebooted VM will fault in RAM pages and detect if the
 	 * guest MMU is turned off and flush the caches as needed.
+	 *
+	 * S2FWB enforces all memory accesses to RAM being cacheable, we
+	 * ensure that the cache is always coherent.
 	 */
-	if (vcpu->arch.has_run_once)
+	if (vcpu->arch.has_run_once && !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
 		stage2_unmap_vm(vcpu->kvm);
 
 	vcpu_reset_hcr(vcpu);
-- 
2.19.1




* Re: [PATCH RFC] KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported
  2020-04-15  7:28 [PATCH RFC] KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported Zenghui Yu
@ 2020-04-20 16:10 ` Alexandru Elisei
  2020-05-30 10:46   ` Alexandru Elisei
  0 siblings, 1 reply; 7+ messages in thread
From: Alexandru Elisei @ 2020-04-20 16:10 UTC (permalink / raw)
  To: Zenghui Yu, kvmarm, linux-arm-kernel, linux-kernel; +Cc: Marc Zyngier

Hi,

On 4/15/20 8:28 AM, Zenghui Yu wrote:
> stage2_unmap_vm() was introduced to unmap user RAM region in the stage2
> page table to make the caches coherent. E.g., a guest reboot with stage1
> MMU disabled will access memory using non-cacheable attributes. If the
> RAM and caches are not coherent at this stage, some evicted dirty cache
> line may go and corrupt guest data in RAM.
>
> Since ARMv8.4, S2FWB feature is mandatory and KVM will take advantage
> of it to configure the stage2 page table and the attributes of memory
> access. So we ensure that guests always access memory using cacheable
> attributes and thus, the caches always be coherent.
>
> So on CPUs that support S2FWB, we can safely reset the vcpu without a
> heavy stage2 unmapping.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Christoffer Dall <christoffer.dall@arm.com>
> Cc: James Morse <james.morse@arm.com>
> Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
> ---
>
> If this is correct, there should be a great performance improvement on
> a guest reboot (or reset) on systems support S2FWB. But I'm afraid that
> I've missed some points here, so please comment!
>
> The commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm")
> was merged about six years ago and I failed to track its histroy and
> intention. Instead of a whole stage2 unmapping, something like
> stage2_flush_vm() looks enough to me. But again, I'm unsure...
>
> Thanks for having a look!

I had a chat with Christoffer about stage2_unmap_vm, and as I understood it, the
purpose was to make sure that any changes made by userspace were seen by the guest
while the MMU is off. When a stage 2 fault happens, we do clean+inval on the
dcache, or inval on the icache if it was an exec fault. This means that whatever
the host userspace writes while the guest is shut down and is still in the cache,
the guest will be able to read/execute.

This can be relevant if the guest relocates the kernel and overwrites the original
image location, and userspace copies the original kernel image back in before
restarting the vm.

>
>  virt/kvm/arm/arm.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 48d0ec44ad77..e6378162cdef 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -983,8 +983,11 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>  	/*
>  	 * Ensure a rebooted VM will fault in RAM pages and detect if the
>  	 * guest MMU is turned off and flush the caches as needed.
> +	 *
> +	 * S2FWB enforces all memory accesses to RAM being cacheable, we
> +	 * ensure that the cache is always coherent.
>  	 */
> -	if (vcpu->arch.has_run_once)
> +	if (vcpu->arch.has_run_once && !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))

I think userspace does not invalidate the icache when loading a new kernel image,
and if the guest patched instructions, they could potentially still be in the
icache. Should the icache be invalidated if FWB is present?

Thanks,
Alex
>  		stage2_unmap_vm(vcpu->kvm);
>  
>  	vcpu_reset_hcr(vcpu);


* Re: [PATCH RFC] KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported
  2020-04-20 16:10 ` Alexandru Elisei
@ 2020-05-30 10:46   ` Alexandru Elisei
  2020-05-30 16:31     ` Marc Zyngier
  2020-06-01  3:24     ` Zenghui Yu
  0 siblings, 2 replies; 7+ messages in thread
From: Alexandru Elisei @ 2020-05-30 10:46 UTC (permalink / raw)
  To: Zenghui Yu, kvmarm, linux-arm-kernel, linux-kernel; +Cc: Marc Zyngier

Hi,

On 4/20/20 5:10 PM, Alexandru Elisei wrote:
> Hi,
>
> On 4/15/20 8:28 AM, Zenghui Yu wrote:
>> stage2_unmap_vm() was introduced to unmap user RAM region in the stage2
>> page table to make the caches coherent. E.g., a guest reboot with stage1
>> MMU disabled will access memory using non-cacheable attributes. If the
>> RAM and caches are not coherent at this stage, some evicted dirty cache
>> line may go and corrupt guest data in RAM.
>>
>> Since ARMv8.4, S2FWB feature is mandatory and KVM will take advantage
>> of it to configure the stage2 page table and the attributes of memory
>> access. So we ensure that guests always access memory using cacheable
>> attributes and thus, the caches always be coherent.
>>
>> So on CPUs that support S2FWB, we can safely reset the vcpu without a
>> heavy stage2 unmapping.
>>
>> Cc: Marc Zyngier <maz@kernel.org>
>> Cc: Christoffer Dall <christoffer.dall@arm.com>
>> Cc: James Morse <james.morse@arm.com>
>> Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
>> ---
>>
>> If this is correct, there should be a great performance improvement on
>> a guest reboot (or reset) on systems support S2FWB. But I'm afraid that
>> I've missed some points here, so please comment!
>>
>> The commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm")
>> was merged about six years ago and I failed to track its histroy and
>> intention. Instead of a whole stage2 unmapping, something like
>> stage2_flush_vm() looks enough to me. But again, I'm unsure...
>>
>> Thanks for having a look!
> I had a chat with Christoffer about stage2_unmap_vm, and as I understood it, the
> purpose was to make sure that any changes made by userspace were seen by the guest
> while the MMU is off. When a stage 2 fault happens, we do clean+inval on the
> dcache, or inval on the icache if it was an exec fault. This means that whatever
> the host userspace writes while the guest is shut down and is still in the cache,
> the guest will be able to read/execute.
>
> This can be relevant if the guest relocates the kernel and overwrites the original
> image location, and userspace copies the original kernel image back in before
> restarting the vm.
>
>>  virt/kvm/arm/arm.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 48d0ec44ad77..e6378162cdef 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -983,8 +983,11 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>>  	/*
>>  	 * Ensure a rebooted VM will fault in RAM pages and detect if the
>>  	 * guest MMU is turned off and flush the caches as needed.
>> +	 *
>> +	 * S2FWB enforces all memory accesses to RAM being cacheable, we
>> +	 * ensure that the cache is always coherent.
>>  	 */
>> -	if (vcpu->arch.has_run_once)
>> +	if (vcpu->arch.has_run_once && !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> I think userspace does not invalidate the icache when loading a new kernel image,
> and if the guest patched instructions, they could potentially still be in the
> icache. Should the icache be invalidated if FWB is present?

I noticed that this was included in the current pull request and I remembered that
I wasn't sure about this part. Did some more digging and it turns out that FWB
implies no cache maintenance needed for *data to instruction* coherence. From ARM
DDI 0487F.b, page D5-2635:

"When ARMv8.4-S2FWB is implemented, the architecture requires that
CLIDR_EL1.{LoUU, LoUIS} are zero so that no levels of data cache need to be
cleaned in order to manage coherency with instruction fetches".

However, there's no mention that I found for instruction to data coherence,
meaning that the icache would still need to be invalidated on each vcpu in order
to prevent fetching of patched instructions from the icache. Am I missing something?
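
The two directions of coherence discussed above map onto different ID
register fields. The sketch below decodes them from raw register values.
It is a user-space model under stated assumptions, not kernel code: the
helper names are hypothetical, and the bit positions (CLIDR_EL1.LoUU at
bits [29:27], CLIDR_EL1.LoUIS at bits [23:21], CTR_EL0.DIC at bit 29) are
taken from the Arm ARM register descriptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions from the Arm ARM register descriptions. */
#define CLIDR_EL1_LOUU_SHIFT   27
#define CLIDR_EL1_LOUIS_SHIFT  21
#define CLIDR_EL1_LEVEL_MASK   UINT64_C(0x7)
#define CTR_EL0_DIC_SHIFT      29

/* Hypothetical helper: extract a field from a raw register value. */
static uint64_t reg_field(uint64_t reg, unsigned int shift, uint64_t mask)
{
	assert(shift < 64);
	return (reg >> shift) & mask;
}

/*
 * Data to instruction coherence: a dcache clean is needed before an
 * instruction fetch unless both LoUU and LoUIS are zero.
 */
static bool dcache_clean_needed_for_ifetch(uint64_t clidr_el1)
{
	return reg_field(clidr_el1, CLIDR_EL1_LOUU_SHIFT, CLIDR_EL1_LEVEL_MASK) != 0 ||
	       reg_field(clidr_el1, CLIDR_EL1_LOUIS_SHIFT, CLIDR_EL1_LEVEL_MASK) != 0;
}

/*
 * Instruction side: icache invalidation is needed unless CTR_EL0.DIC
 * is set. Nothing in CLIDR_EL1 changes this answer.
 */
static bool icache_inval_needed_for_ifetch(uint64_t ctr_el0)
{
	return reg_field(ctr_el0, CTR_EL0_DIC_SHIFT, UINT64_C(0x1)) == 0;
}
```

With CLIDR_EL1.{LoUU, LoUIS} both zero and CTR_EL0.DIC clear, the first
helper reports that no dcache clean is needed while the second still
reports that icache invalidation is required, which is the gap being
asked about here.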

Thanks,
Alex
>
> Thanks,
> Alex
>>  		stage2_unmap_vm(vcpu->kvm);
>>  
>>  	vcpu_reset_hcr(vcpu);


* Re: [PATCH RFC] KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported
  2020-05-30 10:46   ` Alexandru Elisei
@ 2020-05-30 16:31     ` Marc Zyngier
  2020-05-31  9:12       ` Alexandru Elisei
  2020-06-01  6:26       ` zhukeqian
  2020-06-01  3:24     ` Zenghui Yu
  1 sibling, 2 replies; 7+ messages in thread
From: Marc Zyngier @ 2020-05-30 16:31 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: Zenghui Yu, kvmarm, linux-arm-kernel, linux-kernel

Hi Alex,

On 2020-05-30 11:46, Alexandru Elisei wrote:
> Hi,

[...]

>>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>>> index 48d0ec44ad77..e6378162cdef 100644
>>> --- a/virt/kvm/arm/arm.c
>>> +++ b/virt/kvm/arm/arm.c
>>> @@ -983,8 +983,11 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>>>  	/*
>>>  	 * Ensure a rebooted VM will fault in RAM pages and detect if the
>>>  	 * guest MMU is turned off and flush the caches as needed.
>>> +	 *
>>> +	 * S2FWB enforces all memory accesses to RAM being cacheable, we
>>> +	 * ensure that the cache is always coherent.
>>>  	 */
>>> -	if (vcpu->arch.has_run_once)
>>> +	if (vcpu->arch.has_run_once && !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
>> I think userspace does not invalidate the icache when loading a new kernel image,
>> and if the guest patched instructions, they could potentially still be in the
>> icache. Should the icache be invalidated if FWB is present?
> 
> I noticed that this was included in the current pull request and I remembered that
> I wasn't sure about this part. Did some more digging and it turns out that FWB
> implies no cache maintenance needed for *data to instruction* coherence. From ARM
> DDI 0487F.b, page D5-2635:
> 
> "When ARMv8.4-S2FWB is implemented, the architecture requires that
> CLIDR_EL1.{LOUU, LOIUS} are zero so that no levels of data cache need to be
> cleaned in order to manage coherency with instruction fetches".
> 
> However, there's no mention that I found for instruction to data coherence,
> meaning that the icache would still need to be invalidated on each vcpu in order
> to prevent fetching of patched instructions from the icache. Am I missing something?

I think you are right, and this definitely matches the way we deal with
the icache on the fault path. For some bizarre reason, I always assume
that FWB implies DIC, which isn't true at all.

I'm planning to address it as follows. Please let me know what you 
think.

Thanks,

         M.

 From f7860d1d284f41afea176cc17e5c9d895ae665e9 Mon Sep 17 00:00:00 2001
 From: Marc Zyngier <maz@kernel.org>
Date: Sat, 30 May 2020 17:22:19 +0100
Subject: [PATCH] KVM: arm64: Flush the instruction cache if not 
unmapping the
  VM on reboot

On a system with FWB, we don't need to unmap Stage-2 on reboot,
as even if userspace takes this opportunity to repaint the whole
of memory, FWB ensures that the data side stays consistent even
if the guest uses non-cacheable mappings.

However, the I-side is not necessarily coherent with the D-side
if CTR_EL0.DIC is 0. In this case, invalidate the i-cache to
preserve coherency.

Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
Fixes: 892713e97ca1 ("KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported")
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
  arch/arm64/kvm/arm.c | 14 ++++++++++----
  1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b0b569f2cdd0..d6988401c22a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -989,11 +989,17 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
  	 * Ensure a rebooted VM will fault in RAM pages and detect if the
  	 * guest MMU is turned off and flush the caches as needed.
  	 *
-	 * S2FWB enforces all memory accesses to RAM being cacheable, we
-	 * ensure that the cache is always coherent.
+	 * S2FWB enforces all memory accesses to RAM being cacheable,
+	 * ensuring that the data side is always coherent. We still
+	 * need to invalidate the I-cache though, as FWB does *not*
+	 * imply CTR_EL0.DIC.
  	 */
-	if (vcpu->arch.has_run_once && !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
-		stage2_unmap_vm(vcpu->kvm);
+	if (vcpu->arch.has_run_once) {
+		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
+			stage2_unmap_vm(vcpu->kvm);
+		else
+			__flush_icache_all();
+	}

  	vcpu_reset_hcr(vcpu);

-- 
2.26.2


-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH RFC] KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported
  2020-05-30 16:31     ` Marc Zyngier
@ 2020-05-31  9:12       ` Alexandru Elisei
  2020-06-01  6:26       ` zhukeqian
  1 sibling, 0 replies; 7+ messages in thread
From: Alexandru Elisei @ 2020-05-31  9:12 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Zenghui Yu, kvmarm, linux-arm-kernel, linux-kernel

Hi Marc,

On 5/30/20 5:31 PM, Marc Zyngier wrote:
> Hi Alex,
>
> On 2020-05-30 11:46, Alexandru Elisei wrote:
>> Hi,
>
> [...]
>
>>>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>>>> index 48d0ec44ad77..e6378162cdef 100644
>>>> --- a/virt/kvm/arm/arm.c
>>>> +++ b/virt/kvm/arm/arm.c
>>>> @@ -983,8 +983,11 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu
>>>> *vcpu,
>>>>      /*
>>>>       * Ensure a rebooted VM will fault in RAM pages and detect if the
>>>>       * guest MMU is turned off and flush the caches as needed.
>>>> +     *
>>>> +     * S2FWB enforces all memory accesses to RAM being cacheable, we
>>>> +     * ensure that the cache is always coherent.
>>>>       */
>>>> -    if (vcpu->arch.has_run_once)
>>>> +    if (vcpu->arch.has_run_once && !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
>>> I think userspace does not invalidate the icache when loading a new kernel image,
>>> and if the guest patched instructions, they could potentially still be in the
>>> icache. Should the icache be invalidated if FWB is present?
>>
>> I noticed that this was included in the current pull request and I
>> remembered that
>> I wasn't sure about this part. Did some more digging and it turns out that FWB
>> implies no cache maintenance needed for *data to instruction*
>> coherence. From ARM
>> DDI 0487F.b, page D5-2635:
>>
>> "When ARMv8.4-S2FWB is implemented, the architecture requires that
>> CLIDR_EL1.{LOUU, LOIUS} are zero so that no levels of data cache need to be
>> cleaned in order to manage coherency with instruction fetches".
>>
>> However, there's no mention that I found for instruction to data coherence,
>> meaning that the icache would still need to be invalidated on each vcpu in order
>> to prevent fetching of patched instructions from the icache. Am I
>> missing something?
>
> I think you are right, and this definitely matches the way we deal with
> the icache on the fault path. For some bizarre reason, I always assume
> that FWB implies DIC, which isn't true at all.
>
> I'm planning to address it as follows. Please let me know what you think.
>
> Thanks,
>
>         M.
>
> From f7860d1d284f41afea176cc17e5c9d895ae665e9 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz@kernel.org>
> Date: Sat, 30 May 2020 17:22:19 +0100
> Subject: [PATCH] KVM: arm64: Flush the instruction cache if not unmapping the
>  VM on reboot
>
> On a system with FWB, we don't need to unmap Stage-2 on reboot,
> as even if userspace takes this opportunity to repaint the whole
> of memory, FWB ensures that the data side stays consistent even
> if the guest uses non-cacheable mappings.
>
> However, the I-side is not necessarily coherent with the D-side
> if CTR_EL0.DIC is 0. In this case, invalidate the i-cache to
> preserve coherency.
>
> Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Fixes: 892713e97ca1 ("KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when
> S2FWB is supported")
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/arm.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index b0b569f2cdd0..d6988401c22a 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -989,11 +989,17 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu
> *vcpu,
>       * Ensure a rebooted VM will fault in RAM pages and detect if the
>       * guest MMU is turned off and flush the caches as needed.
>       *
> -     * S2FWB enforces all memory accesses to RAM being cacheable, we
> -     * ensure that the cache is always coherent.
> +     * S2FWB enforces all memory accesses to RAM being cacheable,
> +     * ensuring that the data side is always coherent. We still
> +     * need to invalidate the I-cache though, as FWB does *not*
> +     * imply CTR_EL0.DIC.
>       */
> -    if (vcpu->arch.has_run_once && !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> -        stage2_unmap_vm(vcpu->kvm);
> +    if (vcpu->arch.has_run_once) {
> +        if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
> +            stage2_unmap_vm(vcpu->kvm);
> +        else
> +            __flush_icache_all();
> +    }
>
>      vcpu_reset_hcr(vcpu);
>
>
Looks good, __flush_icache_all checks CTR_EL0.DIC before doing icache maintenance:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
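
The decision made on vcpu re-initialisation after this fix can be modelled
in user space as below. This is only a sketch of the control flow as
described in the thread: struct cpu_caps, enum reset_action and
vcpu_init_cache_action() are hypothetical names, and the DIC check that
the review comment attributes to __flush_icache_all() is folded into the
model as a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU capabilities the patch consults. */
struct cpu_caps {
	bool has_stage2_fwb;	/* models ARM64_HAS_STAGE2_FWB */
	bool has_cache_dic;	/* models CTR_EL0.DIC being set */
};

enum reset_action {
	RESET_NO_MAINTENANCE,	/* nothing to do */
	RESET_UNMAP_STAGE2,	/* models stage2_unmap_vm() */
	RESET_INVALIDATE_ICACHE	/* models the icache invalidation */
};

static enum reset_action vcpu_init_cache_action(bool has_run_once,
						const struct cpu_caps *caps)
{
	assert(caps != NULL);

	/* A vcpu that has never run has nothing stale to worry about. */
	if (!has_run_once)
		return RESET_NO_MAINTENANCE;

	/* Without FWB, fall back to the heavy stage2 unmapping. */
	if (!caps->has_stage2_fwb)
		return RESET_UNMAP_STAGE2;

	/* With FWB and DIC, the icache flush call becomes a no-op. */
	if (caps->has_cache_dic)
		return RESET_NO_MAINTENANCE;

	/* FWB without DIC: the data side is fine, the icache is not. */
	return RESET_INVALIDATE_ICACHE;
}
```

The model makes the four cases explicit: FWB alone removes the unmapping
but not the instruction-side work, and only FWB together with DIC removes
both.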



* Re: [PATCH RFC] KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported
  2020-05-30 10:46   ` Alexandru Elisei
  2020-05-30 16:31     ` Marc Zyngier
@ 2020-06-01  3:24     ` Zenghui Yu
  1 sibling, 0 replies; 7+ messages in thread
From: Zenghui Yu @ 2020-06-01  3:24 UTC (permalink / raw)
  To: Alexandru Elisei, kvmarm, linux-arm-kernel, linux-kernel; +Cc: Marc Zyngier

Hi Alex,

On 2020/5/30 18:46, Alexandru Elisei wrote:
> Hi,
> 
> On 4/20/20 5:10 PM, Alexandru Elisei wrote:

[ For some unknown reason, I missed your reply a month ago.
   Sorry, I'm going to fix my email settings ... ]

>> Hi,
>>
>> On 4/15/20 8:28 AM, Zenghui Yu wrote:
>>> stage2_unmap_vm() was introduced to unmap user RAM region in the stage2
>>> page table to make the caches coherent. E.g., a guest reboot with stage1
>>> MMU disabled will access memory using non-cacheable attributes. If the
>>> RAM and caches are not coherent at this stage, some evicted dirty cache
>>> line may go and corrupt guest data in RAM.
>>>
>>> Since ARMv8.4, S2FWB feature is mandatory and KVM will take advantage
>>> of it to configure the stage2 page table and the attributes of memory
>>> access. So we ensure that guests always access memory using cacheable
>>> attributes and thus, the caches always be coherent.
>>>
>>> So on CPUs that support S2FWB, we can safely reset the vcpu without a
>>> heavy stage2 unmapping.
>>>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Cc: Christoffer Dall <christoffer.dall@arm.com>
>>> Cc: James Morse <james.morse@arm.com>
>>> Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
>>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
>>> ---
>>>
>>> If this is correct, there should be a great performance improvement on
>>> a guest reboot (or reset) on systems support S2FWB. But I'm afraid that
>>> I've missed some points here, so please comment!
>>>
>>> The commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm")
>>> was merged about six years ago and I failed to track its histroy and
>>> intention. Instead of a whole stage2 unmapping, something like
>>> stage2_flush_vm() looks enough to me. But again, I'm unsure...
>>>
>>> Thanks for having a look!
>> I had a chat with Christoffer about stage2_unmap_vm, and as I understood it, the
>> purpose was to make sure that any changes made by userspace were seen by the guest
>> while the MMU is off. When a stage 2 fault happens, we do clean+inval on the
>> dcache, or inval on the icache if it was an exec fault. This means that whatever
>> the host userspace writes while the guest is shut down and is still in the cache,
>> the guest will be able to read/execute.
>>
>> This can be relevant if the guest relocates the kernel and overwrites the original
>> image location, and userspace copies the original kernel image back in before
>> restarting the vm.

Yes, I-cache coherency is what I had missed! So without an S2 unmapping
on reboot, if there's any stale but "valid" cache line in the I-cache, the
guest may fetch the wrong instructions directly from it, and bad things
will happen... (Otherwise, we would get a translation fault and a
permission fault, and invalidate the I-cache as needed.)
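
The stale-line hazard described above can be shown with a toy model. This
is a deliberately simplified user-space sketch, not a description of any
real cache: struct toy_machine and all toy_* functions are hypothetical,
each "line" holds a single instruction word, and data-side stores never
touch the modelled icache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NLINES 4

/* Hypothetical machine: RAM plus an icache that is not kept coherent. */
struct toy_machine {
	uint32_t ram[TOY_NLINES];
	uint32_t icache[TOY_NLINES];
	bool icache_valid[TOY_NLINES];
};

static void toy_init(struct toy_machine *m)
{
	memset(m, 0, sizeof(*m));
}

/* Data-side write, e.g. userspace reloading the kernel image. */
static void toy_store(struct toy_machine *m, unsigned int idx, uint32_t insn)
{
	assert(idx < TOY_NLINES);
	m->ram[idx] = insn;	/* the icache copy, if any, is left alone */
}

/* Instruction fetch: a valid icache line wins over RAM. */
static uint32_t toy_fetch(struct toy_machine *m, unsigned int idx)
{
	assert(idx < TOY_NLINES);
	if (!m->icache_valid[idx]) {
		m->icache[idx] = m->ram[idx];
		m->icache_valid[idx] = true;
	}
	return m->icache[idx];
}

/* Models a full icache invalidation on reset. */
static void toy_invalidate_icache_all(struct toy_machine *m)
{
	memset(m->icache_valid, 0, sizeof(m->icache_valid));
}
```

In this model, a fetch that follows a store to an already cached line
still returns the old word until toy_invalidate_icache_all() is called,
which mirrors the situation where the stage2 mapping is kept across a
reboot and no fault is taken to trigger the invalidation.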

>>
>>>   virt/kvm/arm/arm.c | 5 ++++-
>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>>> index 48d0ec44ad77..e6378162cdef 100644
>>> --- a/virt/kvm/arm/arm.c
>>> +++ b/virt/kvm/arm/arm.c
>>> @@ -983,8 +983,11 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>>>   	/*
>>>   	 * Ensure a rebooted VM will fault in RAM pages and detect if the
>>>   	 * guest MMU is turned off and flush the caches as needed.
>>> +	 *
>>> +	 * S2FWB enforces all memory accesses to RAM being cacheable, we
>>> +	 * ensure that the cache is always coherent.
>>>   	 */
>>> -	if (vcpu->arch.has_run_once)
>>> +	if (vcpu->arch.has_run_once && !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
>> I think userspace does not invalidate the icache when loading a new kernel image,
>> and if the guest patched instructions, they could potentially still be in the
>> icache. Should the icache be invalidated if FWB is present?
> 
> I noticed that this was included in the current pull request and I remembered that
> I wasn't sure about this part. Did some more digging and it turns out that FWB
> implies no cache maintenance needed for *data to instruction* coherence. From ARM
> DDI 0487F.b, page D5-2635:
> 
> "When ARMv8.4-S2FWB is implemented, the architecture requires that
> CLIDR_EL1.{LOUU, LOIUS} are zero so that no levels of data cache need to be
> cleaned in order to manage coherency with instruction fetches".
> 
> However, there's no mention that I found for instruction to data coherence,
> meaning that the icache would still need to be invalidated on each vcpu in order
> to prevent fetching of patched instructions from the icache. Am I missing something?

Thanks for the heads-up, and for Marc's fix!


Thanks both,
Zenghui


* Re: [PATCH RFC] KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported
  2020-05-30 16:31     ` Marc Zyngier
  2020-05-31  9:12       ` Alexandru Elisei
@ 2020-06-01  6:26       ` zhukeqian
  1 sibling, 0 replies; 7+ messages in thread
From: zhukeqian @ 2020-06-01  6:26 UTC (permalink / raw)
  To: Marc Zyngier, Alexandru Elisei
  Cc: Zenghui Yu, kvmarm, linux-arm-kernel, linux-kernel

Hi Marc,

On 2020/5/31 0:31, Marc Zyngier wrote:
> Hi Alex,
> 
> On 2020-05-30 11:46, Alexandru Elisei wrote:
>> Hi,
> 
> [...]
> 
>>>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>>>> index 48d0ec44ad77..e6378162cdef 100644
>>>> --- a/virt/kvm/arm/arm.c
>>>> +++ b/virt/kvm/arm/arm.c
>>>> @@ -983,8 +983,11 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>>>>      /*
>>>>       * Ensure a rebooted VM will fault in RAM pages and detect if the
>>>>       * guest MMU is turned off and flush the caches as needed.
>>>> +     *
>>>> +     * S2FWB enforces all memory accesses to RAM being cacheable, we
>>>> +     * ensure that the cache is always coherent.
>>>>       */
>>>> -    if (vcpu->arch.has_run_once)
>>>> +    if (vcpu->arch.has_run_once && !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
>>> I think userspace does not invalidate the icache when loading a new kernel image,
>>> and if the guest patched instructions, they could potentially still be in the
>>> icache. Should the icache be invalidated if FWB is present?
>>
>> I noticed that this was included in the current pull request and I
>> remembered that
>> I wasn't sure about this part. Did some more digging and it turns out that FWB
>> implies no cache maintenance needed for *data to instruction*
>> coherence. From ARM
>> DDI 0487F.b, page D5-2635:
>>
>> "When ARMv8.4-S2FWB is implemented, the architecture requires that
>> CLIDR_EL1.{LOUU, LOIUS} are zero so that no levels of data cache need to be
>> cleaned in order to manage coherency with instruction fetches".
>>
>> However, there's no mention that I found for instruction to data coherence,
>> meaning that the icache would still need to be invalidated on each vcpu in order
>> to prevent fetching of patched instructions from the icache. Am I
>> missing something?
> 
> I think you are right, and this definitely matches the way we deal with
> the icache on the fault path. For some bizarre reason, I always assume
> that FWB implies DIC, which isn't true at all.
> 
> I'm planning to address it as follows. Please let me know what you think.
> 
> Thanks,
> 
>         M.
> 
> From f7860d1d284f41afea176cc17e5c9d895ae665e9 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz@kernel.org>
> Date: Sat, 30 May 2020 17:22:19 +0100
> Subject: [PATCH] KVM: arm64: Flush the instruction cache if not unmapping the
>  VM on reboot
> 
> On a system with FWB, we don't need to unmap Stage-2 on reboot,
> as even if userspace takes this opportunity to repaint the whole
> of memory, FWB ensures that the data side stays consistent even
> if the guest uses non-cacheable mappings.
> 
> However, the I-side is not necessarily coherent with the D-side
> if CTR_EL0.DIC is 0. In this case, invalidate the i-cache to
> preserve coherency.
> 
> Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Fixes: 892713e97ca1 ("KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported")
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/arm.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index b0b569f2cdd0..d6988401c22a 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -989,11 +989,17 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>       * Ensure a rebooted VM will fault in RAM pages and detect if the
>       * guest MMU is turned off and flush the caches as needed.
>       *
> -     * S2FWB enforces all memory accesses to RAM being cacheable, we
> -     * ensure that the cache is always coherent.
> +     * S2FWB enforces all memory accesses to RAM being cacheable,
> +     * ensuring that the data side is always coherent. We still
> +     * need to invalidate the I-cache though, as FWB does *not*
> +     * imply CTR_EL0.DIC.
>       */
> -    if (vcpu->arch.has_run_once && !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> -        stage2_unmap_vm(vcpu->kvm);
> +    if (vcpu->arch.has_run_once) {
> +        if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
> +            stage2_unmap_vm(vcpu->kvm);
> +        else
> +            __flush_icache_all();
After looking into this function, I think it's OK here. Please ignore
my question :-).
> +    }
> 
>      vcpu_reset_hcr(vcpu);
> 
Thanks,
Keqian


end of thread, other threads:[~2020-06-01  6:26 UTC | newest]

Thread overview: 7+ messages
2020-04-15  7:28 [PATCH RFC] KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported Zenghui Yu
2020-04-20 16:10 ` Alexandru Elisei
2020-05-30 10:46   ` Alexandru Elisei
2020-05-30 16:31     ` Marc Zyngier
2020-05-31  9:12       ` Alexandru Elisei
2020-06-01  6:26       ` zhukeqian
2020-06-01  3:24     ` Zenghui Yu
