All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
@ 2013-08-14 11:47 Pranavkumar Sawargaonkar
  2013-08-14 12:04 ` Marc Zyngier
                   ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Pranavkumar Sawargaonkar @ 2013-08-14 11:47 UTC (permalink / raw)
  To: linux-arm-kernel

Systems with large external L3-cache (few MBs), might have dirty
content belonging to the guest page in L3-cache. To tackle this,
we need to flush such dirty content from d-cache so that guest
will see correct contents of guest page when guest MMU is disabled.

The patch fixes coherent_icache_guest_page() for external L3-cache.

Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
---
 arch/arm64/include/asm/kvm_mmu.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index efe609c..5129038 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
 	if (!icache_is_aliasing()) {		/* PIPT */
 		unsigned long hva = gfn_to_hva(kvm, gfn);
 		flush_icache_range(hva, hva + PAGE_SIZE);
+		/* Flush d-cache for systems with external caches. */
+		__flush_dcache_area((void *) hva, PAGE_SIZE);
 	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
 		/* any kind of VIPT cache */
 		__flush_icache_all();
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 11:47 [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache Pranavkumar Sawargaonkar
@ 2013-08-14 12:04 ` Marc Zyngier
  2013-08-14 14:22   ` Anup Patel
  2013-08-14 14:23 ` Sudeep KarkadaNagesha
  2013-08-14 17:37 ` Christoffer Dall
  2 siblings, 1 reply; 58+ messages in thread
From: Marc Zyngier @ 2013-08-14 12:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Pranav,

On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> Systems with large external L3-cache (few MBs), might have dirty
> content belonging to the guest page in L3-cache. To tackle this,
> we need to flush such dirty content from d-cache so that guest
> will see correct contents of guest page when guest MMU is disabled.
>
> The patch fixes coherent_icache_guest_page() for external L3-cache.
>
> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> ---
>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> b/arch/arm64/include/asm/kvm_mmu.h
> index efe609c..5129038 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -123,6 +123,8 @@ static inline void
> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>  	if (!icache_is_aliasing()) {		/* PIPT */
>  		unsigned long hva = gfn_to_hva(kvm, gfn);
>  		flush_icache_range(hva, hva + PAGE_SIZE);
> +		/* Flush d-cache for systems with external caches. */
> +		__flush_dcache_area((void *) hva, PAGE_SIZE);
>  	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
>  		/* any kind of VIPT cache */
>  		__flush_icache_all();

[adding Will to the discussion as we talked about this in the past]

That's definitely an issue, but I'm not sure the fix is to hit the data 
cache on each page mapping. It looks overkill.

Wouldn't it be enough to let userspace do the cache cleaning? kvmtools 
knows which bits of the guest memory have been touched, and can do a "DC 
DVAC" on this region.

The alternative is do it in the kernel before running any vcpu - but 
that's not very nice either (have to clean the whole of the guest 
memory, which makes a full dcache clean more appealing).

         M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 12:04 ` Marc Zyngier
@ 2013-08-14 14:22   ` Anup Patel
  2013-08-14 15:06     ` Alexander Graf
                       ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Anup Patel @ 2013-08-14 14:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> Hi Pranav,
>
> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>> Systems with large external L3-cache (few MBs), might have dirty
>> content belonging to the guest page in L3-cache. To tackle this,
>> we need to flush such dirty content from d-cache so that guest
>> will see correct contents of guest page when guest MMU is disabled.
>>
>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>
>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>> ---
>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>> b/arch/arm64/include/asm/kvm_mmu.h
>> index efe609c..5129038 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -123,6 +123,8 @@ static inline void
>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>       if (!icache_is_aliasing()) {            /* PIPT */
>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>               flush_icache_range(hva, hva + PAGE_SIZE);
>> +             /* Flush d-cache for systems with external caches. */
>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>>               /* any kind of VIPT cache */
>>               __flush_icache_all();
>
> [adding Will to the discussion as we talked about this in the past]
>
> That's definitely an issue, but I'm not sure the fix is to hit the data
> cache on each page mapping. It looks overkill.
>
> Wouldn't it be enough to let userspace do the cache cleaning? kvmtools
> knows which bits of the guest memory have been touched, and can do a "DC
> DVAC" on this region.

It seems a bit unnatural to have cache cleaning is user-space. I am sure
other architectures don't have such cache cleaning in user-space for KVM.

>
> The alternative is do it in the kernel before running any vcpu - but
> that's not very nice either (have to clean the whole of the guest
> memory, which makes a full dcache clean more appealing).

Actually, cleaning full d-cache by set/way upon first run of VCPU was
our second option but current approach seemed very simple hence
we went for this.

If more people vote for full d-cache clean upon first run of VCPU then
we should revise this patch.

>
>          M.
> --
> Fast, cheap, reliable. Pick two.
> _______________________________________________
> kvmarm mailing list
> kvmarm at lists.cs.columbia.edu
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 11:47 [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache Pranavkumar Sawargaonkar
  2013-08-14 12:04 ` Marc Zyngier
@ 2013-08-14 14:23 ` Sudeep KarkadaNagesha
  2013-08-14 14:35   ` Anup Patel
  2013-08-14 17:37 ` Christoffer Dall
  2 siblings, 1 reply; 58+ messages in thread
From: Sudeep KarkadaNagesha @ 2013-08-14 14:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 14/08/13 12:47, Pranavkumar Sawargaonkar wrote:
> Systems with large external L3-cache (few MBs), might have dirty
> content belonging to the guest page in L3-cache. To tackle this,
> we need to flush such dirty content from d-cache so that guest
> will see correct contents of guest page when guest MMU is disabled.
> 
> The patch fixes coherent_icache_guest_page() for external L3-cache.
> 

I don't understand KVM but the commit message caught my attention.

You are referring to clean the external/outer L3 cache but using
internal cache maintenance APIs. You should be using outer_cache APIs if
you really intend to do outer cache maintenance ?
Is the external/outer L3 cache not unified ?

Or did you mean to flush d-cache to the point of coherency(external L3
in your case)? If so it's not clear from the commit log.

Regards,
Sudeep

> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> ---
>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index efe609c..5129038 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>  	if (!icache_is_aliasing()) {		/* PIPT */
>  		unsigned long hva = gfn_to_hva(kvm, gfn);
>  		flush_icache_range(hva, hva + PAGE_SIZE);
> +		/* Flush d-cache for systems with external caches. */
> +		__flush_dcache_area((void *) hva, PAGE_SIZE);
>  	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
>  		/* any kind of VIPT cache */
>  		__flush_icache_all();
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 14:23 ` Sudeep KarkadaNagesha
@ 2013-08-14 14:35   ` Anup Patel
  0 siblings, 0 replies; 58+ messages in thread
From: Anup Patel @ 2013-08-14 14:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 14, 2013 at 7:53 PM, Sudeep KarkadaNagesha
<Sudeep.KarkadaNagesha@arm.com> wrote:
> On 14/08/13 12:47, Pranavkumar Sawargaonkar wrote:
>> Systems with large external L3-cache (few MBs), might have dirty
>> content belonging to the guest page in L3-cache. To tackle this,
>> we need to flush such dirty content from d-cache so that guest
>> will see correct contents of guest page when guest MMU is disabled.
>>
>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>
>
> I don't understand KVM but the commit message caught my attention.
>
> You are referring to clean the external/outer L3 cache but using
> internal cache maintenance APIs. You should be using outer_cache APIs if
> you really intend to do outer cache maintenance ?
> Is the external/outer L3 cache not unified ?
>
> Or did you mean to flush d-cache to the point of coherency(external L3
> in your case)? If so it's not clear from the commit log.

Yes, we mean to flush d-cache to the point of coherency (which in our
case is external L3-cache).

The patch description is more in-terms of KVM virtualization but, we
can certainly add more details.

>
> Regards,
> Sudeep
>
>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>> ---
>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index efe609c..5129038 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>       if (!icache_is_aliasing()) {            /* PIPT */
>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>               flush_icache_range(hva, hva + PAGE_SIZE);
>> +             /* Flush d-cache for systems with external caches. */
>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>>               /* any kind of VIPT cache */
>>               __flush_icache_all();
>>
>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Regards,
Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 14:22   ` Anup Patel
@ 2013-08-14 15:06     ` Alexander Graf
  2013-08-14 15:34       ` Marc Zyngier
  2013-08-14 15:23     ` Marc Zyngier
  2013-08-15  4:52     ` Marc Zyngier
  2 siblings, 1 reply; 58+ messages in thread
From: Alexander Graf @ 2013-08-14 15:06 UTC (permalink / raw)
  To: linux-arm-kernel


On 14.08.2013, at 16:22, Anup Patel wrote:

> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> Hi Pranav,
>> 
>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>> Systems with large external L3-cache (few MBs), might have dirty
>>> content belonging to the guest page in L3-cache. To tackle this,
>>> we need to flush such dirty content from d-cache so that guest
>>> will see correct contents of guest page when guest MMU is disabled.
>>> 
>>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>> 
>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>> ---
>>> arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>> 1 file changed, 2 insertions(+)
>>> 
>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>> b/arch/arm64/include/asm/kvm_mmu.h
>>> index efe609c..5129038 100644
>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>> @@ -123,6 +123,8 @@ static inline void
>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>      if (!icache_is_aliasing()) {            /* PIPT */
>>>              unsigned long hva = gfn_to_hva(kvm, gfn);
>>>              flush_icache_range(hva, hva + PAGE_SIZE);
>>> +             /* Flush d-cache for systems with external caches. */
>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>      } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>>>              /* any kind of VIPT cache */
>>>              __flush_icache_all();
>> 
>> [adding Will to the discussion as we talked about this in the past]
>> 
>> That's definitely an issue, but I'm not sure the fix is to hit the data
>> cache on each page mapping. It looks overkill.
>> 
>> Wouldn't it be enough to let userspace do the cache cleaning? kvmtools
>> knows which bits of the guest memory have been touched, and can do a "DC
>> DVAC" on this region.
> 
> It seems a bit unnatural to have cache cleaning is user-space. I am sure
> other architectures don't have such cache cleaning in user-space for KVM.

Not sure I understand at which point you really need to make things coherent here. When you assign a new ASID because there could be stale cache entries left from an old ASID run? Can't you just flush all caches when you overrun your ASID allocator?

> 
>> 
>> The alternative is do it in the kernel before running any vcpu - but
>> that's not very nice either (have to clean the whole of the guest
>> memory, which makes a full dcache clean more appealing).
> 
> Actually, cleaning full d-cache by set/way upon first run of VCPU was
> our second option but current approach seemed very simple hence
> we went for this.
> 
> If more people vote for full d-cache clean upon first run of VCPU then
> we should revise this patch.

Is there any difference in performance between the 2 approaches?


Alex

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 14:22   ` Anup Patel
  2013-08-14 15:06     ` Alexander Graf
@ 2013-08-14 15:23     ` Marc Zyngier
  2013-08-14 15:35       ` Peter Maydell
  2013-08-14 15:36       ` Anup Patel
  2013-08-15  4:52     ` Marc Zyngier
  2 siblings, 2 replies; 58+ messages in thread
From: Marc Zyngier @ 2013-08-14 15:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 2013-08-14 15:22, Anup Patel wrote:
> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com> 
> wrote:
>> Hi Pranav,
>>
>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>> Systems with large external L3-cache (few MBs), might have dirty
>>> content belonging to the guest page in L3-cache. To tackle this,
>>> we need to flush such dirty content from d-cache so that guest
>>> will see correct contents of guest page when guest MMU is disabled.
>>>
>>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>>
>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>> ---
>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>> b/arch/arm64/include/asm/kvm_mmu.h
>>> index efe609c..5129038 100644
>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>> @@ -123,6 +123,8 @@ static inline void
>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>> +             /* Flush d-cache for systems with external caches. */
>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged 
>>> VIVT */
>>>               /* any kind of VIPT cache */
>>>               __flush_icache_all();
>>
>> [adding Will to the discussion as we talked about this in the past]
>>
>> That's definitely an issue, but I'm not sure the fix is to hit the 
>> data
>> cache on each page mapping. It looks overkill.
>>
>> Wouldn't it be enough to let userspace do the cache cleaning? 
>> kvmtools
>> knows which bits of the guest memory have been touched, and can do a 
>> "DC
>> DVAC" on this region.
>
> It seems a bit unnatural to have cache cleaning is user-space. I am 
> sure
> other architectures don't have such cache cleaning in user-space for 
> KVM.

Well, we have it on AArch64. Why would we blindly nuke the whole cache 
if we can do the right thing, efficiently, on the right range?

>> The alternative is do it in the kernel before running any vcpu - but
>> that's not very nice either (have to clean the whole of the guest
>> memory, which makes a full dcache clean more appealing).
>
> Actually, cleaning full d-cache by set/way upon first run of VCPU was
> our second option but current approach seemed very simple hence
> we went for this.
>
> If more people vote for full d-cache clean upon first run of VCPU 
> then
> we should revise this patch.

That sounds a lot better than randomly flushing the dcache for no good 
reason. Most of the time, your MMU will be ON, and only the initial 
text/data loaded from userspace is at risk.

But the userspace version sounds definitely better, and I'd like you to 
evaluate it before going the kernel way.

Thanks,

         M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 15:06     ` Alexander Graf
@ 2013-08-14 15:34       ` Marc Zyngier
  2013-08-14 15:41         ` Peter Maydell
  0 siblings, 1 reply; 58+ messages in thread
From: Marc Zyngier @ 2013-08-14 15:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 2013-08-14 16:06, Alexander Graf wrote:
> On 14.08.2013, at 16:22, Anup Patel wrote:
>
>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com> 
>> wrote:
>>> Hi Pranav,
>>>
>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>> Systems with large external L3-cache (few MBs), might have dirty
>>>> content belonging to the guest page in L3-cache. To tackle this,
>>>> we need to flush such dirty content from d-cache so that guest
>>>> will see correct contents of guest page when guest MMU is 
>>>> disabled.
>>>>
>>>> The patch fixes coherent_icache_guest_page() for external 
>>>> L3-cache.
>>>>
>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>> ---
>>>> arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>> 1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>> index efe609c..5129038 100644
>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>> @@ -123,6 +123,8 @@ static inline void
>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>      if (!icache_is_aliasing()) {            /* PIPT */
>>>>              unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>              flush_icache_range(hva, hva + PAGE_SIZE);
>>>> +             /* Flush d-cache for systems with external caches. 
>>>> */
>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>      } else if (!icache_is_aivivt()) {       /* non ASID-tagged 
>>>> VIVT */
>>>>              /* any kind of VIPT cache */
>>>>              __flush_icache_all();
>>>
>>> [adding Will to the discussion as we talked about this in the past]
>>>
>>> That's definitely an issue, but I'm not sure the fix is to hit the 
>>> data
>>> cache on each page mapping. It looks overkill.
>>>
>>> Wouldn't it be enough to let userspace do the cache cleaning? 
>>> kvmtools
>>> knows which bits of the guest memory have been touched, and can do 
>>> a "DC
>>> DVAC" on this region.
>>
>> It seems a bit unnatural to have cache cleaning is user-space. I am 
>> sure
>> other architectures don't have such cache cleaning in user-space for 
>> KVM.
>
> Not sure I understand at which point you really need to make things
> coherent here. When you assign a new ASID because there could be 
> stale
> cache entries left from an old ASID run? Can't you just flush all
> caches when you overrun your ASID allocator?

First, a bit of terminology:
- ASIDs are used for userspace. KVM doesn't deal with those.
- VMIDs are used for VMs. That's what KVM deals with.

The issue here is that when the MMU is disabled, the access goes 
straight to RAM, bypassing the caches altogether (OK not completely 
true, but for the sake of the argument, that's close enough).

When userspace loads the kernel into memory, the kernel is not flushed 
to RAM, and may sit in the L3 cache if the cache is big enough. You 
end-up executing garbage... My proposed fix is to let kvmtool do the 
flushing, as we have userspace cache management operations for this 
exact purpose.

>>
>>>
>>> The alternative is do it in the kernel before running any vcpu - 
>>> but
>>> that's not very nice either (have to clean the whole of the guest
>>> memory, which makes a full dcache clean more appealing).
>>
>> Actually, cleaning full d-cache by set/way upon first run of VCPU 
>> was
>> our second option but current approach seemed very simple hence
>> we went for this.
>>
>> If more people vote for full d-cache clean upon first run of VCPU 
>> then
>> we should revise this patch.
>
> Is there any difference in performance between the 2 approaches?

dcache clean on each and every page being mapped in stage-2, versus a 
one-time penalty. Should be significant indeed.

But again, I'd like to see figures. What is the impact of each of the 3 
methods for various memory and cache sizes?

         M.
-- 
Who you jivin' with that Cosmik Debris?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 15:23     ` Marc Zyngier
@ 2013-08-14 15:35       ` Peter Maydell
  2013-08-14 15:49         ` Marc Zyngier
  2013-08-14 15:36       ` Anup Patel
  1 sibling, 1 reply; 58+ messages in thread
From: Peter Maydell @ 2013-08-14 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 August 2013 16:23, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 2013-08-14 15:22, Anup Patel wrote:
>> It seems a bit unnatural to have cache cleaning is user-space. I am
>> sure
>> other architectures don't have such cache cleaning in user-space for
>> KVM.
>
> Well, we have it on AArch64. Why would we blindly nuke the whole cache
> if we can do the right thing, efficiently, on the right range?

When exactly would userspace have to care about the cache?
This patch isn't exactly clear about the circumstances. I
think you'd need a really strong reason for not dealing with
this in the kernel -- in general userspace KVM tools don't
otherwise have to deal with cache maintenance at all.

-- PMM

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 15:23     ` Marc Zyngier
  2013-08-14 15:35       ` Peter Maydell
@ 2013-08-14 15:36       ` Anup Patel
  1 sibling, 0 replies; 58+ messages in thread
From: Anup Patel @ 2013-08-14 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 14, 2013 at 8:53 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 2013-08-14 15:22, Anup Patel wrote:
>>
>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com>
>> wrote:
>>>
>>> Hi Pranav,
>>>
>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>
>>>> Systems with large external L3-cache (few MBs), might have dirty
>>>> content belonging to the guest page in L3-cache. To tackle this,
>>>> we need to flush such dirty content from d-cache so that guest
>>>> will see correct contents of guest page when guest MMU is disabled.
>>>>
>>>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>>>
>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>> ---
>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>> index efe609c..5129038 100644
>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>> @@ -123,6 +123,8 @@ static inline void
>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>> +             /* Flush d-cache for systems with external caches. */
>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>>>>               /* any kind of VIPT cache */
>>>>               __flush_icache_all();
>>>
>>>
>>> [adding Will to the discussion as we talked about this in the past]
>>>
>>> That's definitely an issue, but I'm not sure the fix is to hit the data
>>> cache on each page mapping. It looks overkill.
>>>
>>> Wouldn't it be enough to let userspace do the cache cleaning? kvmtools
>>> knows which bits of the guest memory have been touched, and can do a "DC
>>> DVAC" on this region.
>>
>>
>> It seems a bit unnatural to have cache cleaning is user-space. I am sure
>> other architectures don't have such cache cleaning in user-space for KVM.
>
>
> Well, we have it on AArch64. Why would we blindly nuke the whole cache if we
> can do the right thing, efficiently, on the right range?

Flushing from user-space would be better only if target virtual address
range is small. If user-space loads several images at different locations
or large large images in guest RAM then flushing entire d-cache by
set/way would be more efficient.

>
>
>>> The alternative is do it in the kernel before running any vcpu - but
>>> that's not very nice either (have to clean the whole of the guest
>>> memory, which makes a full dcache clean more appealing).
>>
>>
>> Actually, cleaning full d-cache by set/way upon first run of VCPU was
>> our second option but current approach seemed very simple hence
>> we went for this.
>>
>> If more people vote for full d-cache clean upon first run of VCPU then
>> we should revise this patch.
>
>
> That sounds a lot better than randomly flushing the dcache for no good
> reason. Most of the time, your MMU will be ON, and only the initial
> text/data loaded from userspace is at risk.
>
> But the userspace version sounds definitely better, and I'd like you to
> evaluate it before going the kernel way.

Considering the penalty, I like the approach of flushing d-cache by
set/way upon first run of VCPU because its just one time penalty.

>
> Thanks,
>
>
>         M.
> --
> Fast, cheap, reliable. Pick two.

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 15:34       ` Marc Zyngier
@ 2013-08-14 15:41         ` Peter Maydell
  2013-08-14 15:57           ` Marc Zyngier
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Maydell @ 2013-08-14 15:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 August 2013 16:34, Marc Zyngier <maz@misterjones.org> wrote:
> When userspace loads the kernel into memory, the kernel is not flushed
> to RAM, and may sit in the L3 cache if the cache is big enough. You
> end-up executing garbage... My proposed fix is to let kvmtool do the
> flushing, as we have userspace cache management operations for this
> exact purpose.

Why does this issue only apply to the loaded kernel and not to
the zero bytes in the rest of RAM? I know executing zeroes
isn't a very useful thing to do but it should be a well defined
thing.

-- PMM

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 15:35       ` Peter Maydell
@ 2013-08-14 15:49         ` Marc Zyngier
  2013-08-14 17:34           ` Christoffer Dall
  0 siblings, 1 reply; 58+ messages in thread
From: Marc Zyngier @ 2013-08-14 15:49 UTC (permalink / raw)
  To: linux-arm-kernel

On 2013-08-14 16:35, Peter Maydell wrote:
> On 14 August 2013 16:23, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 2013-08-14 15:22, Anup Patel wrote:
>>> It seems a bit unnatural to have cache cleaning is user-space. I am
>>> sure
>>> other architectures don't have such cache cleaning in user-space 
>>> for
>>> KVM.
>>
>> Well, we have it on AArch64. Why would we blindly nuke the whole 
>> cache
>> if we can do the right thing, efficiently, on the right range?
>
> When exactly would userspace have to care about the cache?

Only for the initial payload, I'd expect. Unless I've missed something 
more crucial?

> This patch isn't exactly clear about the circumstances. I
> think you'd need a really strong reason for not dealing with
> this in the kernel -- in general userspace KVM tools don't
> otherwise have to deal with cache maintenance at all.

I believe we *could* do it in the kernel, just at the expense of a lot 
more CPU cycles.

A possible alternative would be to use HCR.DC, but I need to have a 
look and see what it breaks...

         M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 15:41         ` Peter Maydell
@ 2013-08-14 15:57           ` Marc Zyngier
  2013-08-14 16:36             ` Anup Patel
  0 siblings, 1 reply; 58+ messages in thread
From: Marc Zyngier @ 2013-08-14 15:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 2013-08-14 16:41, Peter Maydell wrote:
> On 14 August 2013 16:34, Marc Zyngier <maz@misterjones.org> wrote:
>> When userspace loads the kernel into memory, the kernel is not 
>> flushed
>> to RAM, and may sit in the L3 cache if the cache is big enough. You
>> end-up executing garbage... My proposed fix is to let kvmtool do the
>> flushing, as we have userspace cache management operations for this
>> exact purpose.
>
> Why does this issue only apply to the loaded kernel and not to
> the zero bytes in the rest of RAM? I know executing zeroes
> isn't a very useful thing to do but it should be a well defined
> thing.

Good point, and not quite sure just yet. Probably we get a zeroed, 
clean page?

Anup, can you elaborate on how your L3 cache behaves?

Thanks,

         M.
-- 
Who you jivin' with that Cosmik Debris?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 15:57           ` Marc Zyngier
@ 2013-08-14 16:36             ` Anup Patel
  0 siblings, 0 replies; 58+ messages in thread
From: Anup Patel @ 2013-08-14 16:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 14, 2013 at 9:27 PM, Marc Zyngier <maz@misterjones.org> wrote:
> On 2013-08-14 16:41, Peter Maydell wrote:
>> On 14 August 2013 16:34, Marc Zyngier <maz@misterjones.org> wrote:
>>> When userspace loads the kernel into memory, the kernel is not
>>> flushed
>>> to RAM, and may sit in the L3 cache if the cache is big enough. You
>>> end-up executing garbage... My proposed fix is to let kvmtool do the
>>> flushing, as we have userspace cache management operations for this
>>> exact purpose.
>>
>> Why does this issue only apply to the loaded kernel and not to
>> the zero bytes in the rest of RAM? I know executing zeroes
>> isn't a very useful thing to do but it should be a well defined
>> thing.
>
> Good point, and not quite sure just yet. Probably we get a zeroed,
> clean page?

This would apply to zeroed pages too if some Guest OS expects
zeroed-out portions in Guest RAM.

>
> Anup, can you elaborate on how your L3 cache behaves?

The L3-cache is transparent to Aarch64 CPUs. It is only bypassed
when caching is disabled or when accessing non-cacheable pages.
Currently, we cannot disclose more details about line allocation and
replacement policy because APM X-Gene specs are not released.

The issue pointed out by this patch would also apply to L2-cache
(i.e. L3-cache disabled) but it usually does not manifest because
L2-cache is not too big (few hundred KBs) and so dirty lines don't
stay for long time in L2-cache.

>
> Thanks,
>
>          M.
> --
> Who you jivin' with that Cosmik Debris?
> _______________________________________________
> kvmarm mailing list
> kvmarm at lists.cs.columbia.edu
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm

Regards,
Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 15:49         ` Marc Zyngier
@ 2013-08-14 17:34           ` Christoffer Dall
  2013-08-15  4:44             ` Marc Zyngier
  0 siblings, 1 reply; 58+ messages in thread
From: Christoffer Dall @ 2013-08-14 17:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 14, 2013 at 04:49:24PM +0100, Marc Zyngier wrote:
> On 2013-08-14 16:35, Peter Maydell wrote:
> > On 14 August 2013 16:23, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >> On 2013-08-14 15:22, Anup Patel wrote:
> >>> It seems a bit unnatural to have cache cleaning is user-space. I am
> >>> sure
> >>> other architectures don't have such cache cleaning in user-space 
> >>> for
> >>> KVM.
> >>
> >> Well, we have it on AArch64. Why would we blindly nuke the whole 
> >> cache
> >> if we can do the right thing, efficiently, on the right range?
> >
> > When exactly would userspace have to care about the cache?
> 
> Only for the initial payload, I'd expect. Unless I've missed something 
> more crucial?

What happens if the page is swapped out, is the kernel guaranteed to
flush it all the way to RAM when it swaps it back in, or would KVM have
to take care of that?

> 
> > This patch isn't exactly clear about the circumstances. I
> > think you'd need a really strong reason for not dealing with
> > this in the kernel -- in general userspace KVM tools don't
> > otherwise have to deal with cache maintenance at all.
> 
> I believe we *could* do it in the kernel, just at the expense of a lot 
> more CPU cycles.
> 
> A possible alternative would be to use HCR.DC, but I need to have a 
> look and see what it breaks...
> 
Could we flush the cache when we fault in pages only if the guest has
the MMU disabled and trap if the guest disabled the MMU and flush the
whole outer dcache at that point in time?

-Christoffer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 11:47 [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache Pranavkumar Sawargaonkar
  2013-08-14 12:04 ` Marc Zyngier
  2013-08-14 14:23 ` Sudeep KarkadaNagesha
@ 2013-08-14 17:37 ` Christoffer Dall
  2 siblings, 0 replies; 58+ messages in thread
From: Christoffer Dall @ 2013-08-14 17:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 14, 2013 at 05:17:12PM +0530, Pranavkumar Sawargaonkar wrote:
> Systems with large external L3-cache (few MBs), might have dirty
> content belonging to the guest page in L3-cache. To tackle this,
> we need to flush such dirty content from d-cache so that guest
> will see correct contents of guest page when guest MMU is disabled.
> 
> The patch fixes coherent_icache_guest_page() for external L3-cache.
> 
> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> ---
>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index efe609c..5129038 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>  	if (!icache_is_aliasing()) {		/* PIPT */
>  		unsigned long hva = gfn_to_hva(kvm, gfn);
>  		flush_icache_range(hva, hva + PAGE_SIZE);
> +		/* Flush d-cache for systems with external caches. */

This comment is nowhere near explanatory enough for someone who comes by
later and tries to figure out why we're flushing the dcache on every
page we swap in under given circumstances.  Yes, you can do git blame,
until you modify the line for some other reason and it just becomes a
pain.

> +		__flush_dcache_area((void *) hva, PAGE_SIZE);

eh, why is this only relevant for a non-aliasing icache?  In fact, this
does not seem to be icache related at all, but rather instruction-stream
to dcache related, which would warrant either a rename of the function
to something more generic or a separate function.

>  	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
>  		/* any kind of VIPT cache */
>  		__flush_icache_all();
> -- 
> 1.7.9.5
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm at lists.cs.columbia.edu
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 17:34           ` Christoffer Dall
@ 2013-08-15  4:44             ` Marc Zyngier
  2013-08-15 16:58               ` Christoffer Dall
  0 siblings, 1 reply; 58+ messages in thread
From: Marc Zyngier @ 2013-08-15  4:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 2013-08-14 18:34, Christoffer Dall wrote:
> On Wed, Aug 14, 2013 at 04:49:24PM +0100, Marc Zyngier wrote:
>> On 2013-08-14 16:35, Peter Maydell wrote:
>> > On 14 August 2013 16:23, Marc Zyngier <marc.zyngier@arm.com> 
>> wrote:
>> >> On 2013-08-14 15:22, Anup Patel wrote:
>> >>> It seems a bit unnatural to have cache cleaning is user-space. I 
>> am
>> >>> sure
>> >>> other architectures don't have such cache cleaning in user-space
>> >>> for
>> >>> KVM.
>> >>
>> >> Well, we have it on AArch64. Why would we blindly nuke the whole
>> >> cache
>> >> if we can do the right thing, efficiently, on the right range?
>> >
>> > When exactly would userspace have to care about the cache?
>>
>> Only for the initial payload, I'd expect. Unless I've missed 
>> something
>> more crucial?
>
> What happens if the page is swapped out, is the kernel guaranteed to
> flush it all the way to RAM when it swaps it back in, or would KVM 
> have
> to take care of that?

I'd expect the kernel to deal with it, otherwise you'd hit the wrong 
data each time you swap in a page in any userspace process.

>>
>> > This patch isn't exactly clear about the circumstances. I
>> > think you'd need a really strong reason for not dealing with
>> > this in the kernel -- in general userspace KVM tools don't
>> > otherwise have to deal with cache maintenance at all.
>>
>> I believe we *could* do it in the kernel, just at the expense of a 
>> lot
>> more CPU cycles.
>>
>> A possible alternative would be to use HCR.DC, but I need to have a
>> look and see what it breaks...
>>
> Could we flush the cache when we fault in pages only if the guest has
> the MMU disabled and trap if the guest disabled the MMU and flush the
> whole outer dcache at that point in time?

We don't need to trap the disabling of the MMU. If the guest does that, 
it *must* have flushed its cache to RAM already. Otherwise, it is 
utterly broken, virtualization or not.

What the guest doesn't expect is the initial data to sit in the cache 
while it hasn't set the MMU on just yet. I may have a patch for that.

         M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-14 14:22   ` Anup Patel
  2013-08-14 15:06     ` Alexander Graf
  2013-08-14 15:23     ` Marc Zyngier
@ 2013-08-15  4:52     ` Marc Zyngier
  2013-08-15  6:26       ` Anup Patel
  2013-08-15  8:39       ` Pranavkumar Sawargaonkar
  2 siblings, 2 replies; 58+ messages in thread
From: Marc Zyngier @ 2013-08-15  4:52 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Anup,

On 2013-08-14 15:22, Anup Patel wrote:
> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com> 
> wrote:
>> Hi Pranav,
>>
>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>> Systems with large external L3-cache (few MBs), might have dirty
>>> content belonging to the guest page in L3-cache. To tackle this,
>>> we need to flush such dirty content from d-cache so that guest
>>> will see correct contents of guest page when guest MMU is disabled.
>>>
>>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>>
>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>> ---
>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>> b/arch/arm64/include/asm/kvm_mmu.h
>>> index efe609c..5129038 100644
>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>> @@ -123,6 +123,8 @@ static inline void
>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>> +             /* Flush d-cache for systems with external caches. */
>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged 
>>> VIVT */
>>>               /* any kind of VIPT cache */
>>>               __flush_icache_all();
>>
>> [adding Will to the discussion as we talked about this in the past]
>>
>> That's definitely an issue, but I'm not sure the fix is to hit the 
>> data
>> cache on each page mapping. It looks overkill.
>>
>> Wouldn't it be enough to let userspace do the cache cleaning? 
>> kvmtools
>> knows which bits of the guest memory have been touched, and can do a 
>> "DC
>> DVAC" on this region.
>
> It seems a bit unnatural to have cache cleaning is user-space. I am 
> sure
> other architectures don't have such cache cleaning in user-space for 
> KVM.
>
>>
>> The alternative is do it in the kernel before running any vcpu - but
>> that's not very nice either (have to clean the whole of the guest
>> memory, which makes a full dcache clean more appealing).
>
> Actually, cleaning full d-cache by set/way upon first run of VCPU was
> our second option but current approach seemed very simple hence
> we went for this.
>
> If more people vote for full d-cache clean upon first run of VCPU 
> then
> we should revise this patch.

Can you please give the attached patch a spin on your HW? I've 
boot-tested it on a model, but of course I can't really verify that it 
fixes your issue.

As far as I can see, it should fix it without any additional flushing.

Please let me know how it goes.

Thanks,

         M.
-- 
Fast, cheap, reliable. Pick two.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-arm64-KVM-let-the-guest-access-cacheable-memory-when.patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20130815/52a538b0/attachment-0001.ksh>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15  4:52     ` Marc Zyngier
@ 2013-08-15  6:26       ` Anup Patel
  2013-08-15  8:31         ` Marc Zyngier
  2013-08-15  8:39       ` Pranavkumar Sawargaonkar
  1 sibling, 1 reply; 58+ messages in thread
From: Anup Patel @ 2013-08-15  6:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> Hi Anup,
>
>
> On 2013-08-14 15:22, Anup Patel wrote:
>>
>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com>
>> wrote:
>>>
>>> Hi Pranav,
>>>
>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>
>>>> Systems with large external L3-cache (few MBs), might have dirty
>>>> content belonging to the guest page in L3-cache. To tackle this,
>>>> we need to flush such dirty content from d-cache so that guest
>>>> will see correct contents of guest page when guest MMU is disabled.
>>>>
>>>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>>>
>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>> ---
>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>> index efe609c..5129038 100644
>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>> @@ -123,6 +123,8 @@ static inline void
>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>> +             /* Flush d-cache for systems with external caches. */
>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>>>>               /* any kind of VIPT cache */
>>>>               __flush_icache_all();
>>>
>>>
>>> [adding Will to the discussion as we talked about this in the past]
>>>
>>> That's definitely an issue, but I'm not sure the fix is to hit the data
>>> cache on each page mapping. It looks overkill.
>>>
>>> Wouldn't it be enough to let userspace do the cache cleaning? kvmtools
>>> knows which bits of the guest memory have been touched, and can do a "DC
>>> DVAC" on this region.
>>
>>
>> It seems a bit unnatural to have cache cleaning is user-space. I am sure
>> other architectures don't have such cache cleaning in user-space for KVM.
>>
>>>
>>> The alternative is do it in the kernel before running any vcpu - but
>>> that's not very nice either (have to clean the whole of the guest
>>> memory, which makes a full dcache clean more appealing).
>>
>>
>> Actually, cleaning full d-cache by set/way upon first run of VCPU was
>> our second option but current approach seemed very simple hence
>> we went for this.
>>
>> If more people vote for full d-cache clean upon first run of VCPU then
>> we should revise this patch.
>
>
> Can you please give the attached patch a spin on your HW? I've boot-tested
> it on a model, but of course I can't really verify that it fixes your issue.
>
> As far as I can see, it should fix it without any additional flushing.
>
> Please let me know how it goes.

HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
treated as "Normal Non-shareable, Inner Write-Back Write-Allocate,
Outer Write-Back Write-Allocate memory"

HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
treated as "Strongly-ordered device memory"

Now if Guest/VM access hardware MMIO devices directly (such as
VGIC CPU interface) with MMU off then MMIO devices will be
accessed as normal memory when HCR_EL2.DC=1.

The HCR_EL2.DC=1 makes sense only if we have all software
emulated devices for Guest/VM which is not true for KVM ARM or
KVM ARM64 because we use VGIC.

IMHO, this patch enforces incorrect memory attribute for Guest/VM
when Stage1 MMU is off.

Nevertheless, we will still try your patch.

>
> Thanks,
>
>
>         M.
> --
> Fast, cheap, reliable. Pick two.

Regards,
Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15  6:26       ` Anup Patel
@ 2013-08-15  8:31         ` Marc Zyngier
  2013-08-15 13:31           ` Anup Patel
  0 siblings, 1 reply; 58+ messages in thread
From: Marc Zyngier @ 2013-08-15  8:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 2013-08-15 07:26, Anup Patel wrote:
> Hi Marc,
>
> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier <marc.zyngier@arm.com> 
> wrote:
>> Hi Anup,
>>
>>
>> On 2013-08-14 15:22, Anup Patel wrote:
>>>
>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier 
>>> <marc.zyngier@arm.com>
>>> wrote:
>>>>
>>>> Hi Pranav,
>>>>
>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>>
>>>>> Systems with large external L3-cache (few MBs), might have dirty
>>>>> content belonging to the guest page in L3-cache. To tackle this,
>>>>> we need to flush such dirty content from d-cache so that guest
>>>>> will see correct contents of guest page when guest MMU is 
>>>>> disabled.
>>>>>
>>>>> The patch fixes coherent_icache_guest_page() for external 
>>>>> L3-cache.
>>>>>
>>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>>> ---
>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>>  1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>>> index efe609c..5129038 100644
>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>>> @@ -123,6 +123,8 @@ static inline void
>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>>> +             /* Flush d-cache for systems with external caches. 
>>>>> */
>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged 
>>>>> VIVT */
>>>>>               /* any kind of VIPT cache */
>>>>>               __flush_icache_all();
>>>>
>>>>
>>>> [adding Will to the discussion as we talked about this in the 
>>>> past]
>>>>
>>>> That's definitely an issue, but I'm not sure the fix is to hit the 
>>>> data
>>>> cache on each page mapping. It looks overkill.
>>>>
>>>> Wouldn't it be enough to let userspace do the cache cleaning? 
>>>> kvmtools
>>>> knows which bits of the guest memory have been touched, and can do 
>>>> a "DC
>>>> DVAC" on this region.
>>>
>>>
>>> It seems a bit unnatural to have cache cleaning is user-space. I am 
>>> sure
>>> other architectures don't have such cache cleaning in user-space 
>>> for KVM.
>>>
>>>>
>>>> The alternative is do it in the kernel before running any vcpu - 
>>>> but
>>>> that's not very nice either (have to clean the whole of the guest
>>>> memory, which makes a full dcache clean more appealing).
>>>
>>>
>>> Actually, cleaning full d-cache by set/way upon first run of VCPU 
>>> was
>>> our second option but current approach seemed very simple hence
>>> we went for this.
>>>
>>> If more people vote for full d-cache clean upon first run of VCPU 
>>> then
>>> we should revise this patch.
>>
>>
>> Can you please give the attached patch a spin on your HW? I've 
>> boot-tested
>> it on a model, but of course I can't really verify that it fixes 
>> your issue.
>>
>> As far as I can see, it should fix it without any additional 
>> flushing.
>>
>> Please let me know how it goes.
>
> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
> treated as "Normal Non-shareable, Inner Write-Back Write-Allocate,
> Outer Write-Back Write-Allocate memory"
>
> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
> treated as "Strongly-ordered device memory"
>
> Now if Guest/VM access hardware MMIO devices directly (such as
> VGIC CPU interface) with MMU off then MMIO devices will be
> accessed as normal memory when HCR_EL2.DC=1.

I don't think so. Stage-2 still applies, and should force MMIO to be 
accessed as device memory.

> The HCR_EL2.DC=1 makes sense only if we have all software
> emulated devices for Guest/VM which is not true for KVM ARM or
> KVM ARM64 because we use VGIC.
>
> IMHO, this patch enforces incorrect memory attribute for Guest/VM
> when Stage1 MMU is off.

See above. My understanding is that HCR.DC controls the default output 
of Stage-1, and Stage-2 overrides still apply.

> Nevertheless, we will still try your patch.

Thanks.

         M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15  4:52     ` Marc Zyngier
  2013-08-15  6:26       ` Anup Patel
@ 2013-08-15  8:39       ` Pranavkumar Sawargaonkar
  2013-08-15 15:42         ` Marc Zyngier
  1 sibling, 1 reply; 58+ messages in thread
From: Pranavkumar Sawargaonkar @ 2013-08-15  8:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

On 15 August 2013 10:22, Marc Zyngier <marc.zyngier@arm.com> wrote:
> Hi Anup,
>
>
> On 2013-08-14 15:22, Anup Patel wrote:
>>
>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com>
>> wrote:
>>>
>>> Hi Pranav,
>>>
>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>
>>>> Systems with large external L3-cache (few MBs), might have dirty
>>>> content belonging to the guest page in L3-cache. To tackle this,
>>>> we need to flush such dirty content from d-cache so that guest
>>>> will see correct contents of guest page when guest MMU is disabled.
>>>>
>>>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>>>
>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>> ---
>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>> index efe609c..5129038 100644
>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>> @@ -123,6 +123,8 @@ static inline void
>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>> +             /* Flush d-cache for systems with external caches. */
>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>>>>               /* any kind of VIPT cache */
>>>>               __flush_icache_all();
>>>
>>>
>>> [adding Will to the discussion as we talked about this in the past]
>>>
>>> That's definitely an issue, but I'm not sure the fix is to hit the data
>>> cache on each page mapping. It looks overkill.
>>>
>>> Wouldn't it be enough to let userspace do the cache cleaning? kvmtools
>>> knows which bits of the guest memory have been touched, and can do a "DC
>>> DVAC" on this region.
>>
>>
>> It seems a bit unnatural to have cache cleaning is user-space. I am sure
>> other architectures don't have such cache cleaning in user-space for KVM.
>>
>>>
>>> The alternative is do it in the kernel before running any vcpu - but
>>> that's not very nice either (have to clean the whole of the guest
>>> memory, which makes a full dcache clean more appealing).
>>
>>
>> Actually, cleaning full d-cache by set/way upon first run of VCPU was
>> our second option but current approach seemed very simple hence
>> we went for this.
>>
>> If more people vote for full d-cache clean upon first run of VCPU then
>> we should revise this patch.
>
>
> Can you please give the attached patch a spin on your HW? I've boot-tested
> it on a model, but of course I can't really verify that it fixes your issue.
>
> As far as I can see, it should fix it without any additional flushing.
>
> Please let me know how it goes.
>
> Thanks,
>
>
>         M.
> --
> Fast, cheap, reliable. Pick two.
>
> _______________________________________________
> kvmarm mailing list
> kvmarm at lists.cs.columbia.edu
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm

Surely we will try this patch.

Just to add few more points to this discussion :

Actually we are already flushing d-cache in "coherent_icache_guest_page"
by calling flush_icache_range.

Now in my patch i am doing same thing one more time by calling
__flush_dcache_area just below flush_icache_range.

Main difference between d-cache flushing in above two routines is :
flush_icache_range is flushing d-cache by point of unification (dc     cvau) and
 __flush_dcache_area is flushing that by point of coherency (dc      civac).

Now since second routine is doing it by coherency it is making L3C to
be coherent and sync changes with immediate effect and solving the
issue.

Thanks,
Pranav

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15  8:31         ` Marc Zyngier
@ 2013-08-15 13:31           ` Anup Patel
  2013-08-15 14:47             ` Marc Zyngier
  0 siblings, 1 reply; 58+ messages in thread
From: Anup Patel @ 2013-08-15 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 2013-08-15 07:26, Anup Patel wrote:
>>
>> Hi Marc,
>>
>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier <marc.zyngier@arm.com>
>> wrote:
>>>
>>> Hi Anup,
>>>
>>>
>>> On 2013-08-14 15:22, Anup Patel wrote:
>>>>
>>>>
>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi Pranav,
>>>>>
>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>>>
>>>>>>
>>>>>> Systems with large external L3-cache (few MBs), might have dirty
>>>>>> content belonging to the guest page in L3-cache. To tackle this,
>>>>>> we need to flush such dirty content from d-cache so that guest
>>>>>> will see correct contents of guest page when guest MMU is disabled.
>>>>>>
>>>>>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>>>>>
>>>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>>>> ---
>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>>>  1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>>>> index efe609c..5129038 100644
>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>>>> @@ -123,6 +123,8 @@ static inline void
>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>>>> +             /* Flush d-cache for systems with external caches. */
>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT
>>>>>> */
>>>>>>               /* any kind of VIPT cache */
>>>>>>               __flush_icache_all();
>>>>>
>>>>>
>>>>>
>>>>> [adding Will to the discussion as we talked about this in the past]
>>>>>
>>>>> That's definitely an issue, but I'm not sure the fix is to hit the data
>>>>> cache on each page mapping. It looks overkill.
>>>>>
>>>>> Wouldn't it be enough to let userspace do the cache cleaning? kvmtools
>>>>> knows which bits of the guest memory have been touched, and can do a
>>>>> "DC
>>>>> DVAC" on this region.
>>>>
>>>>
>>>>
>>>> It seems a bit unnatural to have cache cleaning is user-space. I am sure
>>>> other architectures don't have such cache cleaning in user-space for
>>>> KVM.
>>>>
>>>>>
>>>>> The alternative is do it in the kernel before running any vcpu - but
>>>>> that's not very nice either (have to clean the whole of the guest
>>>>> memory, which makes a full dcache clean more appealing).
>>>>
>>>>
>>>>
>>>> Actually, cleaning full d-cache by set/way upon first run of VCPU was
>>>> our second option but current approach seemed very simple hence
>>>> we went for this.
>>>>
>>>> If more people vote for full d-cache clean upon first run of VCPU then
>>>> we should revise this patch.
>>>
>>>
>>>
>>> Can you please give the attached patch a spin on your HW? I've
>>> boot-tested
>>> it on a model, but of course I can't really verify that it fixes your
>>> issue.
>>>
>>> As far as I can see, it should fix it without any additional flushing.
>>>
>>> Please let me know how it goes.
>>
>>
>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
>> treated as "Normal Non-shareable, Inner Write-Back Write-Allocate,
>> Outer Write-Back Write-Allocate memory"
>>
>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
>> treated as "Strongly-ordered device memory"
>>
>> Now if Guest/VM access hardware MMIO devices directly (such as
>> VGIC CPU interface) with MMU off then MMIO devices will be
>> accessed as normal memory when HCR_EL2.DC=1.
>
>
> I don't think so. Stage-2 still applies, and should force MMIO to be
> accessed as device memory.
>
>
>> The HCR_EL2.DC=1 makes sense only if we have all software
>> emulated devices for Guest/VM which is not true for KVM ARM or
>> KVM ARM64 because we use VGIC.
>>
>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>> when Stage1 MMU is off.
>
>
> See above. My understanding is that HCR.DC controls the default output of
> Stage-1, and Stage-2 overrides still apply.

You had mentioned that PAGE_S2_DEVICE attribute was redundant
and wanted guest to decide the memory attribute. In other words, you
did not want to enforce any memory attribute in Stage2.

Please refer to this patch https://patchwork.kernel.org/patch/2543201/

>
>
>> Nevertheless, we will still try your patch.
>
>
> Thanks.
>
>
>         M.
> --
> Fast, cheap, reliable. Pick two.

Regards,
Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15 13:31           ` Anup Patel
@ 2013-08-15 14:47             ` Marc Zyngier
  2013-08-15 15:13               ` Anup Patel
  0 siblings, 1 reply; 58+ messages in thread
From: Marc Zyngier @ 2013-08-15 14:47 UTC (permalink / raw)
  To: linux-arm-kernel

On 2013-08-15 14:31, Anup Patel wrote:
> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier <marc.zyngier@arm.com> 
> wrote:
>> On 2013-08-15 07:26, Anup Patel wrote:
>>>
>>> Hi Marc,
>>>
>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier 
>>> <marc.zyngier@arm.com>
>>> wrote:
>>>>
>>>> Hi Anup,
>>>>
>>>>
>>>> On 2013-08-14 15:22, Anup Patel wrote:
>>>>>
>>>>>
>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier 
>>>>> <marc.zyngier@arm.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Pranav,
>>>>>>
>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>>>>
>>>>>>>
>>>>>>> Systems with large external L3-cache (few MBs), might have 
>>>>>>> dirty
>>>>>>> content belonging to the guest page in L3-cache. To tackle 
>>>>>>> this,
>>>>>>> we need to flush such dirty content from d-cache so that guest
>>>>>>> will see correct contents of guest page when guest MMU is 
>>>>>>> disabled.
>>>>>>>
>>>>>>> The patch fixes coherent_icache_guest_page() for external 
>>>>>>> L3-cache.
>>>>>>>
>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar 
>>>>>>> <pranavkumar@linaro.org>
>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>>>>> ---
>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>> index efe609c..5129038 100644
>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>> @@ -123,6 +123,8 @@ static inline void
>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>>>>> +             /* Flush d-cache for systems with external 
>>>>>>> caches. */
>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>>>>       } else if (!icache_is_aivivt()) {       /* non 
>>>>>>> ASID-tagged VIVT
>>>>>>> */
>>>>>>>               /* any kind of VIPT cache */
>>>>>>>               __flush_icache_all();
>>>>>>
>>>>>>
>>>>>>
>>>>>> [adding Will to the discussion as we talked about this in the 
>>>>>> past]
>>>>>>
>>>>>> That's definitely an issue, but I'm not sure the fix is to hit 
>>>>>> the data
>>>>>> cache on each page mapping. It looks overkill.
>>>>>>
>>>>>> Wouldn't it be enough to let userspace do the cache cleaning? 
>>>>>> kvmtools
>>>>>> knows which bits of the guest memory have been touched, and can 
>>>>>> do a
>>>>>> "DC
>>>>>> DVAC" on this region.
>>>>>
>>>>>
>>>>>
>>>>> It seems a bit unnatural to have cache cleaning is user-space. I 
>>>>> am sure
>>>>> other architectures don't have such cache cleaning in user-space 
>>>>> for
>>>>> KVM.
>>>>>
>>>>>>
>>>>>> The alternative is do it in the kernel before running any vcpu - 
>>>>>> but
>>>>>> that's not very nice either (have to clean the whole of the 
>>>>>> guest
>>>>>> memory, which makes a full dcache clean more appealing).
>>>>>
>>>>>
>>>>>
>>>>> Actually, cleaning full d-cache by set/way upon first run of VCPU 
>>>>> was
>>>>> our second option but current approach seemed very simple hence
>>>>> we went for this.
>>>>>
>>>>> If more people vote for full d-cache clean upon first run of VCPU 
>>>>> then
>>>>> we should revise this patch.
>>>>
>>>>
>>>>
>>>> Can you please give the attached patch a spin on your HW? I've
>>>> boot-tested
>>>> it on a model, but of course I can't really verify that it fixes 
>>>> your
>>>> issue.
>>>>
>>>> As far as I can see, it should fix it without any additional 
>>>> flushing.
>>>>
>>>> Please let me know how it goes.
>>>
>>>
>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
>>> treated as "Normal Non-shareable, Inner Write-Back Write-Allocate,
>>> Outer Write-Back Write-Allocate memory"
>>>
>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
>>> treated as "Strongly-ordered device memory"
>>>
>>> Now if Guest/VM access hardware MMIO devices directly (such as
>>> VGIC CPU interface) with MMU off then MMIO devices will be
>>> accessed as normal memory when HCR_EL2.DC=1.
>>
>>
>> I don't think so. Stage-2 still applies, and should force MMIO to be
>> accessed as device memory.
>>
>>
>>> The HCR_EL2.DC=1 makes sense only if we have all software
>>> emulated devices for Guest/VM which is not true for KVM ARM or
>>> KVM ARM64 because we use VGIC.
>>>
>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>>> when Stage1 MMU is off.
>>
>>
>> See above. My understanding is that HCR.DC controls the default 
>> output of
>> Stage-1, and Stage-2 overrides still apply.
>
> You had mentioned that PAGE_S2_DEVICE attribute was redundant
> and wanted guest to decide the memory attribute. In other words, you
> did not want to enforce any memory attribute in Stage2.
>
> Please refer to this patch 
> https://patchwork.kernel.org/patch/2543201/

This patch has never been merged. If you carry on following the 
discussion, you will certainly notice it was dropped for a very good 
reason:
https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html

So Stage-2 memory attributes are used, they are not going away, and 
they are essential to the patch I sent this morning.

         M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15 14:47             ` Marc Zyngier
@ 2013-08-15 15:13               ` Anup Patel
  2013-08-15 15:37                 ` Marc Zyngier
  0 siblings, 1 reply; 58+ messages in thread
From: Anup Patel @ 2013-08-15 15:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 2013-08-15 14:31, Anup Patel wrote:
>>
>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier <marc.zyngier@arm.com>
>> wrote:
>>>
>>> On 2013-08-15 07:26, Anup Patel wrote:
>>>>
>>>>
>>>> Hi Marc,
>>>>
>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier <marc.zyngier@arm.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi Anup,
>>>>>
>>>>>
>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Pranav,
>>>>>>>
>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Systems with large external L3-cache (few MBs), might have dirty
>>>>>>>> content belonging to the guest page in L3-cache. To tackle this,
>>>>>>>> we need to flush such dirty content from d-cache so that guest
>>>>>>>> will see correct contents of guest page when guest MMU is disabled.
>>>>>>>>
>>>>>>>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>>>>>>>
>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>>>>>> ---
>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>> index efe609c..5129038 100644
>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>> @@ -123,6 +123,8 @@ static inline void
>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>>>>>> +             /* Flush d-cache for systems with external caches. */
>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged
>>>>>>>> VIVT
>>>>>>>> */
>>>>>>>>               /* any kind of VIPT cache */
>>>>>>>>               __flush_icache_all();
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [adding Will to the discussion as we talked about this in the past]
>>>>>>>
>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit the
>>>>>>> data
>>>>>>> cache on each page mapping. It looks overkill.
>>>>>>>
>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>>>>>>> kvmtools
>>>>>>> knows which bits of the guest memory have been touched, and can do a
>>>>>>> "DC
>>>>>>> DVAC" on this region.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> It seems a bit unnatural to have cache cleaning is user-space. I am
>>>>>> sure
>>>>>> other architectures don't have such cache cleaning in user-space for
>>>>>> KVM.
>>>>>>
>>>>>>>
>>>>>>> The alternative is do it in the kernel before running any vcpu - but
>>>>>>> that's not very nice either (have to clean the whole of the guest
>>>>>>> memory, which makes a full dcache clean more appealing).
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Actually, cleaning full d-cache by set/way upon first run of VCPU was
>>>>>> our second option but current approach seemed very simple hence
>>>>>> we went for this.
>>>>>>
>>>>>> If more people vote for full d-cache clean upon first run of VCPU then
>>>>>> we should revise this patch.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Can you please give the attached patch a spin on your HW? I've
>>>>> boot-tested
>>>>> it on a model, but of course I can't really verify that it fixes your
>>>>> issue.
>>>>>
>>>>> As far as I can see, it should fix it without any additional flushing.
>>>>>
>>>>> Please let me know how it goes.
>>>>
>>>>
>>>>
>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
>>>> treated as "Normal Non-shareable, Inner Write-Back Write-Allocate,
>>>> Outer Write-Back Write-Allocate memory"
>>>>
>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
>>>> treated as "Strongly-ordered device memory"
>>>>
>>>> Now if Guest/VM access hardware MMIO devices directly (such as
>>>> VGIC CPU interface) with MMU off then MMIO devices will be
>>>> accessed as normal memory when HCR_EL2.DC=1.
>>>
>>>
>>>
>>> I don't think so. Stage-2 still applies, and should force MMIO to be
>>> accessed as device memory.
>>>
>>>
>>>> The HCR_EL2.DC=1 makes sense only if we have all software
>>>> emulated devices for Guest/VM which is not true for KVM ARM or
>>>> KVM ARM64 because we use VGIC.
>>>>
>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>>>> when Stage1 MMU is off.
>>>
>>>
>>>
>>> See above. My understanding is that HCR.DC controls the default output of
>>> Stage-1, and Stage-2 overrides still apply.
>>
>>
>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
>> and wanted guest to decide the memory attribute. In other words, you
>> did not want to enforce any memory attribute in Stage2.
>>
>> Please refer to this patch https://patchwork.kernel.org/patch/2543201/
>
>
> This patch has never been merged. If you carry on following the discussion,
> you will certainly notice it was dropped for a very good reason:
> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>
> So Stage-2 memory attributes are used, they are not going away, and they are
> essential to the patch I sent this morning.
>
>
>         M.
> --
> Fast, cheap, reliable. Pick two.

HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM is
provided a DMA-capable device in pass-through mode. The reason being
bootloader/firmware typically don't enable MMU and such bootloader/firmware
will programme a pass-through DMA-capable device without any flushes to
guest RAM (because it has not enabled MMU).

A good example of such a device would be SATA AHCI controller given to a
Guest/VM as direct access (using SystemMMU) and Guest bootloader/firmware
accessing this SATA AHCI controller to load kernel images from SATA disk.
In this situation, we will have to hack Guest bootloader/firmware AHCI driver to
explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).

Regards,
Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15 15:13               ` Anup Patel
@ 2013-08-15 15:37                 ` Marc Zyngier
  2013-08-15 15:45                   ` Anup Patel
  2013-08-15 16:53                   ` Christoffer Dall
  0 siblings, 2 replies; 58+ messages in thread
From: Marc Zyngier @ 2013-08-15 15:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 2013-08-15 16:13, Anup Patel wrote:
> Hi Marc,
>
> On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com> 
> wrote:
>> On 2013-08-15 14:31, Anup Patel wrote:
>>>
>>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier 
>>> <marc.zyngier@arm.com>
>>> wrote:
>>>>
>>>> On 2013-08-15 07:26, Anup Patel wrote:
>>>>>
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier 
>>>>> <marc.zyngier@arm.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Anup,
>>>>>>
>>>>>>
>>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier 
>>>>>>> <marc.zyngier@arm.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Pranav,
>>>>>>>>
>>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Systems with large external L3-cache (few MBs), might have 
>>>>>>>>> dirty
>>>>>>>>> content belonging to the guest page in L3-cache. To tackle 
>>>>>>>>> this,
>>>>>>>>> we need to flush such dirty content from d-cache so that 
>>>>>>>>> guest
>>>>>>>>> will see correct contents of guest page when guest MMU is 
>>>>>>>>> disabled.
>>>>>>>>>
>>>>>>>>> The patch fixes coherent_icache_guest_page() for external 
>>>>>>>>> L3-cache.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar 
>>>>>>>>> <pranavkumar@linaro.org>
>>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>>>>>>> ---
>>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>> index efe609c..5129038 100644
>>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>> @@ -123,6 +123,8 @@ static inline void
>>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>>>>>>> +             /* Flush d-cache for systems with external 
>>>>>>>>> caches. */
>>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non 
>>>>>>>>> ASID-tagged
>>>>>>>>> VIVT
>>>>>>>>> */
>>>>>>>>>               /* any kind of VIPT cache */
>>>>>>>>>               __flush_icache_all();
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [adding Will to the discussion as we talked about this in the 
>>>>>>>> past]
>>>>>>>>
>>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit 
>>>>>>>> the
>>>>>>>> data
>>>>>>>> cache on each page mapping. It looks overkill.
>>>>>>>>
>>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>>>>>>>> kvmtools
>>>>>>>> knows which bits of the guest memory have been touched, and 
>>>>>>>> can do a
>>>>>>>> "DC
>>>>>>>> DVAC" on this region.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> It seems a bit unnatural to have cache cleaning is user-space. 
>>>>>>> I am
>>>>>>> sure
>>>>>>> other architectures don't have such cache cleaning in 
>>>>>>> user-space for
>>>>>>> KVM.
>>>>>>>
>>>>>>>>
>>>>>>>> The alternative is do it in the kernel before running any vcpu 
>>>>>>>> - but
>>>>>>>> that's not very nice either (have to clean the whole of the 
>>>>>>>> guest
>>>>>>>> memory, which makes a full dcache clean more appealing).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Actually, cleaning full d-cache by set/way upon first run of 
>>>>>>> VCPU was
>>>>>>> our second option but current approach seemed very simple hence
>>>>>>> we went for this.
>>>>>>>
>>>>>>> If more people vote for full d-cache clean upon first run of 
>>>>>>> VCPU then
>>>>>>> we should revise this patch.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can you please give the attached patch a spin on your HW? I've
>>>>>> boot-tested
>>>>>> it on a model, but of course I can't really verify that it fixes 
>>>>>> your
>>>>>> issue.
>>>>>>
>>>>>> As far as I can see, it should fix it without any additional 
>>>>>> flushing.
>>>>>>
>>>>>> Please let me know how it goes.
>>>>>
>>>>>
>>>>>
>>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
>>>>> treated as "Normal Non-shareable, Inner Write-Back 
>>>>> Write-Allocate,
>>>>> Outer Write-Back Write-Allocate memory"
>>>>>
>>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
>>>>> treated as "Strongly-ordered device memory"
>>>>>
>>>>> Now if Guest/VM access hardware MMIO devices directly (such as
>>>>> VGIC CPU interface) with MMU off then MMIO devices will be
>>>>> accessed as normal memory when HCR_EL2.DC=1.
>>>>
>>>>
>>>>
>>>> I don't think so. Stage-2 still applies, and should force MMIO to 
>>>> be
>>>> accessed as device memory.
>>>>
>>>>
>>>>> The HCR_EL2.DC=1 makes sense only if we have all software
>>>>> emulated devices for Guest/VM which is not true for KVM ARM or
>>>>> KVM ARM64 because we use VGIC.
>>>>>
>>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>>>>> when Stage1 MMU is off.
>>>>
>>>>
>>>>
>>>> See above. My understanding is that HCR.DC controls the default 
>>>> output of
>>>> Stage-1, and Stage-2 overrides still apply.
>>>
>>>
>>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
>>> and wanted guest to decide the memory attribute. In other words, 
>>> you
>>> did not want to enforce any memory attribute in Stage2.
>>>
>>> Please refer to this patch 
>>> https://patchwork.kernel.org/patch/2543201/
>>
>>
>> This patch has never been merged. If you carry on following the 
>> discussion,
>> you will certainly notice it was dropped for a very good reason:
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>>
>> So Stage-2 memory attributes are used, they are not going away, and 
>> they are
>> essential to the patch I sent this morning.
>>
>>
>>         M.
>> --
>> Fast, cheap, reliable. Pick two.
>
> HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM 
> is
> provided a DMA-capable device in pass-through mode. The reason being
> bootloader/firmware typically don't enable MMU and such 
> bootloader/firmware
> will programme a pass-through DMA-capable device without any flushes 
> to
> guest RAM (because it has not enabled MMU).
>
> A good example of such a device would be SATA AHCI controller given 
> to a
> Guest/VM as direct access (using SystemMMU) and Guest 
> bootloader/firmware
> accessing this SATA AHCI controller to load kernel images from SATA 
> disk.
> In this situation, we will have to hack Guest bootloader/firmware
> AHCI driver to
> explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).

OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of 
curiosity: is that a made up example or something you actually have?

Back to square one:
Can you please benchmark the various cache cleaning options (global at 
startup time, per-page on S2 translation fault, and user-space)?

Thanks,

         M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15  8:39       ` Pranavkumar Sawargaonkar
@ 2013-08-15 15:42         ` Marc Zyngier
  0 siblings, 0 replies; 58+ messages in thread
From: Marc Zyngier @ 2013-08-15 15:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Pranav,

On 2013-08-15 09:39, Pranavkumar Sawargaonkar wrote:

[...]

> Just to add few more points to this discussion :
>
> Actually we are already flushing d-cache in 
> "coherent_icache_guest_page"
> by calling flush_icache_range.
>
> Now in my patch i am doing same thing one more time by calling
> __flush_dcache_area just below flush_icache_range.
>
> Main difference between d-cache flushing in above two routines is :
> flush_icache_range is flushing d-cache by point of unification (dc
> cvau) and
>  __flush_dcache_area is flushing that by point of coherency (dc      
> civac).
>
> Now since second routine is doing it by coherency it is making L3C to
> be coherent and sync changes with immediate effect and solving the
> issue.

I understand that. What I object to is that always cleaning to PoC is 
expensive (both in terms of latency and energy), and doing so for each 
page, long after the MMU has been enabled feels like a great loss of 
performance.

My gut feeling is that a single clean operation at startup time 
(whether it is from the kernel or userspace) would be a lot better.

Thanks,

         M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15 15:37                 ` Marc Zyngier
@ 2013-08-15 15:45                   ` Anup Patel
  2013-08-15 16:53                   ` Christoffer Dall
  1 sibling, 0 replies; 58+ messages in thread
From: Anup Patel @ 2013-08-15 15:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 15, 2013 at 9:07 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 2013-08-15 16:13, Anup Patel wrote:
>>
>> Hi Marc,
>>
>> On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
>> wrote:
>>>
>>> On 2013-08-15 14:31, Anup Patel wrote:
>>>>
>>>>
>>>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier <marc.zyngier@arm.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 2013-08-15 07:26, Anup Patel wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Marc,
>>>>>>
>>>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier <marc.zyngier@arm.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Anup,
>>>>>>>
>>>>>>>
>>>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Pranav,
>>>>>>>>>
>>>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Systems with large external L3-cache (few MBs), might have dirty
>>>>>>>>>> content belonging to the guest page in L3-cache. To tackle this,
>>>>>>>>>> we need to flush such dirty content from d-cache so that guest
>>>>>>>>>> will see correct contents of guest page when guest MMU is
>>>>>>>>>> disabled.
>>>>>>>>>>
>>>>>>>>>> The patch fixes coherent_icache_guest_page() for external
>>>>>>>>>> L3-cache.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>>>>>>>> ---
>>>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>>> index efe609c..5129038 100644
>>>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>>> @@ -123,6 +123,8 @@ static inline void
>>>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>>>>>>>> +             /* Flush d-cache for systems with external caches.
>>>>>>>>>> */
>>>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged
>>>>>>>>>> VIVT
>>>>>>>>>> */
>>>>>>>>>>               /* any kind of VIPT cache */
>>>>>>>>>>               __flush_icache_all();
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [adding Will to the discussion as we talked about this in the past]
>>>>>>>>>
>>>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit the
>>>>>>>>> data
>>>>>>>>> cache on each page mapping. It looks overkill.
>>>>>>>>>
>>>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>>>>>>>>> kvmtools
>>>>>>>>> knows which bits of the guest memory have been touched, and can do
>>>>>>>>> a
>>>>>>>>> "DC
>>>>>>>>> DVAC" on this region.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> It seems a bit unnatural to have cache cleaning is user-space. I am
>>>>>>>> sure
>>>>>>>> other architectures don't have such cache cleaning in user-space for
>>>>>>>> KVM.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> The alternative is do it in the kernel before running any vcpu -
>>>>>>>>> but
>>>>>>>>> that's not very nice either (have to clean the whole of the guest
>>>>>>>>> memory, which makes a full dcache clean more appealing).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Actually, cleaning full d-cache by set/way upon first run of VCPU
>>>>>>>> was
>>>>>>>> our second option but current approach seemed very simple hence
>>>>>>>> we went for this.
>>>>>>>>
>>>>>>>> If more people vote for full d-cache clean upon first run of VCPU
>>>>>>>> then
>>>>>>>> we should revise this patch.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Can you please give the attached patch a spin on your HW? I've
>>>>>>> boot-tested
>>>>>>> it on a model, but of course I can't really verify that it fixes your
>>>>>>> issue.
>>>>>>>
>>>>>>> As far as I can see, it should fix it without any additional
>>>>>>> flushing.
>>>>>>>
>>>>>>> Please let me know how it goes.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
>>>>>> treated as "Normal Non-shareable, Inner Write-Back Write-Allocate,
>>>>>> Outer Write-Back Write-Allocate memory"
>>>>>>
>>>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
>>>>>> treated as "Strongly-ordered device memory"
>>>>>>
>>>>>> Now if Guest/VM access hardware MMIO devices directly (such as
>>>>>> VGIC CPU interface) with MMU off then MMIO devices will be
>>>>>> accessed as normal memory when HCR_EL2.DC=1.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I don't think so. Stage-2 still applies, and should force MMIO to be
>>>>> accessed as device memory.
>>>>>
>>>>>
>>>>>> The HCR_EL2.DC=1 makes sense only if we have all software
>>>>>> emulated devices for Guest/VM which is not true for KVM ARM or
>>>>>> KVM ARM64 because we use VGIC.
>>>>>>
>>>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>>>>>> when Stage1 MMU is off.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> See above. My understanding is that HCR.DC controls the default output
>>>>> of
>>>>> Stage-1, and Stage-2 overrides still apply.
>>>>
>>>>
>>>>
>>>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
>>>> and wanted guest to decide the memory attribute. In other words, you
>>>> did not want to enforce any memory attribute in Stage2.
>>>>
>>>> Please refer to this patch https://patchwork.kernel.org/patch/2543201/
>>>
>>>
>>>
>>> This patch has never been merged. If you carry on following the
>>> discussion,
>>> you will certainly notice it was dropped for a very good reason:
>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>>>
>>> So Stage-2 memory attributes are used, they are not going away, and they
>>> are
>>> essential to the patch I sent this morning.
>>>
>>>
>>>         M.
>>> --
>>> Fast, cheap, reliable. Pick two.
>>
>>
>> HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM is
>> provided a DMA-capable device in pass-through mode. The reason being
>> bootloader/firmware typically don't enable MMU and such
>> bootloader/firmware
>> will programme a pass-through DMA-capable device without any flushes to
>> guest RAM (because it has not enabled MMU).
>>
>> A good example of such a device would be SATA AHCI controller given to a
>> Guest/VM as direct access (using SystemMMU) and Guest bootloader/firmware
>> accessing this SATA AHCI controller to load kernel images from SATA disk.
>> In this situation, we will have to hack Guest bootloader/firmware
>> AHCI driver to
>> explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
>
>
> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
> curiosity: is that a made up example or something you actually have?

Actually, this would be a typical use-case in embedded virtualization.
Other devices that fit this use-case would be network controllers,
security engines, or some proprietary HW not having drivers in Linux.

>
> Back to square one:
> Can you please benchmark the various cache cleaning options (global at
> startup time, per-page on S2 translation fault, and user-space)?
>
> Thanks,
>
>
>         M.
> --
> Fast, cheap, reliable. Pick two.

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15 15:37                 ` Marc Zyngier
  2013-08-15 15:45                   ` Anup Patel
@ 2013-08-15 16:53                   ` Christoffer Dall
  2013-08-16  5:02                     ` Anup Patel
  1 sibling, 1 reply; 58+ messages in thread
From: Christoffer Dall @ 2013-08-15 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
> On 2013-08-15 16:13, Anup Patel wrote:
> > Hi Marc,
> >
> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com> 
> > wrote:
> >> On 2013-08-15 14:31, Anup Patel wrote:
> >>>
> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier 
> >>> <marc.zyngier@arm.com>
> >>> wrote:
> >>>>
> >>>> On 2013-08-15 07:26, Anup Patel wrote:
> >>>>>
> >>>>>
> >>>>> Hi Marc,
> >>>>>
> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier 
> >>>>> <marc.zyngier@arm.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>
> >>>>>> Hi Anup,
> >>>>>>
> >>>>>>
> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier 
> >>>>>>> <marc.zyngier@arm.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi Pranav,
> >>>>>>>>
> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Systems with large external L3-cache (few MBs), might have 
> >>>>>>>>> dirty
> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle 
> >>>>>>>>> this,
> >>>>>>>>> we need to flush such dirty content from d-cache so that 
> >>>>>>>>> guest
> >>>>>>>>> will see correct contents of guest page when guest MMU is 
> >>>>>>>>> disabled.
> >>>>>>>>>
> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external 
> >>>>>>>>> L3-cache.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar 
> >>>>>>>>> <pranavkumar@linaro.org>
> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> >>>>>>>>> ---
> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
> >>>>>>>>>  1 file changed, 2 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
> >>>>>>>>> index efe609c..5129038 100644
> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
> >>>>>>>>> +             /* Flush d-cache for systems with external 
> >>>>>>>>> caches. */
> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non 
> >>>>>>>>> ASID-tagged
> >>>>>>>>> VIVT
> >>>>>>>>> */
> >>>>>>>>>               /* any kind of VIPT cache */
> >>>>>>>>>               __flush_icache_all();
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [adding Will to the discussion as we talked about this in the 
> >>>>>>>> past]
> >>>>>>>>
> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit 
> >>>>>>>> the
> >>>>>>>> data
> >>>>>>>> cache on each page mapping. It looks overkill.
> >>>>>>>>
> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
> >>>>>>>> kvmtools
> >>>>>>>> knows which bits of the guest memory have been touched, and 
> >>>>>>>> can do a
> >>>>>>>> "DC
> >>>>>>>> DVAC" on this region.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space. 
> >>>>>>> I am
> >>>>>>> sure
> >>>>>>> other architectures don't have such cache cleaning in 
> >>>>>>> user-space for
> >>>>>>> KVM.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> The alternative is do it in the kernel before running any vcpu 
> >>>>>>>> - but
> >>>>>>>> that's not very nice either (have to clean the whole of the 
> >>>>>>>> guest
> >>>>>>>> memory, which makes a full dcache clean more appealing).
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of 
> >>>>>>> VCPU was
> >>>>>>> our second option but current approach seemed very simple hence
> >>>>>>> we went for this.
> >>>>>>>
> >>>>>>> If more people vote for full d-cache clean upon first run of 
> >>>>>>> VCPU then
> >>>>>>> we should revise this patch.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Can you please give the attached patch a spin on your HW? I've
> >>>>>> boot-tested
> >>>>>> it on a model, but of course I can't really verify that it fixes 
> >>>>>> your
> >>>>>> issue.
> >>>>>>
> >>>>>> As far as I can see, it should fix it without any additional 
> >>>>>> flushing.
> >>>>>>
> >>>>>> Please let me know how it goes.
> >>>>>
> >>>>>
> >>>>>
> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
> >>>>> treated as "Normal Non-shareable, Inner Write-Back 
> >>>>> Write-Allocate,
> >>>>> Outer Write-Back Write-Allocate memory"
> >>>>>
> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
> >>>>> treated as "Strongly-ordered device memory"
> >>>>>
> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
> >>>>> accessed as normal memory when HCR_EL2.DC=1.
> >>>>
> >>>>
> >>>>
> >>>> I don't think so. Stage-2 still applies, and should force MMIO to 
> >>>> be
> >>>> accessed as device memory.
> >>>>
> >>>>
> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
> >>>>> KVM ARM64 because we use VGIC.
> >>>>>
> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
> >>>>> when Stage1 MMU is off.
> >>>>
> >>>>
> >>>>
> >>>> See above. My understanding is that HCR.DC controls the default 
> >>>> output of
> >>>> Stage-1, and Stage-2 overrides still apply.
> >>>
> >>>
> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
> >>> and wanted guest to decide the memory attribute. In other words, 
> >>> you
> >>> did not want to enforce any memory attribute in Stage2.
> >>>
> >>> Please refer to this patch 
> >>> https://patchwork.kernel.org/patch/2543201/
> >>
> >>
> >> This patch has never been merged. If you carry on following the 
> >> discussion,
> >> you will certainly notice it was dropped for a very good reason:
> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
> >>
> >> So Stage-2 memory attributes are used, they are not going away, and 
> >> they are
> >> essential to the patch I sent this morning.
> >>
> >>
> >>         M.
> >> --
> >> Fast, cheap, reliable. Pick two.
> >
> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM 
> > is
> > provided a DMA-capable device in pass-through mode. The reason being
> > bootloader/firmware typically don't enable MMU and such 
> > bootloader/firmware
> > will programme a pass-through DMA-capable device without any flushes 
> > to
> > guest RAM (because it has not enabled MMU).
> >
> > A good example of such a device would be SATA AHCI controller given 
> > to a
> > Guest/VM as direct access (using SystemMMU) and Guest 
> > bootloader/firmware
> > accessing this SATA AHCI controller to load kernel images from SATA 
> > disk.
> > In this situation, we will have to hack Guest bootloader/firmware
> > AHCI driver to
> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
> 
> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of 
> curiosity: is that a made up example or something you actually have?
> 
> Back to square one:
> Can you please benchmark the various cache cleaning options (global at 
> startup time, per-page on S2 translation fault, and user-space)?
> 
Eh, why is this a more valid argument than the vgic?  The device
passthrough Stage-2 mappings would still have the Stage-2 memory
attributes to configure that memory region as device memory.  Why is it
relevant if the device is DMA-capable in this context?

-Christoffer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15  4:44             ` Marc Zyngier
@ 2013-08-15 16:58               ` Christoffer Dall
  0 siblings, 0 replies; 58+ messages in thread
From: Christoffer Dall @ 2013-08-15 16:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 15, 2013 at 05:44:52AM +0100, Marc Zyngier wrote:
> On 2013-08-14 18:34, Christoffer Dall wrote:
> >On Wed, Aug 14, 2013 at 04:49:24PM +0100, Marc Zyngier wrote:
> >>On 2013-08-14 16:35, Peter Maydell wrote:
> >>> On 14 August 2013 16:23, Marc Zyngier <marc.zyngier@arm.com>
> >>wrote:
> >>>> On 2013-08-14 15:22, Anup Patel wrote:
> >>>>> It seems a bit unnatural to have cache cleaning is
> >>user-space. I am
> >>>>> sure
> >>>>> other architectures don't have such cache cleaning in user-space
> >>>>> for
> >>>>> KVM.
> >>>>
> >>>> Well, we have it on AArch64. Why would we blindly nuke the whole
> >>>> cache
> >>>> if we can do the right thing, efficiently, on the right range?
> >>>
> >>> When exactly would userspace have to care about the cache?
> >>
> >>Only for the initial payload, I'd expect. Unless I've missed
> >>something
> >>more crucial?
> >
> >What happens if the page is swapped out, is the kernel guaranteed to
> >flush it all the way to RAM when it swaps it back in, or would KVM
> >have
> >to take care of that?
> 
> I'd expect the kernel to deal with it, otherwise you'd hit the wrong
> data each time you swap in a page in any userspace process.
> 

Unless the kernel only does that for pages that are mapped executable,
which it cannot know for VMs.

Admittedly I haven't looked at the code recently.  Do you know?

> >>
> >>> This patch isn't exactly clear about the circumstances. I
> >>> think you'd need a really strong reason for not dealing with
> >>> this in the kernel -- in general userspace KVM tools don't
> >>> otherwise have to deal with cache maintenance at all.
> >>
> >>I believe we *could* do it in the kernel, just at the expense of
> >>a lot
> >>more CPU cycles.
> >>
> >>A possible alternative would be to use HCR.DC, but I need to have a
> >>look and see what it breaks...
> >>
> >Could we flush the cache when we fault in pages only if the guest has
> >the MMU disabled and trap if the guest disabled the MMU and flush the
> >whole outer dcache at that point in time?
> 
> We don't need to trap the disabling of the MMU. If the guest does
> that, it *must* have flushed its cache to RAM already. Otherwise, it
> is utterly broken, virtualization or not.
> 
> What the guest doesn't expect is the initial data to sit in the
> cache while it hasn't set the MMU on just yet. I may have a patch
> for that.
> 

Otherwise we can at least do this only when faulting in pages and the
MMU is off then, correct?

Is it completely inconceivable that user space starts the guest and when
some device is poked, it loads something else into memory that is going
to be executed after the VM has been started, but before turning on the
MMU?  If not, then doing a single flush before starting the VM won't cut
it.

-Christoffer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-15 16:53                   ` Christoffer Dall
@ 2013-08-16  5:02                     ` Anup Patel
  2013-08-16  6:57                       ` Anup Patel
  2013-08-16 17:14                       ` Christoffer Dall
  0 siblings, 2 replies; 58+ messages in thread
From: Anup Patel @ 2013-08-16  5:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
>> On 2013-08-15 16:13, Anup Patel wrote:
>> > Hi Marc,
>> >
>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
>> > wrote:
>> >> On 2013-08-15 14:31, Anup Patel wrote:
>> >>>
>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
>> >>> <marc.zyngier@arm.com>
>> >>> wrote:
>> >>>>
>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
>> >>>>>
>> >>>>>
>> >>>>> Hi Marc,
>> >>>>>
>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
>> >>>>> <marc.zyngier@arm.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>> Hi Anup,
>> >>>>>>
>> >>>>>>
>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
>> >>>>>>> <marc.zyngier@arm.com>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Hi Pranav,
>> >>>>>>>>
>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
>> >>>>>>>>> dirty
>> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
>> >>>>>>>>> this,
>> >>>>>>>>> we need to flush such dirty content from d-cache so that
>> >>>>>>>>> guest
>> >>>>>>>>> will see correct contents of guest page when guest MMU is
>> >>>>>>>>> disabled.
>> >>>>>>>>>
>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
>> >>>>>>>>> L3-cache.
>> >>>>>>>>>
>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
>> >>>>>>>>> <pranavkumar@linaro.org>
>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>> >>>>>>>>> ---
>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>> >>>>>>>>>  1 file changed, 2 insertions(+)
>> >>>>>>>>>
>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>> >>>>>>>>> index efe609c..5129038 100644
>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
>> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>> >>>>>>>>> +             /* Flush d-cache for systems with external
>> >>>>>>>>> caches. */
>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
>> >>>>>>>>> ASID-tagged
>> >>>>>>>>> VIVT
>> >>>>>>>>> */
>> >>>>>>>>>               /* any kind of VIPT cache */
>> >>>>>>>>>               __flush_icache_all();
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> [adding Will to the discussion as we talked about this in the
>> >>>>>>>> past]
>> >>>>>>>>
>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
>> >>>>>>>> the
>> >>>>>>>> data
>> >>>>>>>> cache on each page mapping. It looks overkill.
>> >>>>>>>>
>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>> >>>>>>>> kvmtools
>> >>>>>>>> knows which bits of the guest memory have been touched, and
>> >>>>>>>> can do a
>> >>>>>>>> "DC
>> >>>>>>>> DVAC" on this region.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
>> >>>>>>> I am
>> >>>>>>> sure
>> >>>>>>> other architectures don't have such cache cleaning in
>> >>>>>>> user-space for
>> >>>>>>> KVM.
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>> The alternative is do it in the kernel before running any vcpu
>> >>>>>>>> - but
>> >>>>>>>> that's not very nice either (have to clean the whole of the
>> >>>>>>>> guest
>> >>>>>>>> memory, which makes a full dcache clean more appealing).
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
>> >>>>>>> VCPU was
>> >>>>>>> our second option but current approach seemed very simple hence
>> >>>>>>> we went for this.
>> >>>>>>>
>> >>>>>>> If more people vote for full d-cache clean upon first run of
>> >>>>>>> VCPU then
>> >>>>>>> we should revise this patch.
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Can you please give the attached patch a spin on your HW? I've
>> >>>>>> boot-tested
>> >>>>>> it on a model, but of course I can't really verify that it fixes
>> >>>>>> your
>> >>>>>> issue.
>> >>>>>>
>> >>>>>> As far as I can see, it should fix it without any additional
>> >>>>>> flushing.
>> >>>>>>
>> >>>>>> Please let me know how it goes.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
>> >>>>> treated as "Normal Non-shareable, Inner Write-Back
>> >>>>> Write-Allocate,
>> >>>>> Outer Write-Back Write-Allocate memory"
>> >>>>>
>> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
>> >>>>> treated as "Strongly-ordered device memory"
>> >>>>>
>> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
>> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
>> >>>>> accessed as normal memory when HCR_EL2.DC=1.
>> >>>>
>> >>>>
>> >>>>
>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
>> >>>> be
>> >>>> accessed as device memory.
>> >>>>
>> >>>>
>> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
>> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
>> >>>>> KVM ARM64 because we use VGIC.
>> >>>>>
>> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>> >>>>> when Stage1 MMU is off.
>> >>>>
>> >>>>
>> >>>>
>> >>>> See above. My understanding is that HCR.DC controls the default
>> >>>> output of
>> >>>> Stage-1, and Stage-2 overrides still apply.
>> >>>
>> >>>
>> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
>> >>> and wanted guest to decide the memory attribute. In other words,
>> >>> you
>> >>> did not want to enforce any memory attribute in Stage2.
>> >>>
>> >>> Please refer to this patch
>> >>> https://patchwork.kernel.org/patch/2543201/
>> >>
>> >>
>> >> This patch has never been merged. If you carry on following the
>> >> discussion,
>> >> you will certainly notice it was dropped for a very good reason:
>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>> >>
>> >> So Stage-2 memory attributes are used, they are not going away, and
>> >> they are
>> >> essential to the patch I sent this morning.
>> >>
>> >>
>> >>         M.
>> >> --
>> >> Fast, cheap, reliable. Pick two.
>> >
>> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
>> > is
>> > provided a DMA-capable device in pass-through mode. The reason being
>> > bootloader/firmware typically don't enable MMU and such
>> > bootloader/firmware
>> > will programme a pass-through DMA-capable device without any flushes
>> > to
>> > guest RAM (because it has not enabled MMU).
>> >
>> > A good example of such a device would be SATA AHCI controller given
>> > to a
>> > Guest/VM as direct access (using SystemMMU) and Guest
>> > bootloader/firmware
>> > accessing this SATA AHCI controller to load kernel images from SATA
>> > disk.
>> > In this situation, we will have to hack Guest bootloader/firmware
>> > AHCI driver to
>> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
>>
>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
>> curiosity: is that a made up example or something you actually have?
>>
>> Back to square one:
>> Can you please benchmark the various cache cleaning options (global at
>> startup time, per-page on S2 translation fault, and user-space)?
>>
> Eh, why is this a more valid argument than the vgic?  The device
> passthrough Stage-2 mappings would still have the Stage-2 memory
> attributes to configure that memory region as device memory.  Why is it
> relevant if the device is DMA-capable in this context?
>
> -Christoffer

Most ARM bootloader/firmware run with MMU off hence, they will not do
explicit cache flushes when programming DMA-capable device. Now If
HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
will not be visible to DMA-capable device.

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16  5:02                     ` Anup Patel
@ 2013-08-16  6:57                       ` Anup Patel
  2013-08-16 17:19                         ` Christoffer Dall
  2013-08-29 10:52                         ` Catalin Marinas
  2013-08-16 17:14                       ` Christoffer Dall
  1 sibling, 2 replies; 58+ messages in thread
From: Anup Patel @ 2013-08-16  6:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 10:32 AM, Anup Patel <anup@brainfault.org> wrote:
> On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
>> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
>>> On 2013-08-15 16:13, Anup Patel wrote:
>>> > Hi Marc,
>>> >
>>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
>>> > wrote:
>>> >> On 2013-08-15 14:31, Anup Patel wrote:
>>> >>>
>>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
>>> >>> <marc.zyngier@arm.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
>>> >>>>>
>>> >>>>>
>>> >>>>> Hi Marc,
>>> >>>>>
>>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
>>> >>>>> <marc.zyngier@arm.com>
>>> >>>>> wrote:
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Hi Anup,
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
>>> >>>>>>> <marc.zyngier@arm.com>
>>> >>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> Hi Pranav,
>>> >>>>>>>>
>>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
>>> >>>>>>>>> dirty
>>> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
>>> >>>>>>>>> this,
>>> >>>>>>>>> we need to flush such dirty content from d-cache so that
>>> >>>>>>>>> guest
>>> >>>>>>>>> will see correct contents of guest page when guest MMU is
>>> >>>>>>>>> disabled.
>>> >>>>>>>>>
>>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
>>> >>>>>>>>> L3-cache.
>>> >>>>>>>>>
>>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
>>> >>>>>>>>> <pranavkumar@linaro.org>
>>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>> >>>>>>>>> ---
>>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>> >>>>>>>>>  1 file changed, 2 insertions(+)
>>> >>>>>>>>>
>>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>> >>>>>>>>> index efe609c..5129038 100644
>>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
>>> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>> >>>>>>>>> +             /* Flush d-cache for systems with external
>>> >>>>>>>>> caches. */
>>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
>>> >>>>>>>>> ASID-tagged
>>> >>>>>>>>> VIVT
>>> >>>>>>>>> */
>>> >>>>>>>>>               /* any kind of VIPT cache */
>>> >>>>>>>>>               __flush_icache_all();
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> [adding Will to the discussion as we talked about this in the
>>> >>>>>>>> past]
>>> >>>>>>>>
>>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
>>> >>>>>>>> the
>>> >>>>>>>> data
>>> >>>>>>>> cache on each page mapping. It looks overkill.
>>> >>>>>>>>
>>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>>> >>>>>>>> kvmtools
>>> >>>>>>>> knows which bits of the guest memory have been touched, and
>>> >>>>>>>> can do a
>>> >>>>>>>> "DC
>>> >>>>>>>> DVAC" on this region.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
>>> >>>>>>> I am
>>> >>>>>>> sure
>>> >>>>>>> other architectures don't have such cache cleaning in
>>> >>>>>>> user-space for
>>> >>>>>>> KVM.
>>> >>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> The alternative is do it in the kernel before running any vcpu
>>> >>>>>>>> - but
>>> >>>>>>>> that's not very nice either (have to clean the whole of the
>>> >>>>>>>> guest
>>> >>>>>>>> memory, which makes a full dcache clean more appealing).
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
>>> >>>>>>> VCPU was
>>> >>>>>>> our second option but current approach seemed very simple hence
>>> >>>>>>> we went for this.
>>> >>>>>>>
>>> >>>>>>> If more people vote for full d-cache clean upon first run of
>>> >>>>>>> VCPU then
>>> >>>>>>> we should revise this patch.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Can you please give the attached patch a spin on your HW? I've
>>> >>>>>> boot-tested
>>> >>>>>> it on a model, but of course I can't really verify that it fixes
>>> >>>>>> your
>>> >>>>>> issue.
>>> >>>>>>
>>> >>>>>> As far as I can see, it should fix it without any additional
>>> >>>>>> flushing.
>>> >>>>>>
>>> >>>>>> Please let me know how it goes.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
>>> >>>>> treated as "Normal Non-shareable, Inner Write-Back
>>> >>>>> Write-Allocate,
>>> >>>>> Outer Write-Back Write-Allocate memory"
>>> >>>>>
>>> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
>>> >>>>> treated as "Strongly-ordered device memory"
>>> >>>>>
>>> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
>>> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
>>> >>>>> accessed as normal memory when HCR_EL2.DC=1.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
>>> >>>> be
>>> >>>> accessed as device memory.
>>> >>>>
>>> >>>>
>>> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
>>> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
>>> >>>>> KVM ARM64 because we use VGIC.
>>> >>>>>
>>> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>>> >>>>> when Stage1 MMU is off.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> See above. My understanding is that HCR.DC controls the default
>>> >>>> output of
>>> >>>> Stage-1, and Stage-2 overrides still apply.
>>> >>>
>>> >>>
>>> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
>>> >>> and wanted guest to decide the memory attribute. In other words,
>>> >>> you
>>> >>> did not want to enforce any memory attribute in Stage2.
>>> >>>
>>> >>> Please refer to this patch
>>> >>> https://patchwork.kernel.org/patch/2543201/
>>> >>
>>> >>
>>> >> This patch has never been merged. If you carry on following the
>>> >> discussion,
>>> >> you will certainly notice it was dropped for a very good reason:
>>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>>> >>
>>> >> So Stage-2 memory attributes are used, they are not going away, and
>>> >> they are
>>> >> essential to the patch I sent this morning.
>>> >>
>>> >>
>>> >>         M.
>>> >> --
>>> >> Fast, cheap, reliable. Pick two.
>>> >
>>> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
>>> > is
>>> > provided a DMA-capable device in pass-through mode. The reason being
>>> > bootloader/firmware typically don't enable MMU and such
>>> > bootloader/firmware
>>> > will programme a pass-through DMA-capable device without any flushes
>>> > to
>>> > guest RAM (because it has not enabled MMU).
>>> >
>>> > A good example of such a device would be SATA AHCI controller given
>>> > to a
>>> > Guest/VM as direct access (using SystemMMU) and Guest
>>> > bootloader/firmware
>>> > accessing this SATA AHCI controller to load kernel images from SATA
>>> > disk.
>>> > In this situation, we will have to hack Guest bootloader/firmware
>>> > AHCI driver to
>>> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
>>>
>>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
>>> curiosity: is that a made up example or something you actually have?
>>>
>>> Back to square one:
>>> Can you please benchmark the various cache cleaning options (global at
>>> startup time, per-page on S2 translation fault, and user-space)?
>>>
>> Eh, why is this a more valid argument than the vgic?  The device
>> passthrough Stage-2 mappings would still have the Stage-2 memory
>> attributes to configure that memory region as device memory.  Why is it
>> relevant if the device is DMA-capable in this context?
>>
>> -Christoffer
>
> Most ARM bootloader/firmware run with MMU off hence, they will not do
> explicit cache flushes when programming DMA-capable device. Now If
> HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
> will not be visible to DMA-capable device.
>
> --Anup

The approach of flushing d-cache by set/way upon first run of VCPU will
not work because for set/way operations ARM ARM says: "For set/way
operations, and for All (entire cache) operations, the point is defined to be
to the next level of caching". In other words, set/way operations work upto
point of unification.

Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
at bootup time which does not work for us when L3-cache is enabled so,
there is no point is adding one more __flush_dcache_all() upon first run of
VCPU in KVM ARM64.

IMHO, we are left with following options:
1. Flush all RAM regions of VCPU using __flush_dcache_range()
    upon first run of VCPU
2. Implement outer-cache framework for ARM64 and flush all
    caches + outer cache (i.e. L3-cache) upon first run of VCPU
3. Use an alternate version of flush_icache_range() which will
    flush d-cache by PoC instead of PoU. We can also ensure
    that coherent_icache_guest_page() function will be called
    upon Stage2 prefetch aborts only.

Regards,
Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16  5:02                     ` Anup Patel
  2013-08-16  6:57                       ` Anup Patel
@ 2013-08-16 17:14                       ` Christoffer Dall
       [not found]                         ` <CALrVBkvEP1Q0mKpv8ViOTLRvW2ks18MQXgmurSBHn+aJcz+=gw@mail.gmail.com>
  1 sibling, 1 reply; 58+ messages in thread
From: Christoffer Dall @ 2013-08-16 17:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 10:32:30AM +0530, Anup Patel wrote:
> On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
> >> On 2013-08-15 16:13, Anup Patel wrote:
> >> > Hi Marc,
> >> >
> >> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
> >> > wrote:
> >> >> On 2013-08-15 14:31, Anup Patel wrote:
> >> >>>
> >> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
> >> >>> <marc.zyngier@arm.com>
> >> >>> wrote:
> >> >>>>
> >> >>>> On 2013-08-15 07:26, Anup Patel wrote:
> >> >>>>>
> >> >>>>>
> >> >>>>> Hi Marc,
> >> >>>>>
> >> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
> >> >>>>> <marc.zyngier@arm.com>
> >> >>>>> wrote:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Hi Anup,
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
> >> >>>>>>> <marc.zyngier@arm.com>
> >> >>>>>>> wrote:
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> Hi Pranav,
> >> >>>>>>>>
> >> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
> >> >>>>>>>>> dirty
> >> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
> >> >>>>>>>>> this,
> >> >>>>>>>>> we need to flush such dirty content from d-cache so that
> >> >>>>>>>>> guest
> >> >>>>>>>>> will see correct contents of guest page when guest MMU is
> >> >>>>>>>>> disabled.
> >> >>>>>>>>>
> >> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
> >> >>>>>>>>> L3-cache.
> >> >>>>>>>>>
> >> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
> >> >>>>>>>>> <pranavkumar@linaro.org>
> >> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> >> >>>>>>>>> ---
> >> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
> >> >>>>>>>>>  1 file changed, 2 insertions(+)
> >> >>>>>>>>>
> >> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> >> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
> >> >>>>>>>>> index efe609c..5129038 100644
> >> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
> >> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
> >> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> >> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
> >> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
> >> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
> >> >>>>>>>>> +             /* Flush d-cache for systems with external
> >> >>>>>>>>> caches. */
> >> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
> >> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
> >> >>>>>>>>> ASID-tagged
> >> >>>>>>>>> VIVT
> >> >>>>>>>>> */
> >> >>>>>>>>>               /* any kind of VIPT cache */
> >> >>>>>>>>>               __flush_icache_all();
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> [adding Will to the discussion as we talked about this in the
> >> >>>>>>>> past]
> >> >>>>>>>>
> >> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
> >> >>>>>>>> the
> >> >>>>>>>> data
> >> >>>>>>>> cache on each page mapping. It looks overkill.
> >> >>>>>>>>
> >> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
> >> >>>>>>>> kvmtools
> >> >>>>>>>> knows which bits of the guest memory have been touched, and
> >> >>>>>>>> can do a
> >> >>>>>>>> "DC
> >> >>>>>>>> DVAC" on this region.
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
> >> >>>>>>> I am
> >> >>>>>>> sure
> >> >>>>>>> other architectures don't have such cache cleaning in
> >> >>>>>>> user-space for
> >> >>>>>>> KVM.
> >> >>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> The alternative is do it in the kernel before running any vcpu
> >> >>>>>>>> - but
> >> >>>>>>>> that's not very nice either (have to clean the whole of the
> >> >>>>>>>> guest
> >> >>>>>>>> memory, which makes a full dcache clean more appealing).
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
> >> >>>>>>> VCPU was
> >> >>>>>>> our second option but current approach seemed very simple hence
> >> >>>>>>> we went for this.
> >> >>>>>>>
> >> >>>>>>> If more people vote for full d-cache clean upon first run of
> >> >>>>>>> VCPU then
> >> >>>>>>> we should revise this patch.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Can you please give the attached patch a spin on your HW? I've
> >> >>>>>> boot-tested
> >> >>>>>> it on a model, but of course I can't really verify that it fixes
> >> >>>>>> your
> >> >>>>>> issue.
> >> >>>>>>
> >> >>>>>> As far as I can see, it should fix it without any additional
> >> >>>>>> flushing.
> >> >>>>>>
> >> >>>>>> Please let me know how it goes.
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
> >> >>>>> treated as "Normal Non-shareable, Inner Write-Back
> >> >>>>> Write-Allocate,
> >> >>>>> Outer Write-Back Write-Allocate memory"
> >> >>>>>
> >> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
> >> >>>>> treated as "Strongly-ordered device memory"
> >> >>>>>
> >> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
> >> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
> >> >>>>> accessed as normal memory when HCR_EL2.DC=1.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
> >> >>>> be
> >> >>>> accessed as device memory.
> >> >>>>
> >> >>>>
> >> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
> >> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
> >> >>>>> KVM ARM64 because we use VGIC.
> >> >>>>>
> >> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
> >> >>>>> when Stage1 MMU is off.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> See above. My understanding is that HCR.DC controls the default
> >> >>>> output of
> >> >>>> Stage-1, and Stage-2 overrides still apply.
> >> >>>
> >> >>>
> >> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
> >> >>> and wanted guest to decide the memory attribute. In other words,
> >> >>> you
> >> >>> did not want to enforce any memory attribute in Stage2.
> >> >>>
> >> >>> Please refer to this patch
> >> >>> https://patchwork.kernel.org/patch/2543201/
> >> >>
> >> >>
> >> >> This patch has never been merged. If you carry on following the
> >> >> discussion,
> >> >> you will certainly notice it was dropped for a very good reason:
> >> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
> >> >>
> >> >> So Stage-2 memory attributes are used, they are not going away, and
> >> >> they are
> >> >> essential to the patch I sent this morning.
> >> >>
> >> >>
> >> >>         M.
> >> >> --
> >> >> Fast, cheap, reliable. Pick two.
> >> >
> >> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
> >> > is
> >> > provided a DMA-capable device in pass-through mode. The reason being
> >> > bootloader/firmware typically don't enable MMU and such
> >> > bootloader/firmware
> >> > will programme a pass-through DMA-capable device without any flushes
> >> > to
> >> > guest RAM (because it has not enabled MMU).
> >> >
> >> > A good example of such a device would be SATA AHCI controller given
> >> > to a
> >> > Guest/VM as direct access (using SystemMMU) and Guest
> >> > bootloader/firmware
> >> > accessing this SATA AHCI controller to load kernel images from SATA
> >> > disk.
> >> > In this situation, we will have to hack Guest bootloader/firmware
> >> > AHCI driver to
> >> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
> >>
> >> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
> >> curiosity: is that a made up example or something you actually have?
> >>
> >> Back to square one:
> >> Can you please benchmark the various cache cleaning options (global at
> >> startup time, per-page on S2 translation fault, and user-space)?
> >>
> > Eh, why is this a more valid argument than the vgic?  The device
> > passthrough Stage-2 mappings would still have the Stage-2 memory
> > attributes to configure that memory region as device memory.  Why is it
> > relevant if the device is DMA-capable in this context?
> >
> 
> Most ARM bootloader/firmware run with MMU off hence, they will not do
> explicit cache flushes when programming DMA-capable device. Now If
> HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
> will not be visible to DMA-capable device.
>
Read my question again: The bootloaders running with the MMU off in a VM
will only disable the MMU for Stage-1 translations.  Stage-2
translations are still enforced using hte Stage-2 page tables and their
attributes for all mappings to devices will still enforce
strongly-ordered device type memory.

Again, what does it matter if the device is DMA-capable or not?

-Christoffer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16  6:57                       ` Anup Patel
@ 2013-08-16 17:19                         ` Christoffer Dall
  2013-08-16 17:42                           ` Anup Patel
  2013-08-29 10:52                         ` Catalin Marinas
  1 sibling, 1 reply; 58+ messages in thread
From: Christoffer Dall @ 2013-08-16 17:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 12:27:55PM +0530, Anup Patel wrote:
> On Fri, Aug 16, 2013 at 10:32 AM, Anup Patel <anup@brainfault.org> wrote:
> > On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> > <christoffer.dall@linaro.org> wrote:
> >> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
> >>> On 2013-08-15 16:13, Anup Patel wrote:
> >>> > Hi Marc,
> >>> >
> >>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
> >>> > wrote:
> >>> >> On 2013-08-15 14:31, Anup Patel wrote:
> >>> >>>
> >>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
> >>> >>> <marc.zyngier@arm.com>
> >>> >>> wrote:
> >>> >>>>
> >>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> Hi Marc,
> >>> >>>>>
> >>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
> >>> >>>>> <marc.zyngier@arm.com>
> >>> >>>>> wrote:
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>> Hi Anup,
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
> >>> >>>>>>> <marc.zyngier@arm.com>
> >>> >>>>>>> wrote:
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> Hi Pranav,
> >>> >>>>>>>>
> >>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> >>> >>>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
> >>> >>>>>>>>> dirty
> >>> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
> >>> >>>>>>>>> this,
> >>> >>>>>>>>> we need to flush such dirty content from d-cache so that
> >>> >>>>>>>>> guest
> >>> >>>>>>>>> will see correct contents of guest page when guest MMU is
> >>> >>>>>>>>> disabled.
> >>> >>>>>>>>>
> >>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
> >>> >>>>>>>>> L3-cache.
> >>> >>>>>>>>>
> >>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
> >>> >>>>>>>>> <pranavkumar@linaro.org>
> >>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> >>> >>>>>>>>> ---
> >>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
> >>> >>>>>>>>>  1 file changed, 2 insertions(+)
> >>> >>>>>>>>>
> >>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> >>> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
> >>> >>>>>>>>> index efe609c..5129038 100644
> >>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
> >>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
> >>> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> >>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
> >>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
> >>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
> >>> >>>>>>>>> +             /* Flush d-cache for systems with external
> >>> >>>>>>>>> caches. */
> >>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
> >>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
> >>> >>>>>>>>> ASID-tagged
> >>> >>>>>>>>> VIVT
> >>> >>>>>>>>> */
> >>> >>>>>>>>>               /* any kind of VIPT cache */
> >>> >>>>>>>>>               __flush_icache_all();
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> [adding Will to the discussion as we talked about this in the
> >>> >>>>>>>> past]
> >>> >>>>>>>>
> >>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
> >>> >>>>>>>> the
> >>> >>>>>>>> data
> >>> >>>>>>>> cache on each page mapping. It looks overkill.
> >>> >>>>>>>>
> >>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
> >>> >>>>>>>> kvmtools
> >>> >>>>>>>> knows which bits of the guest memory have been touched, and
> >>> >>>>>>>> can do a
> >>> >>>>>>>> "DC
> >>> >>>>>>>> DVAC" on this region.
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
> >>> >>>>>>> I am
> >>> >>>>>>> sure
> >>> >>>>>>> other architectures don't have such cache cleaning in
> >>> >>>>>>> user-space for
> >>> >>>>>>> KVM.
> >>> >>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> The alternative is do it in the kernel before running any vcpu
> >>> >>>>>>>> - but
> >>> >>>>>>>> that's not very nice either (have to clean the whole of the
> >>> >>>>>>>> guest
> >>> >>>>>>>> memory, which makes a full dcache clean more appealing).
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
> >>> >>>>>>> VCPU was
> >>> >>>>>>> our second option but current approach seemed very simple hence
> >>> >>>>>>> we went for this.
> >>> >>>>>>>
> >>> >>>>>>> If more people vote for full d-cache clean upon first run of
> >>> >>>>>>> VCPU then
> >>> >>>>>>> we should revise this patch.
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>> Can you please give the attached patch a spin on your HW? I've
> >>> >>>>>> boot-tested
> >>> >>>>>> it on a model, but of course I can't really verify that it fixes
> >>> >>>>>> your
> >>> >>>>>> issue.
> >>> >>>>>>
> >>> >>>>>> As far as I can see, it should fix it without any additional
> >>> >>>>>> flushing.
> >>> >>>>>>
> >>> >>>>>> Please let me know how it goes.
> >>> >>>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
> >>> >>>>> treated as "Normal Non-shareable, Inner Write-Back
> >>> >>>>> Write-Allocate,
> >>> >>>>> Outer Write-Back Write-Allocate memory"
> >>> >>>>>
> >>> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
> >>> >>>>> treated as "Strongly-ordered device memory"
> >>> >>>>>
> >>> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
> >>> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
> >>> >>>>> accessed as normal memory when HCR_EL2.DC=1.
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
> >>> >>>> be
> >>> >>>> accessed as device memory.
> >>> >>>>
> >>> >>>>
> >>> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
> >>> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
> >>> >>>>> KVM ARM64 because we use VGIC.
> >>> >>>>>
> >>> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
> >>> >>>>> when Stage1 MMU is off.
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> See above. My understanding is that HCR.DC controls the default
> >>> >>>> output of
> >>> >>>> Stage-1, and Stage-2 overrides still apply.
> >>> >>>
> >>> >>>
> >>> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
> >>> >>> and wanted guest to decide the memory attribute. In other words,
> >>> >>> you
> >>> >>> did not want to enforce any memory attribute in Stage2.
> >>> >>>
> >>> >>> Please refer to this patch
> >>> >>> https://patchwork.kernel.org/patch/2543201/
> >>> >>
> >>> >>
> >>> >> This patch has never been merged. If you carry on following the
> >>> >> discussion,
> >>> >> you will certainly notice it was dropped for a very good reason:
> >>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
> >>> >>
> >>> >> So Stage-2 memory attributes are used, they are not going away, and
> >>> >> they are
> >>> >> essential to the patch I sent this morning.
> >>> >>
> >>> >>
> >>> >>         M.
> >>> >> --
> >>> >> Fast, cheap, reliable. Pick two.
> >>> >
> >>> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
> >>> > is
> >>> > provided a DMA-capable device in pass-through mode. The reason being
> >>> > bootloader/firmware typically don't enable MMU and such
> >>> > bootloader/firmware
> >>> > will programme a pass-through DMA-capable device without any flushes
> >>> > to
> >>> > guest RAM (because it has not enabled MMU).
> >>> >
> >>> > A good example of such a device would be SATA AHCI controller given
> >>> > to a
> >>> > Guest/VM as direct access (using SystemMMU) and Guest
> >>> > bootloader/firmware
> >>> > accessing this SATA AHCI controller to load kernel images from SATA
> >>> > disk.
> >>> > In this situation, we will have to hack Guest bootloader/firmware
> >>> > AHCI driver to
> >>> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
> >>>
> >>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
> >>> curiosity: is that a made up example or something you actually have?
> >>>
> >>> Back to square one:
> >>> Can you please benchmark the various cache cleaning options (global at
> >>> startup time, per-page on S2 translation fault, and user-space)?
> >>>
> >> Eh, why is this a more valid argument than the vgic?  The device
> >> passthrough Stage-2 mappings would still have the Stage-2 memory
> >> attributes to configure that memory region as device memory.  Why is it
> >> relevant if the device is DMA-capable in this context?
> >>
> >> -Christoffer
> >
> > Most ARM bootloader/firmware run with MMU off hence, they will not do
> > explicit cache flushes when programming DMA-capable device. Now If
> > HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
> > will not be visible to DMA-capable device.
> >
> > --Anup
> 
> The approach of flushing d-cache by set/way upon first run of VCPU will
> not work because for set/way operations ARM ARM says: "For set/way
> operations, and for All (entire cache) operations, the point is defined to be
> to the next level of caching". In other words, set/way operations work upto
> point of unification.
> 
> Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
> at bootup time which does not work for us when L3-cache is enabled so,
> there is no point is adding one more __flush_dcache_all() upon first run of
> VCPU in KVM ARM64.

When did anyone suggest using a cache cleaning method that doesn't apply
to this case.  I'm a little confused about your comment here.

> 
> IMHO, we are left with following options:
> 1. Flush all RAM regions of VCPU using __flush_dcache_range()
>     upon first run of VCPU

We could do that, but we have to ensure that no other memory regions can
be added later.  Is this the case?  I don't remember.

> 2. Implement outer-cache framework for ARM64 and flush all
>     caches + outer cache (i.e. L3-cache) upon first run of VCPU

What's the difference between (2) and (1)?

> 3. Use an alternate version of flush_icache_range() which will
>     flush d-cache by PoC instead of PoU. We can also ensure
>     that coherent_icache_guest_page() function will be called
>     upon Stage2 prefetch aborts only.
> 

I'm sorry, but I'm not following this discussion.  Are we not mixing a
discussion along two different axis here?  As I see it there are two
separate issues:

 (A) When/Where to flush the memory regions
 (B) How to flush the memory regions, including the hardware method and
     the kernel software engineering aspect.

-Christoffer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
       [not found]                         ` <CALrVBkvEP1Q0mKpv8ViOTLRvW2ks18MQXgmurSBHn+aJcz+=gw@mail.gmail.com>
@ 2013-08-16 17:28                           ` Christoffer Dall
  2013-08-16 17:42                             ` Christoffer Dall
  0 siblings, 1 reply; 58+ messages in thread
From: Christoffer Dall @ 2013-08-16 17:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 10:48:39PM +0530, Anup Patel wrote:
> On 16 August 2013 22:44, Christoffer Dall <christoffer.dall@linaro.org>wrote:
> 
> > On Fri, Aug 16, 2013 at 10:32:30AM +0530, Anup Patel wrote:
> > > On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> > > <christoffer.dall@linaro.org> wrote:
> > > > On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
> > > >> On 2013-08-15 16:13, Anup Patel wrote:
> > > >> > Hi Marc,
> > > >> >
> > > >> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com
> > >
> > > >> > wrote:
> > > >> >> On 2013-08-15 14:31, Anup Patel wrote:
> > > >> >>>
> > > >> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
> > > >> >>> <marc.zyngier@arm.com>
> > > >> >>> wrote:
> > > >> >>>>
> > > >> >>>> On 2013-08-15 07:26, Anup Patel wrote:
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> Hi Marc,
> > > >> >>>>>
> > > >> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
> > > >> >>>>> <marc.zyngier@arm.com>
> > > >> >>>>> wrote:
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> Hi Anup,
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
> > > >> >>>>>>> <marc.zyngier@arm.com>
> > > >> >>>>>>> wrote:
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>> Hi Pranav,
> > > >> >>>>>>>>
> > > >> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
> > > >> >>>>>>>>> dirty
> > > >> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
> > > >> >>>>>>>>> this,
> > > >> >>>>>>>>> we need to flush such dirty content from d-cache so that
> > > >> >>>>>>>>> guest
> > > >> >>>>>>>>> will see correct contents of guest page when guest MMU is
> > > >> >>>>>>>>> disabled.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
> > > >> >>>>>>>>> L3-cache.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
> > > >> >>>>>>>>> <pranavkumar@linaro.org>
> > > >> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> > > >> >>>>>>>>> ---
> > > >> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
> > > >> >>>>>>>>>  1 file changed, 2 insertions(+)
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> > > >> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
> > > >> >>>>>>>>> index efe609c..5129038 100644
> > > >> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
> > > >> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> > > >> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
> > > >> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> > > >> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
> > > >> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
> > > >> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
> > > >> >>>>>>>>> +             /* Flush d-cache for systems with external
> > > >> >>>>>>>>> caches. */
> > > >> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
> > > >> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
> > > >> >>>>>>>>> ASID-tagged
> > > >> >>>>>>>>> VIVT
> > > >> >>>>>>>>> */
> > > >> >>>>>>>>>               /* any kind of VIPT cache */
> > > >> >>>>>>>>>               __flush_icache_all();
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>> [adding Will to the discussion as we talked about this in the
> > > >> >>>>>>>> past]
> > > >> >>>>>>>>
> > > >> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to
> > hit
> > > >> >>>>>>>> the
> > > >> >>>>>>>> data
> > > >> >>>>>>>> cache on each page mapping. It looks overkill.
> > > >> >>>>>>>>
> > > >> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
> > > >> >>>>>>>> kvmtools
> > > >> >>>>>>>> knows which bits of the guest memory have been touched, and
> > > >> >>>>>>>> can do a
> > > >> >>>>>>>> "DC
> > > >> >>>>>>>> DVAC" on this region.
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
> > > >> >>>>>>> I am
> > > >> >>>>>>> sure
> > > >> >>>>>>> other architectures don't have such cache cleaning in
> > > >> >>>>>>> user-space for
> > > >> >>>>>>> KVM.
> > > >> >>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>> The alternative is do it in the kernel before running any
> > vcpu
> > > >> >>>>>>>> - but
> > > >> >>>>>>>> that's not very nice either (have to clean the whole of the
> > > >> >>>>>>>> guest
> > > >> >>>>>>>> memory, which makes a full dcache clean more appealing).
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
> > > >> >>>>>>> VCPU was
> > > >> >>>>>>> our second option but current approach seemed very simple
> > hence
> > > >> >>>>>>> we went for this.
> > > >> >>>>>>>
> > > >> >>>>>>> If more people vote for full d-cache clean upon first run of
> > > >> >>>>>>> VCPU then
> > > >> >>>>>>> we should revise this patch.
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> Can you please give the attached patch a spin on your HW? I've
> > > >> >>>>>> boot-tested
> > > >> >>>>>> it on a model, but of course I can't really verify that it
> > fixes
> > > >> >>>>>> your
> > > >> >>>>>> issue.
> > > >> >>>>>>
> > > >> >>>>>> As far as I can see, it should fix it without any additional
> > > >> >>>>>> flushing.
> > > >> >>>>>>
> > > >> >>>>>> Please let me know how it goes.
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
> > > >> >>>>> treated as "Normal Non-shareable, Inner Write-Back
> > > >> >>>>> Write-Allocate,
> > > >> >>>>> Outer Write-Back Write-Allocate memory"
> > > >> >>>>>
> > > >> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
> > > >> >>>>> treated as "Strongly-ordered device memory"
> > > >> >>>>>
> > > >> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
> > > >> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
> > > >> >>>>> accessed as normal memory when HCR_EL2.DC=1.
> > > >> >>>>
> > > >> >>>>
> > > >> >>>>
> > > >> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
> > > >> >>>> be
> > > >> >>>> accessed as device memory.
> > > >> >>>>
> > > >> >>>>
> > > >> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
> > > >> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
> > > >> >>>>> KVM ARM64 because we use VGIC.
> > > >> >>>>>
> > > >> >>>>> IMHO, this patch enforces incorrect memory attribute for
> > Guest/VM
> > > >> >>>>> when Stage1 MMU is off.
> > > >> >>>>
> > > >> >>>>
> > > >> >>>>
> > > >> >>>> See above. My understanding is that HCR.DC controls the default
> > > >> >>>> output of
> > > >> >>>> Stage-1, and Stage-2 overrides still apply.
> > > >> >>>
> > > >> >>>
> > > >> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
> > > >> >>> and wanted guest to decide the memory attribute. In other words,
> > > >> >>> you
> > > >> >>> did not want to enforce any memory attribute in Stage2.
> > > >> >>>
> > > >> >>> Please refer to this patch
> > > >> >>> https://patchwork.kernel.org/patch/2543201/
> > > >> >>
> > > >> >>
> > > >> >> This patch has never been merged. If you carry on following the
> > > >> >> discussion,
> > > >> >> you will certainly notice it was dropped for a very good reason:
> > > >> >>
> > https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
> > > >> >>
> > > >> >> So Stage-2 memory attributes are used, they are not going away, and
> > > >> >> they are
> > > >> >> essential to the patch I sent this morning.
> > > >> >>
> > > >> >>
> > > >> >>         M.
> > > >> >> --
> > > >> >> Fast, cheap, reliable. Pick two.
> > > >> >
> > > >> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
> > > >> > is
> > > >> > provided a DMA-capable device in pass-through mode. The reason being
> > > >> > bootloader/firmware typically don't enable MMU and such
> > > >> > bootloader/firmware
> > > >> > will programme a pass-through DMA-capable device without any flushes
> > > >> > to
> > > >> > guest RAM (because it has not enabled MMU).
> > > >> >
> > > >> > A good example of such a device would be SATA AHCI controller given
> > > >> > to a
> > > >> > Guest/VM as direct access (using SystemMMU) and Guest
> > > >> > bootloader/firmware
> > > >> > accessing this SATA AHCI controller to load kernel images from SATA
> > > >> > disk.
> > > >> > In this situation, we will have to hack Guest bootloader/firmware
> > > >> > AHCI driver to
> > > >> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
> > > >>
> > > >> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out
> > of
> > > >> curiosity: is that a made up example or something you actually have?
> > > >>
> > > >> Back to square one:
> > > >> Can you please benchmark the various cache cleaning options (global at
> > > >> startup time, per-page on S2 translation fault, and user-space)?
> > > >>
> > > > Eh, why is this a more valid argument than the vgic?  The device
> > > > passthrough Stage-2 mappings would still have the Stage-2 memory
> > > > attributes to configure that memory region as device memory.  Why is it
> > > > relevant if the device is DMA-capable in this context?
> > > >
> > >
> > > Most ARM bootloader/firmware run with MMU off hence, they will not do
> > > explicit cache flushes when programming DMA-capable device. Now If
> > > HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
> > > will not be visible to DMA-capable device.
> > >
> > Read my question again: The bootloaders running with the MMU off in a VM
> > will only disable the MMU for Stage-1 translations.  Stage-2
> > translations are still enforced using hte Stage-2 page tables and their
> > attributes for all mappings to devices will still enforce
> > strongly-ordered device type memory.
> >
> 
> Please read my reply again. Also try to read-up SATA AHCI spec.
> 
>

Can you please explain how the specs realate to my question?

The only case I can see is if the guest puts data in a DMA region that
it expects the device to read.  Is this the case?

Just so we're clear, this is quite different from programming the device
through the device memory.

-Christoffer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16 17:28                           ` Christoffer Dall
@ 2013-08-16 17:42                             ` Christoffer Dall
  0 siblings, 0 replies; 58+ messages in thread
From: Christoffer Dall @ 2013-08-16 17:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 10:28:06AM -0700, Christoffer Dall wrote:
> On Fri, Aug 16, 2013 at 10:48:39PM +0530, Anup Patel wrote:
> > On 16 August 2013 22:44, Christoffer Dall <christoffer.dall@linaro.org>wrote:
> > 
> > > On Fri, Aug 16, 2013 at 10:32:30AM +0530, Anup Patel wrote:
> > > > On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> > > > <christoffer.dall@linaro.org> wrote:
> > > > > On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
> > > > >> On 2013-08-15 16:13, Anup Patel wrote:
> > > > >> > Hi Marc,
> > > > >> >
> > > > >> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com
> > > >
> > > > >> > wrote:
> > > > >> >> On 2013-08-15 14:31, Anup Patel wrote:
> > > > >> >>>
> > > > >> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
> > > > >> >>> <marc.zyngier@arm.com>
> > > > >> >>> wrote:
> > > > >> >>>>
> > > > >> >>>> On 2013-08-15 07:26, Anup Patel wrote:
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> Hi Marc,
> > > > >> >>>>>
> > > > >> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
> > > > >> >>>>> <marc.zyngier@arm.com>
> > > > >> >>>>> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> Hi Anup,
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
> > > > >> >>>>>>> <marc.zyngier@arm.com>
> > > > >> >>>>>>> wrote:
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> Hi Pranav,
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
> > > > >> >>>>>>>>> dirty
> > > > >> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
> > > > >> >>>>>>>>> this,
> > > > >> >>>>>>>>> we need to flush such dirty content from d-cache so that
> > > > >> >>>>>>>>> guest
> > > > >> >>>>>>>>> will see correct contents of guest page when guest MMU is
> > > > >> >>>>>>>>> disabled.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
> > > > >> >>>>>>>>> L3-cache.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
> > > > >> >>>>>>>>> <pranavkumar@linaro.org>
> > > > >> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> > > > >> >>>>>>>>> ---
> > > > >> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
> > > > >> >>>>>>>>>  1 file changed, 2 insertions(+)
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> > > > >> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
> > > > >> >>>>>>>>> index efe609c..5129038 100644
> > > > >> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
> > > > >> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> > > > >> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
> > > > >> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> > > > >> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
> > > > >> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
> > > > >> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
> > > > >> >>>>>>>>> +             /* Flush d-cache for systems with external
> > > > >> >>>>>>>>> caches. */
> > > > >> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
> > > > >> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
> > > > >> >>>>>>>>> ASID-tagged
> > > > >> >>>>>>>>> VIVT
> > > > >> >>>>>>>>> */
> > > > >> >>>>>>>>>               /* any kind of VIPT cache */
> > > > >> >>>>>>>>>               __flush_icache_all();
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> [adding Will to the discussion as we talked about this in the
> > > > >> >>>>>>>> past]
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to
> > > hit
> > > > >> >>>>>>>> the
> > > > >> >>>>>>>> data
> > > > >> >>>>>>>> cache on each page mapping. It looks overkill.
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
> > > > >> >>>>>>>> kvmtools
> > > > >> >>>>>>>> knows which bits of the guest memory have been touched, and
> > > > >> >>>>>>>> can do a
> > > > >> >>>>>>>> "DC
> > > > >> >>>>>>>> DVAC" on this region.
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
> > > > >> >>>>>>> I am
> > > > >> >>>>>>> sure
> > > > >> >>>>>>> other architectures don't have such cache cleaning in
> > > > >> >>>>>>> user-space for
> > > > >> >>>>>>> KVM.
> > > > >> >>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> The alternative is do it in the kernel before running any
> > > vcpu
> > > > >> >>>>>>>> - but
> > > > >> >>>>>>>> that's not very nice either (have to clean the whole of the
> > > > >> >>>>>>>> guest
> > > > >> >>>>>>>> memory, which makes a full dcache clean more appealing).
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
> > > > >> >>>>>>> VCPU was
> > > > >> >>>>>>> our second option but current approach seemed very simple
> > > hence
> > > > >> >>>>>>> we went for this.
> > > > >> >>>>>>>
> > > > >> >>>>>>> If more people vote for full d-cache clean upon first run of
> > > > >> >>>>>>> VCPU then
> > > > >> >>>>>>> we should revise this patch.
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> Can you please give the attached patch a spin on your HW? I've
> > > > >> >>>>>> boot-tested
> > > > >> >>>>>> it on a model, but of course I can't really verify that it
> > > fixes
> > > > >> >>>>>> your
> > > > >> >>>>>> issue.
> > > > >> >>>>>>
> > > > >> >>>>>> As far as I can see, it should fix it without any additional
> > > > >> >>>>>> flushing.
> > > > >> >>>>>>
> > > > >> >>>>>> Please let me know how it goes.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
> > > > >> >>>>> treated as "Normal Non-shareable, Inner Write-Back
> > > > >> >>>>> Write-Allocate,
> > > > >> >>>>> Outer Write-Back Write-Allocate memory"
> > > > >> >>>>>
> > > > >> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
> > > > >> >>>>> treated as "Strongly-ordered device memory"
> > > > >> >>>>>
> > > > >> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
> > > > >> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
> > > > >> >>>>> accessed as normal memory when HCR_EL2.DC=1.
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
> > > > >> >>>> be
> > > > >> >>>> accessed as device memory.
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
> > > > >> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
> > > > >> >>>>> KVM ARM64 because we use VGIC.
> > > > >> >>>>>
> > > > >> >>>>> IMHO, this patch enforces incorrect memory attribute for
> > > Guest/VM
> > > > >> >>>>> when Stage1 MMU is off.
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>> See above. My understanding is that HCR.DC controls the default
> > > > >> >>>> output of
> > > > >> >>>> Stage-1, and Stage-2 overrides still apply.
> > > > >> >>>
> > > > >> >>>
> > > > >> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
> > > > >> >>> and wanted guest to decide the memory attribute. In other words,
> > > > >> >>> you
> > > > >> >>> did not want to enforce any memory attribute in Stage2.
> > > > >> >>>
> > > > >> >>> Please refer to this patch
> > > > >> >>> https://patchwork.kernel.org/patch/2543201/
> > > > >> >>
> > > > >> >>
> > > > >> >> This patch has never been merged. If you carry on following the
> > > > >> >> discussion,
> > > > >> >> you will certainly notice it was dropped for a very good reason:
> > > > >> >>
> > > https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
> > > > >> >>
> > > > >> >> So Stage-2 memory attributes are used, they are not going away, and
> > > > >> >> they are
> > > > >> >> essential to the patch I sent this morning.
> > > > >> >>
> > > > >> >>
> > > > >> >>         M.
> > > > >> >> --
> > > > >> >> Fast, cheap, reliable. Pick two.
> > > > >> >
> > > > >> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
> > > > >> > is
> > > > >> > provided a DMA-capable device in pass-through mode. The reason being
> > > > >> > bootloader/firmware typically don't enable MMU and such
> > > > >> > bootloader/firmware
> > > > >> > will programme a pass-through DMA-capable device without any flushes
> > > > >> > to
> > > > >> > guest RAM (because it has not enabled MMU).
> > > > >> >
> > > > >> > A good example of such a device would be SATA AHCI controller given
> > > > >> > to a
> > > > >> > Guest/VM as direct access (using SystemMMU) and Guest
> > > > >> > bootloader/firmware
> > > > >> > accessing this SATA AHCI controller to load kernel images from SATA
> > > > >> > disk.
> > > > >> > In this situation, we will have to hack Guest bootloader/firmware
> > > > >> > AHCI driver to
> > > > >> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
> > > > >>
> > > > >> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out
> > > of
> > > > >> curiosity: is that a made up example or something you actually have?
> > > > >>
> > > > >> Back to square one:
> > > > >> Can you please benchmark the various cache cleaning options (global at
> > > > >> startup time, per-page on S2 translation fault, and user-space)?
> > > > >>
> > > > > Eh, why is this a more valid argument than the vgic?  The device
> > > > > passthrough Stage-2 mappings would still have the Stage-2 memory
> > > > > attributes to configure that memory region as device memory.  Why is it
> > > > > relevant if the device is DMA-capable in this context?
> > > > >
> > > >
> > > > Most ARM bootloader/firmware run with MMU off hence, they will not do
> > > > explicit cache flushes when programming DMA-capable device. Now If
> > > > HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
> > > > will not be visible to DMA-capable device.
> > > >
> > > Read my question again: The bootloaders running with the MMU off in a VM
> > > will only disable the MMU for Stage-1 translations.  Stage-2
> > > translations are still enforced using hte Stage-2 page tables and their
> > > attributes for all mappings to devices will still enforce
> > > strongly-ordered device type memory.
> > >
> > 
> > Please read my reply again. Also try to read-up SATA AHCI spec.
> > 
> >
> 
> Can you please explain how the specs realate to my question?
> 
> The only case I can see is if the guest puts data in a DMA region that
> it expects the device to read.  Is this the case?
> 
> Just so we're clear, this is quite different from programming the device
> through the device memory.
> 

ok, I see from the specs.  It's not just DMA-capable, but actually
DMA-programmable.  In that case setting DC=1 would be problematic.

Thanks for being so helpful in explaining this ;)

-Christoffer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16 17:19                         ` Christoffer Dall
@ 2013-08-16 17:42                           ` Anup Patel
  2013-08-16 17:50                             ` Christoffer Dall
  0 siblings, 1 reply; 58+ messages in thread
From: Anup Patel @ 2013-08-16 17:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 10:49 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Fri, Aug 16, 2013 at 12:27:55PM +0530, Anup Patel wrote:
>> On Fri, Aug 16, 2013 at 10:32 AM, Anup Patel <anup@brainfault.org> wrote:
>> > On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
>> > <christoffer.dall@linaro.org> wrote:
>> >> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
>> >>> On 2013-08-15 16:13, Anup Patel wrote:
>> >>> > Hi Marc,
>> >>> >
>> >>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
>> >>> > wrote:
>> >>> >> On 2013-08-15 14:31, Anup Patel wrote:
>> >>> >>>
>> >>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
>> >>> >>> <marc.zyngier@arm.com>
>> >>> >>> wrote:
>> >>> >>>>
>> >>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>> Hi Marc,
>> >>> >>>>>
>> >>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
>> >>> >>>>> <marc.zyngier@arm.com>
>> >>> >>>>> wrote:
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>> Hi Anup,
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
>> >>> >>>>>>> <marc.zyngier@arm.com>
>> >>> >>>>>>> wrote:
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>> Hi Pranav,
>> >>> >>>>>>>>
>> >>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>>
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
>> >>> >>>>>>>>> dirty
>> >>> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
>> >>> >>>>>>>>> this,
>> >>> >>>>>>>>> we need to flush such dirty content from d-cache so that
>> >>> >>>>>>>>> guest
>> >>> >>>>>>>>> will see correct contents of guest page when guest MMU is
>> >>> >>>>>>>>> disabled.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
>> >>> >>>>>>>>> L3-cache.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
>> >>> >>>>>>>>> <pranavkumar@linaro.org>
>> >>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>> >>> >>>>>>>>> ---
>> >>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>> >>> >>>>>>>>>  1 file changed, 2 insertions(+)
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>> >>> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>> >>> >>>>>>>>> index efe609c..5129038 100644
>> >>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> >>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> >>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
>> >>> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>> >>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>> >>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>> >>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>> >>> >>>>>>>>> +             /* Flush d-cache for systems with external
>> >>> >>>>>>>>> caches. */
>> >>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>> >>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
>> >>> >>>>>>>>> ASID-tagged
>> >>> >>>>>>>>> VIVT
>> >>> >>>>>>>>> */
>> >>> >>>>>>>>>               /* any kind of VIPT cache */
>> >>> >>>>>>>>>               __flush_icache_all();
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>> [adding Will to the discussion as we talked about this in the
>> >>> >>>>>>>> past]
>> >>> >>>>>>>>
>> >>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
>> >>> >>>>>>>> the
>> >>> >>>>>>>> data
>> >>> >>>>>>>> cache on each page mapping. It looks overkill.
>> >>> >>>>>>>>
>> >>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>> >>> >>>>>>>> kvmtools
>> >>> >>>>>>>> knows which bits of the guest memory have been touched, and
>> >>> >>>>>>>> can do a
>> >>> >>>>>>>> "DC
>> >>> >>>>>>>> DVAC" on this region.
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
>> >>> >>>>>>> I am
>> >>> >>>>>>> sure
>> >>> >>>>>>> other architectures don't have such cache cleaning in
>> >>> >>>>>>> user-space for
>> >>> >>>>>>> KVM.
>> >>> >>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>> The alternative is do it in the kernel before running any vcpu
>> >>> >>>>>>>> - but
>> >>> >>>>>>>> that's not very nice either (have to clean the whole of the
>> >>> >>>>>>>> guest
>> >>> >>>>>>>> memory, which makes a full dcache clean more appealing).
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
>> >>> >>>>>>> VCPU was
>> >>> >>>>>>> our second option but current approach seemed very simple hence
>> >>> >>>>>>> we went for this.
>> >>> >>>>>>>
>> >>> >>>>>>> If more people vote for full d-cache clean upon first run of
>> >>> >>>>>>> VCPU then
>> >>> >>>>>>> we should revise this patch.
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>> Can you please give the attached patch a spin on your HW? I've
>> >>> >>>>>> boot-tested
>> >>> >>>>>> it on a model, but of course I can't really verify that it fixes
>> >>> >>>>>> your
>> >>> >>>>>> issue.
>> >>> >>>>>>
>> >>> >>>>>> As far as I can see, it should fix it without any additional
>> >>> >>>>>> flushing.
>> >>> >>>>>>
>> >>> >>>>>> Please let me know how it goes.
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
>> >>> >>>>> treated as "Normal Non-shareable, Inner Write-Back
>> >>> >>>>> Write-Allocate,
>> >>> >>>>> Outer Write-Back Write-Allocate memory"
>> >>> >>>>>
>> >>> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
>> >>> >>>>> treated as "Strongly-ordered device memory"
>> >>> >>>>>
>> >>> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
>> >>> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
>> >>> >>>>> accessed as normal memory when HCR_EL2.DC=1.
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
>> >>> >>>> be
>> >>> >>>> accessed as device memory.
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
>> >>> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
>> >>> >>>>> KVM ARM64 because we use VGIC.
>> >>> >>>>>
>> >>> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>> >>> >>>>> when Stage1 MMU is off.
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> See above. My understanding is that HCR.DC controls the default
>> >>> >>>> output of
>> >>> >>>> Stage-1, and Stage-2 overrides still apply.
>> >>> >>>
>> >>> >>>
>> >>> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
>> >>> >>> and wanted guest to decide the memory attribute. In other words,
>> >>> >>> you
>> >>> >>> did not want to enforce any memory attribute in Stage2.
>> >>> >>>
>> >>> >>> Please refer to this patch
>> >>> >>> https://patchwork.kernel.org/patch/2543201/
>> >>> >>
>> >>> >>
>> >>> >> This patch has never been merged. If you carry on following the
>> >>> >> discussion,
>> >>> >> you will certainly notice it was dropped for a very good reason:
>> >>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>> >>> >>
>> >>> >> So Stage-2 memory attributes are used, they are not going away, and
>> >>> >> they are
>> >>> >> essential to the patch I sent this morning.
>> >>> >>
>> >>> >>
>> >>> >>         M.
>> >>> >> --
>> >>> >> Fast, cheap, reliable. Pick two.
>> >>> >
>> >>> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
>> >>> > is
>> >>> > provided a DMA-capable device in pass-through mode. The reason being
>> >>> > bootloader/firmware typically don't enable MMU and such
>> >>> > bootloader/firmware
>> >>> > will programme a pass-through DMA-capable device without any flushes
>> >>> > to
>> >>> > guest RAM (because it has not enabled MMU).
>> >>> >
>> >>> > A good example of such a device would be SATA AHCI controller given
>> >>> > to a
>> >>> > Guest/VM as direct access (using SystemMMU) and Guest
>> >>> > bootloader/firmware
>> >>> > accessing this SATA AHCI controller to load kernel images from SATA
>> >>> > disk.
>> >>> > In this situation, we will have to hack Guest bootloader/firmware
>> >>> > AHCI driver to
>> >>> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
>> >>>
>> >>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
>> >>> curiosity: is that a made up example or something you actually have?
>> >>>
>> >>> Back to square one:
>> >>> Can you please benchmark the various cache cleaning options (global at
>> >>> startup time, per-page on S2 translation fault, and user-space)?
>> >>>
>> >> Eh, why is this a more valid argument than the vgic?  The device
>> >> passthrough Stage-2 mappings would still have the Stage-2 memory
>> >> attributes to configure that memory region as device memory.  Why is it
>> >> relevant if the device is DMA-capable in this context?
>> >>
>> >> -Christoffer
>> >
>> > Most ARM bootloader/firmware run with MMU off hence, they will not do
>> > explicit cache flushes when programming DMA-capable device. Now If
>> > HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
>> > will not be visible to DMA-capable device.
>> >
>> > --Anup
>>
>> The approach of flushing d-cache by set/way upon first run of VCPU will
>> not work because for set/way operations ARM ARM says: "For set/way
>> operations, and for All (entire cache) operations, the point is defined to be
>> to the next level of caching". In other words, set/way operations work upto
>> point of unification.
>>
>> Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
>> at bootup time which does not work for us when L3-cache is enabled so,
>> there is no point is adding one more __flush_dcache_all() upon first run of
>> VCPU in KVM ARM64.
>
> When did anyone suggest using a cache cleaning method that doesn't apply
> to this case.  I'm a little confused about your comment here.

Please refer last reply from Marc Z where he says:
"Can you please benchmark the various cache cleaning options (global at
startup time, per-page on S2 translation fault, and user-space)?".

Rather doing __flush_dcache_range() on each page in
coherent_icache_guest_page() we could also flush entire d-cache upon
first VCPU run. This only issue in flushing d-cache upon first VCPU run
is that we cannot flush d-cache by set/way because as per ARM ARM
all operations by set/way are upto PoU and not PoC. In presence of
L3-Cache we need to flush d-cache upto PoC so we have to use
__flush_dache_range()

>
>>
>> IMHO, we are left with following options:
>> 1. Flush all RAM regions of VCPU using __flush_dcache_range()
>>     upon first run of VCPU
>
> We could do that, but we have to ensure that no other memory regions can
> be added later.  Is this the case?  I don't remember.

Yes, memory regions being added later be problem for this option.

>
>> 2. Implement outer-cache framework for ARM64 and flush all
>>     caches + outer cache (i.e. L3-cache) upon first run of VCPU
>
> What's the difference between (2) and (1)?

Linux ARM (i.e. 32-bit port) has a framework for having outer
caches (i.e. caches that are not part of the CPU but exist as
separate entity between CPU and Memory for performance
improvement). Using this framework we can flush all CPU caches
and outer cache.

>
>> 3. Use an alternate version of flush_icache_range() which will
>>     flush d-cache by PoC instead of PoU. We can also ensure
>>     that coherent_icache_guest_page() function will be called
>>     upon Stage2 prefetch aborts only.
>>
>
> I'm sorry, but I'm not following this discussion.  Are we not mixing a
> discussion along two different axis here?  As I see it there are two
> separate issues:
>
>  (A) When/Where to flush the memory regions
>  (B) How to flush the memory regions, including the hardware method and
>      the kernel software engineering aspect.

Discussion here is about getting KVM ARM64 working in-presence
of an external L3-cache (i.e. not part of CPU). Before starting a VCPU
user-space typically loads images to guest RAM so, in-presence of
huge L3-cache (few MBs). When the VCPU starts running some of the
contents guest RAM will be still in L3-cache and VCPU runs with
MMU off (i.e. cacheing off) hence VCPU will bypass L3-cache and
see incorrect contents. To solve this problem we need to flush the
guest RAM contents before they are accessed by first time by VCPU.

>
> -Christoffer

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16 17:42                           ` Anup Patel
@ 2013-08-16 17:50                             ` Christoffer Dall
  2013-08-16 18:06                               ` Christoffer Dall
  2013-08-16 18:11                               ` Anup Patel
  0 siblings, 2 replies; 58+ messages in thread
From: Christoffer Dall @ 2013-08-16 17:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 11:12:08PM +0530, Anup Patel wrote:
> On Fri, Aug 16, 2013 at 10:49 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Fri, Aug 16, 2013 at 12:27:55PM +0530, Anup Patel wrote:
> >> On Fri, Aug 16, 2013 at 10:32 AM, Anup Patel <anup@brainfault.org> wrote:
> >> > On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> >> > <christoffer.dall@linaro.org> wrote:
> >> >> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
> >> >>> On 2013-08-15 16:13, Anup Patel wrote:
> >> >>> > Hi Marc,
> >> >>> >
> >> >>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
> >> >>> > wrote:
> >> >>> >> On 2013-08-15 14:31, Anup Patel wrote:
> >> >>> >>>
> >> >>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
> >> >>> >>> <marc.zyngier@arm.com>
> >> >>> >>> wrote:
> >> >>> >>>>
> >> >>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> Hi Marc,
> >> >>> >>>>>
> >> >>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
> >> >>> >>>>> <marc.zyngier@arm.com>
> >> >>> >>>>> wrote:
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>> Hi Anup,
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
> >> >>> >>>>>>> <marc.zyngier@arm.com>
> >> >>> >>>>>>> wrote:
> >> >>> >>>>>>>>
> >> >>> >>>>>>>>
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> Hi Pranav,
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
> >> >>> >>>>>>>>> dirty
> >> >>> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
> >> >>> >>>>>>>>> this,
> >> >>> >>>>>>>>> we need to flush such dirty content from d-cache so that
> >> >>> >>>>>>>>> guest
> >> >>> >>>>>>>>> will see correct contents of guest page when guest MMU is
> >> >>> >>>>>>>>> disabled.
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
> >> >>> >>>>>>>>> L3-cache.
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
> >> >>> >>>>>>>>> <pranavkumar@linaro.org>
> >> >>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> >> >>> >>>>>>>>> ---
> >> >>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
> >> >>> >>>>>>>>>  1 file changed, 2 insertions(+)
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> >> >>> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
> >> >>> >>>>>>>>> index efe609c..5129038 100644
> >> >>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
> >> >>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >> >>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
> >> >>> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> >> >>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
> >> >>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
> >> >>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
> >> >>> >>>>>>>>> +             /* Flush d-cache for systems with external
> >> >>> >>>>>>>>> caches. */
> >> >>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
> >> >>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
> >> >>> >>>>>>>>> ASID-tagged
> >> >>> >>>>>>>>> VIVT
> >> >>> >>>>>>>>> */
> >> >>> >>>>>>>>>               /* any kind of VIPT cache */
> >> >>> >>>>>>>>>               __flush_icache_all();
> >> >>> >>>>>>>>
> >> >>> >>>>>>>>
> >> >>> >>>>>>>>
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> [adding Will to the discussion as we talked about this in the
> >> >>> >>>>>>>> past]
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
> >> >>> >>>>>>>> the
> >> >>> >>>>>>>> data
> >> >>> >>>>>>>> cache on each page mapping. It looks overkill.
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
> >> >>> >>>>>>>> kvmtools
> >> >>> >>>>>>>> knows which bits of the guest memory have been touched, and
> >> >>> >>>>>>>> can do a
> >> >>> >>>>>>>> "DC
> >> >>> >>>>>>>> DVAC" on this region.
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
> >> >>> >>>>>>> I am
> >> >>> >>>>>>> sure
> >> >>> >>>>>>> other architectures don't have such cache cleaning in
> >> >>> >>>>>>> user-space for
> >> >>> >>>>>>> KVM.
> >> >>> >>>>>>>
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> The alternative is do it in the kernel before running any vcpu
> >> >>> >>>>>>>> - but
> >> >>> >>>>>>>> that's not very nice either (have to clean the whole of the
> >> >>> >>>>>>>> guest
> >> >>> >>>>>>>> memory, which makes a full dcache clean more appealing).
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
> >> >>> >>>>>>> VCPU was
> >> >>> >>>>>>> our second option but current approach seemed very simple hence
> >> >>> >>>>>>> we went for this.
> >> >>> >>>>>>>
> >> >>> >>>>>>> If more people vote for full d-cache clean upon first run of
> >> >>> >>>>>>> VCPU then
> >> >>> >>>>>>> we should revise this patch.
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>> Can you please give the attached patch a spin on your HW? I've
> >> >>> >>>>>> boot-tested
> >> >>> >>>>>> it on a model, but of course I can't really verify that it fixes
> >> >>> >>>>>> your
> >> >>> >>>>>> issue.
> >> >>> >>>>>>
> >> >>> >>>>>> As far as I can see, it should fix it without any additional
> >> >>> >>>>>> flushing.
> >> >>> >>>>>>
> >> >>> >>>>>> Please let me know how it goes.
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
> >> >>> >>>>> treated as "Normal Non-shareable, Inner Write-Back
> >> >>> >>>>> Write-Allocate,
> >> >>> >>>>> Outer Write-Back Write-Allocate memory"
> >> >>> >>>>>
> >> >>> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
> >> >>> >>>>> treated as "Strongly-ordered device memory"
> >> >>> >>>>>
> >> >>> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
> >> >>> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
> >> >>> >>>>> accessed as normal memory when HCR_EL2.DC=1.
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
> >> >>> >>>> be
> >> >>> >>>> accessed as device memory.
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
> >> >>> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
> >> >>> >>>>> KVM ARM64 because we use VGIC.
> >> >>> >>>>>
> >> >>> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
> >> >>> >>>>> when Stage1 MMU is off.
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>> See above. My understanding is that HCR.DC controls the default
> >> >>> >>>> output of
> >> >>> >>>> Stage-1, and Stage-2 overrides still apply.
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
> >> >>> >>> and wanted guest to decide the memory attribute. In other words,
> >> >>> >>> you
> >> >>> >>> did not want to enforce any memory attribute in Stage2.
> >> >>> >>>
> >> >>> >>> Please refer to this patch
> >> >>> >>> https://patchwork.kernel.org/patch/2543201/
> >> >>> >>
> >> >>> >>
> >> >>> >> This patch has never been merged. If you carry on following the
> >> >>> >> discussion,
> >> >>> >> you will certainly notice it was dropped for a very good reason:
> >> >>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
> >> >>> >>
> >> >>> >> So Stage-2 memory attributes are used, they are not going away, and
> >> >>> >> they are
> >> >>> >> essential to the patch I sent this morning.
> >> >>> >>
> >> >>> >>
> >> >>> >>         M.
> >> >>> >> --
> >> >>> >> Fast, cheap, reliable. Pick two.
> >> >>> >
> >> >>> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
> >> >>> > is
> >> >>> > provided a DMA-capable device in pass-through mode. The reason being
> >> >>> > bootloader/firmware typically don't enable MMU and such
> >> >>> > bootloader/firmware
> >> >>> > will programme a pass-through DMA-capable device without any flushes
> >> >>> > to
> >> >>> > guest RAM (because it has not enabled MMU).
> >> >>> >
> >> >>> > A good example of such a device would be SATA AHCI controller given
> >> >>> > to a
> >> >>> > Guest/VM as direct access (using SystemMMU) and Guest
> >> >>> > bootloader/firmware
> >> >>> > accessing this SATA AHCI controller to load kernel images from SATA
> >> >>> > disk.
> >> >>> > In this situation, we will have to hack Guest bootloader/firmware
> >> >>> > AHCI driver to
> >> >>> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
> >> >>>
> >> >>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
> >> >>> curiosity: is that a made up example or something you actually have?
> >> >>>
> >> >>> Back to square one:
> >> >>> Can you please benchmark the various cache cleaning options (global at
> >> >>> startup time, per-page on S2 translation fault, and user-space)?
> >> >>>
> >> >> Eh, why is this a more valid argument than the vgic?  The device
> >> >> passthrough Stage-2 mappings would still have the Stage-2 memory
> >> >> attributes to configure that memory region as device memory.  Why is it
> >> >> relevant if the device is DMA-capable in this context?
> >> >>
> >> >> -Christoffer
> >> >
> >> > Most ARM bootloader/firmware run with MMU off hence, they will not do
> >> > explicit cache flushes when programming DMA-capable device. Now If
> >> > HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
> >> > will not be visible to DMA-capable device.
> >> >
> >> > --Anup
> >>
> >> The approach of flushing d-cache by set/way upon first run of VCPU will
> >> not work because for set/way operations ARM ARM says: "For set/way
> >> operations, and for All (entire cache) operations, the point is defined to be
> >> to the next level of caching". In other words, set/way operations work upto
> >> point of unification.
> >>
> >> Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
> >> at bootup time which does not work for us when L3-cache is enabled so,
> >> there is no point is adding one more __flush_dcache_all() upon first run of
> >> VCPU in KVM ARM64.
> >
> > When did anyone suggest using a cache cleaning method that doesn't apply
> > to this case.  I'm a little confused about your comment here.
> 
> Please refer last reply from Marc Z where he says:
> "Can you please benchmark the various cache cleaning options (global at
> startup time, per-page on S2 translation fault, and user-space)?".
> 
> Rather doing __flush_dcache_range() on each page in
> coherent_icache_guest_page() we could also flush entire d-cache upon
> first VCPU run. This only issue in flushing d-cache upon first VCPU run
> is that we cannot flush d-cache by set/way because as per ARM ARM
> all operations by set/way are upto PoU and not PoC. In presence of
> L3-Cache we need to flush d-cache upto PoC so we have to use
> __flush_dache_range()
> 
> >
> >>
> >> IMHO, we are left with following options:
> >> 1. Flush all RAM regions of VCPU using __flush_dcache_range()
> >>     upon first run of VCPU
> >
> > We could do that, but we have to ensure that no other memory regions can
> > be added later.  Is this the case?  I don't remember.
> 
> Yes, memory regions being added later be problem for this option.
> 
> >
> >> 2. Implement outer-cache framework for ARM64 and flush all
> >>     caches + outer cache (i.e. L3-cache) upon first run of VCPU
> >
> > What's the difference between (2) and (1)?
> 
> Linux ARM (i.e. 32-bit port) has a framework for having outer
> caches (i.e. caches that are not part of the CPU but exist as
> separate entity between CPU and Memory for performance
> improvement). Using this framework we can flush all CPU caches
> and outer cache.
> 

And we don't have such a framework in arm64?  But __flush_dcache_range
does nevertheless flush outer caches as well?

> >
> >> 3. Use an alternate version of flush_icache_range() which will
> >>     flush d-cache by PoC instead of PoU. We can also ensure
> >>     that coherent_icache_guest_page() function will be called
> >>     upon Stage2 prefetch aborts only.
> >>
> >
> > I'm sorry, but I'm not following this discussion.  Are we not mixing a
> > discussion along two different axis here?  As I see it there are two
> > separate issues:
> >
> >  (A) When/Where to flush the memory regions
> >  (B) How to flush the memory regions, including the hardware method and
> >      the kernel software engineering aspect.
> 
> Discussion here is about getting KVM ARM64 working in-presence
> of an external L3-cache (i.e. not part of CPU). Before starting a VCPU
> user-space typically loads images to guest RAM so, in-presence of
> huge L3-cache (few MBs). When the VCPU starts running some of the
> contents guest RAM will be still in L3-cache and VCPU runs with
> MMU off (i.e. cacheing off) hence VCPU will bypass L3-cache and
> see incorrect contents. To solve this problem we need to flush the
> guest RAM contents before they are accessed by first time by VCPU.
> 
ok, I'm with you that far.

But is it also not true that we need to decide between:

  A.1: Flush the entire guest RAM before running the VCPU
  A.2: Flush the pages as we fault them in

And (independently):

  B.1: Use __flush_dcache_range
  B.2: Use something else + outer cache framework for arm64
  B.3: Use flush_icache_range

Or do these all interleave somehow?  If so, I don't understand why.  Can
you help?

-Christoffer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16 17:50                             ` Christoffer Dall
@ 2013-08-16 18:06                               ` Christoffer Dall
  2013-08-16 18:20                                 ` Anup Patel
  2013-08-16 18:11                               ` Anup Patel
  1 sibling, 1 reply; 58+ messages in thread
From: Christoffer Dall @ 2013-08-16 18:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 10:50:34AM -0700, Christoffer Dall wrote:
> On Fri, Aug 16, 2013 at 11:12:08PM +0530, Anup Patel wrote:
> > On Fri, Aug 16, 2013 at 10:49 PM, Christoffer Dall
> > <christoffer.dall@linaro.org> wrote:
> > > On Fri, Aug 16, 2013 at 12:27:55PM +0530, Anup Patel wrote:
> > >> On Fri, Aug 16, 2013 at 10:32 AM, Anup Patel <anup@brainfault.org> wrote:
> > >> > On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> > >> > <christoffer.dall@linaro.org> wrote:
> > >> >> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
> > >> >>> On 2013-08-15 16:13, Anup Patel wrote:
> > >> >>> > Hi Marc,
> > >> >>> >
> > >> >>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
> > >> >>> > wrote:
> > >> >>> >> On 2013-08-15 14:31, Anup Patel wrote:
> > >> >>> >>>
> > >> >>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
> > >> >>> >>> <marc.zyngier@arm.com>
> > >> >>> >>> wrote:
> > >> >>> >>>>
> > >> >>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
> > >> >>> >>>>>
> > >> >>> >>>>>
> > >> >>> >>>>> Hi Marc,
> > >> >>> >>>>>
> > >> >>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
> > >> >>> >>>>> <marc.zyngier@arm.com>
> > >> >>> >>>>> wrote:
> > >> >>> >>>>>>
> > >> >>> >>>>>>
> > >> >>> >>>>>> Hi Anup,
> > >> >>> >>>>>>
> > >> >>> >>>>>>
> > >> >>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
> > >> >>> >>>>>>>
> > >> >>> >>>>>>>
> > >> >>> >>>>>>>
> > >> >>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
> > >> >>> >>>>>>> <marc.zyngier@arm.com>
> > >> >>> >>>>>>> wrote:
> > >> >>> >>>>>>>>
> > >> >>> >>>>>>>>
> > >> >>> >>>>>>>>
> > >> >>> >>>>>>>> Hi Pranav,
> > >> >>> >>>>>>>>
> > >> >>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> > >> >>> >>>>>>>>>
> > >> >>> >>>>>>>>>
> > >> >>> >>>>>>>>>
> > >> >>> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
> > >> >>> >>>>>>>>> dirty
> > >> >>> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
> > >> >>> >>>>>>>>> this,
> > >> >>> >>>>>>>>> we need to flush such dirty content from d-cache so that
> > >> >>> >>>>>>>>> guest
> > >> >>> >>>>>>>>> will see correct contents of guest page when guest MMU is
> > >> >>> >>>>>>>>> disabled.
> > >> >>> >>>>>>>>>
> > >> >>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
> > >> >>> >>>>>>>>> L3-cache.
> > >> >>> >>>>>>>>>
> > >> >>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
> > >> >>> >>>>>>>>> <pranavkumar@linaro.org>
> > >> >>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> > >> >>> >>>>>>>>> ---
> > >> >>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
> > >> >>> >>>>>>>>>  1 file changed, 2 insertions(+)
> > >> >>> >>>>>>>>>
> > >> >>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> > >> >>> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
> > >> >>> >>>>>>>>> index efe609c..5129038 100644
> > >> >>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
> > >> >>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> > >> >>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
> > >> >>> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> > >> >>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
> > >> >>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
> > >> >>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
> > >> >>> >>>>>>>>> +             /* Flush d-cache for systems with external
> > >> >>> >>>>>>>>> caches. */
> > >> >>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
> > >> >>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
> > >> >>> >>>>>>>>> ASID-tagged
> > >> >>> >>>>>>>>> VIVT
> > >> >>> >>>>>>>>> */
> > >> >>> >>>>>>>>>               /* any kind of VIPT cache */
> > >> >>> >>>>>>>>>               __flush_icache_all();
> > >> >>> >>>>>>>>
> > >> >>> >>>>>>>>
> > >> >>> >>>>>>>>
> > >> >>> >>>>>>>>
> > >> >>> >>>>>>>> [adding Will to the discussion as we talked about this in the
> > >> >>> >>>>>>>> past]
> > >> >>> >>>>>>>>
> > >> >>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
> > >> >>> >>>>>>>> the
> > >> >>> >>>>>>>> data
> > >> >>> >>>>>>>> cache on each page mapping. It looks overkill.
> > >> >>> >>>>>>>>
> > >> >>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
> > >> >>> >>>>>>>> kvmtools
> > >> >>> >>>>>>>> knows which bits of the guest memory have been touched, and
> > >> >>> >>>>>>>> can do a
> > >> >>> >>>>>>>> "DC
> > >> >>> >>>>>>>> DVAC" on this region.
> > >> >>> >>>>>>>
> > >> >>> >>>>>>>
> > >> >>> >>>>>>>
> > >> >>> >>>>>>>
> > >> >>> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
> > >> >>> >>>>>>> I am
> > >> >>> >>>>>>> sure
> > >> >>> >>>>>>> other architectures don't have such cache cleaning in
> > >> >>> >>>>>>> user-space for
> > >> >>> >>>>>>> KVM.
> > >> >>> >>>>>>>
> > >> >>> >>>>>>>>
> > >> >>> >>>>>>>> The alternative is do it in the kernel before running any vcpu
> > >> >>> >>>>>>>> - but
> > >> >>> >>>>>>>> that's not very nice either (have to clean the whole of the
> > >> >>> >>>>>>>> guest
> > >> >>> >>>>>>>> memory, which makes a full dcache clean more appealing).
> > >> >>> >>>>>>>
> > >> >>> >>>>>>>
> > >> >>> >>>>>>>
> > >> >>> >>>>>>>
> > >> >>> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
> > >> >>> >>>>>>> VCPU was
> > >> >>> >>>>>>> our second option but current approach seemed very simple hence
> > >> >>> >>>>>>> we went for this.
> > >> >>> >>>>>>>
> > >> >>> >>>>>>> If more people vote for full d-cache clean upon first run of
> > >> >>> >>>>>>> VCPU then
> > >> >>> >>>>>>> we should revise this patch.
> > >> >>> >>>>>>
> > >> >>> >>>>>>
> > >> >>> >>>>>>
> > >> >>> >>>>>>
> > >> >>> >>>>>> Can you please give the attached patch a spin on your HW? I've
> > >> >>> >>>>>> boot-tested
> > >> >>> >>>>>> it on a model, but of course I can't really verify that it fixes
> > >> >>> >>>>>> your
> > >> >>> >>>>>> issue.
> > >> >>> >>>>>>
> > >> >>> >>>>>> As far as I can see, it should fix it without any additional
> > >> >>> >>>>>> flushing.
> > >> >>> >>>>>>
> > >> >>> >>>>>> Please let me know how it goes.
> > >> >>> >>>>>
> > >> >>> >>>>>
> > >> >>> >>>>>
> > >> >>> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
> > >> >>> >>>>> treated as "Normal Non-shareable, Inner Write-Back
> > >> >>> >>>>> Write-Allocate,
> > >> >>> >>>>> Outer Write-Back Write-Allocate memory"
> > >> >>> >>>>>
> > >> >>> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
> > >> >>> >>>>> treated as "Strongly-ordered device memory"
> > >> >>> >>>>>
> > >> >>> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
> > >> >>> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
> > >> >>> >>>>> accessed as normal memory when HCR_EL2.DC=1.
> > >> >>> >>>>
> > >> >>> >>>>
> > >> >>> >>>>
> > >> >>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
> > >> >>> >>>> be
> > >> >>> >>>> accessed as device memory.
> > >> >>> >>>>
> > >> >>> >>>>
> > >> >>> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
> > >> >>> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
> > >> >>> >>>>> KVM ARM64 because we use VGIC.
> > >> >>> >>>>>
> > >> >>> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
> > >> >>> >>>>> when Stage1 MMU is off.
> > >> >>> >>>>
> > >> >>> >>>>
> > >> >>> >>>>
> > >> >>> >>>> See above. My understanding is that HCR.DC controls the default
> > >> >>> >>>> output of
> > >> >>> >>>> Stage-1, and Stage-2 overrides still apply.
> > >> >>> >>>
> > >> >>> >>>
> > >> >>> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
> > >> >>> >>> and wanted guest to decide the memory attribute. In other words,
> > >> >>> >>> you
> > >> >>> >>> did not want to enforce any memory attribute in Stage2.
> > >> >>> >>>
> > >> >>> >>> Please refer to this patch
> > >> >>> >>> https://patchwork.kernel.org/patch/2543201/
> > >> >>> >>
> > >> >>> >>
> > >> >>> >> This patch has never been merged. If you carry on following the
> > >> >>> >> discussion,
> > >> >>> >> you will certainly notice it was dropped for a very good reason:
> > >> >>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
> > >> >>> >>
> > >> >>> >> So Stage-2 memory attributes are used, they are not going away, and
> > >> >>> >> they are
> > >> >>> >> essential to the patch I sent this morning.
> > >> >>> >>
> > >> >>> >>
> > >> >>> >>         M.
> > >> >>> >> --
> > >> >>> >> Fast, cheap, reliable. Pick two.
> > >> >>> >
> > >> >>> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
> > >> >>> > is
> > >> >>> > provided a DMA-capable device in pass-through mode. The reason being
> > >> >>> > bootloader/firmware typically don't enable MMU and such
> > >> >>> > bootloader/firmware
> > >> >>> > will programme a pass-through DMA-capable device without any flushes
> > >> >>> > to
> > >> >>> > guest RAM (because it has not enabled MMU).
> > >> >>> >
> > >> >>> > A good example of such a device would be SATA AHCI controller given
> > >> >>> > to a
> > >> >>> > Guest/VM as direct access (using SystemMMU) and Guest
> > >> >>> > bootloader/firmware
> > >> >>> > accessing this SATA AHCI controller to load kernel images from SATA
> > >> >>> > disk.
> > >> >>> > In this situation, we will have to hack Guest bootloader/firmware
> > >> >>> > AHCI driver to
> > >> >>> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
> > >> >>>
> > >> >>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
> > >> >>> curiosity: is that a made up example or something you actually have?
> > >> >>>
> > >> >>> Back to square one:
> > >> >>> Can you please benchmark the various cache cleaning options (global at
> > >> >>> startup time, per-page on S2 translation fault, and user-space)?
> > >> >>>
> > >> >> Eh, why is this a more valid argument than the vgic?  The device
> > >> >> passthrough Stage-2 mappings would still have the Stage-2 memory
> > >> >> attributes to configure that memory region as device memory.  Why is it
> > >> >> relevant if the device is DMA-capable in this context?
> > >> >>
> > >> >> -Christoffer
> > >> >
> > >> > Most ARM bootloader/firmware run with MMU off hence, they will not do
> > >> > explicit cache flushes when programming DMA-capable device. Now If
> > >> > HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
> > >> > will not be visible to DMA-capable device.
> > >> >
> > >> > --Anup
> > >>
> > >> The approach of flushing d-cache by set/way upon first run of VCPU will
> > >> not work because for set/way operations ARM ARM says: "For set/way
> > >> operations, and for All (entire cache) operations, the point is defined to be
> > >> to the next level of caching". In other words, set/way operations work upto
> > >> point of unification.
> > >>
> > >> Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
> > >> at bootup time which does not work for us when L3-cache is enabled so,
> > >> there is no point is adding one more __flush_dcache_all() upon first run of
> > >> VCPU in KVM ARM64.
> > >
> > > When did anyone suggest using a cache cleaning method that doesn't apply
> > > to this case.  I'm a little confused about your comment here.
> > 
> > Please refer last reply from Marc Z where he says:
> > "Can you please benchmark the various cache cleaning options (global at
> > startup time, per-page on S2 translation fault, and user-space)?".
> > 
> > Rather doing __flush_dcache_range() on each page in
> > coherent_icache_guest_page() we could also flush entire d-cache upon
> > first VCPU run. This only issue in flushing d-cache upon first VCPU run
> > is that we cannot flush d-cache by set/way because as per ARM ARM
> > all operations by set/way are upto PoU and not PoC. In presence of
> > L3-Cache we need to flush d-cache upto PoC so we have to use
> > __flush_dache_range()
> > 
> > >
> > >>
> > >> IMHO, we are left with following options:
> > >> 1. Flush all RAM regions of VCPU using __flush_dcache_range()
> > >>     upon first run of VCPU
> > >
> > > We could do that, but we have to ensure that no other memory regions can
> > > be added later.  Is this the case?  I don't remember.
> > 
> > Yes, memory regions being added later be problem for this option.
> > 
> > >
> > >> 2. Implement outer-cache framework for ARM64 and flush all
> > >>     caches + outer cache (i.e. L3-cache) upon first run of VCPU
> > >
> > > What's the difference between (2) and (1)?
> > 
> > Linux ARM (i.e. 32-bit port) has a framework for having outer
> > caches (i.e. caches that are not part of the CPU but exist as
> > separate entity between CPU and Memory for performance
> > improvement). Using this framework we can flush all CPU caches
> > and outer cache.
> > 
> 
> And we don't have such a framework in arm64?  But __flush_dcache_range
> does nevertheless flush outer caches as well?
> 
> > >
> > >> 3. Use an alternate version of flush_icache_range() which will
> > >>     flush d-cache by PoC instead of PoU. We can also ensure
> > >>     that coherent_icache_guest_page() function will be called
> > >>     upon Stage2 prefetch aborts only.
> > >>
> > >
> > > I'm sorry, but I'm not following this discussion.  Are we not mixing a
> > > discussion along two different axis here?  As I see it there are two
> > > separate issues:
> > >
> > >  (A) When/Where to flush the memory regions
> > >  (B) How to flush the memory regions, including the hardware method and
> > >      the kernel software engineering aspect.
> > 
> > Discussion here is about getting KVM ARM64 working in-presence
> > of an external L3-cache (i.e. not part of CPU). Before starting a VCPU
> > user-space typically loads images to guest RAM so, in-presence of
> > huge L3-cache (few MBs). When the VCPU starts running some of the
> > contents guest RAM will be still in L3-cache and VCPU runs with
> > MMU off (i.e. cacheing off) hence VCPU will bypass L3-cache and
> > see incorrect contents. To solve this problem we need to flush the
> > guest RAM contents before they are accessed by first time by VCPU.
> > 
> ok, I'm with you that far.
> 
> But is it also not true that we need to decide between:
> 
>   A.1: Flush the entire guest RAM before running the VCPU
>   A.2: Flush the pages as we fault them in
> 
> And (independently):
> 
>   B.1: Use __flush_dcache_range
>   B.2: Use something else + outer cache framework for arm64
>   B.3: Use flush_icache_range
> 
> Or do these all interleave somehow?  If so, I don't understand why.  Can
> you help?
> 
Oh, I think I understand your point now.  You mean if we use
flush_cache_all before we run the vcpu, then it's not sufficient?  I
assumed we would simply be using __flush_dcache_area for the guest RAM
regions which would flush to PoC.

For the record, I think we need a solution that also covers the case
where a memory region is registered later, and I therefore prefer having
this functionality in the stage-2 fault handler, where we are already
taking care of similar issues (like your patch suggested).

Especially if we can limit ourselves to doing so when the guest MMU is
disabled, then I really think this is going to be the least overhead and
measuring the performance of this penalty vs. at first CPU execution is
a bit overkill IMHO, since we are only talking about boot time for any
reasonable guest (which would run with the MMU enabled for real
workloads presumeably).

The only caveat is the previously discussed issue if user space loads
code after the first VCPU execution, and only user space would know if
that happens, which would argue for user space doing the cleaning...
Hmmm.

I also still have my worries about swapping, since user space is free to
map guest RAM as non-executable.

Did I miss anything here?

-Christoffer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16 17:50                             ` Christoffer Dall
  2013-08-16 18:06                               ` Christoffer Dall
@ 2013-08-16 18:11                               ` Anup Patel
  2013-08-16 18:20                                 ` Christoffer Dall
  2013-08-30  9:52                                 ` Catalin Marinas
  1 sibling, 2 replies; 58+ messages in thread
From: Anup Patel @ 2013-08-16 18:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 11:20 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Fri, Aug 16, 2013 at 11:12:08PM +0530, Anup Patel wrote:
>> On Fri, Aug 16, 2013 at 10:49 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Fri, Aug 16, 2013 at 12:27:55PM +0530, Anup Patel wrote:
>> >> On Fri, Aug 16, 2013 at 10:32 AM, Anup Patel <anup@brainfault.org> wrote:
>> >> > On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
>> >> > <christoffer.dall@linaro.org> wrote:
>> >> >> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
>> >> >>> On 2013-08-15 16:13, Anup Patel wrote:
>> >> >>> > Hi Marc,
>> >> >>> >
>> >> >>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
>> >> >>> > wrote:
>> >> >>> >> On 2013-08-15 14:31, Anup Patel wrote:
>> >> >>> >>>
>> >> >>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
>> >> >>> >>> <marc.zyngier@arm.com>
>> >> >>> >>> wrote:
>> >> >>> >>>>
>> >> >>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
>> >> >>> >>>>>
>> >> >>> >>>>>
>> >> >>> >>>>> Hi Marc,
>> >> >>> >>>>>
>> >> >>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
>> >> >>> >>>>> <marc.zyngier@arm.com>
>> >> >>> >>>>> wrote:
>> >> >>> >>>>>>
>> >> >>> >>>>>>
>> >> >>> >>>>>> Hi Anup,
>> >> >>> >>>>>>
>> >> >>> >>>>>>
>> >> >>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>> >> >>> >>>>>>>
>> >> >>> >>>>>>>
>> >> >>> >>>>>>>
>> >> >>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
>> >> >>> >>>>>>> <marc.zyngier@arm.com>
>> >> >>> >>>>>>> wrote:
>> >> >>> >>>>>>>>
>> >> >>> >>>>>>>>
>> >> >>> >>>>>>>>
>> >> >>> >>>>>>>> Hi Pranav,
>> >> >>> >>>>>>>>
>> >> >>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>> >> >>> >>>>>>>>>
>> >> >>> >>>>>>>>>
>> >> >>> >>>>>>>>>
>> >> >>> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
>> >> >>> >>>>>>>>> dirty
>> >> >>> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
>> >> >>> >>>>>>>>> this,
>> >> >>> >>>>>>>>> we need to flush such dirty content from d-cache so that
>> >> >>> >>>>>>>>> guest
>> >> >>> >>>>>>>>> will see correct contents of guest page when guest MMU is
>> >> >>> >>>>>>>>> disabled.
>> >> >>> >>>>>>>>>
>> >> >>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
>> >> >>> >>>>>>>>> L3-cache.
>> >> >>> >>>>>>>>>
>> >> >>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
>> >> >>> >>>>>>>>> <pranavkumar@linaro.org>
>> >> >>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>> >> >>> >>>>>>>>> ---
>> >> >>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>> >> >>> >>>>>>>>>  1 file changed, 2 insertions(+)
>> >> >>> >>>>>>>>>
>> >> >>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>> >> >>> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>> >> >>> >>>>>>>>> index efe609c..5129038 100644
>> >> >>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> >> >>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> >> >>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
>> >> >>> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>> >> >>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>> >> >>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>> >> >>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>> >> >>> >>>>>>>>> +             /* Flush d-cache for systems with external
>> >> >>> >>>>>>>>> caches. */
>> >> >>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>> >> >>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
>> >> >>> >>>>>>>>> ASID-tagged
>> >> >>> >>>>>>>>> VIVT
>> >> >>> >>>>>>>>> */
>> >> >>> >>>>>>>>>               /* any kind of VIPT cache */
>> >> >>> >>>>>>>>>               __flush_icache_all();
>> >> >>> >>>>>>>>
>> >> >>> >>>>>>>>
>> >> >>> >>>>>>>>
>> >> >>> >>>>>>>>
>> >> >>> >>>>>>>> [adding Will to the discussion as we talked about this in the
>> >> >>> >>>>>>>> past]
>> >> >>> >>>>>>>>
>> >> >>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
>> >> >>> >>>>>>>> the
>> >> >>> >>>>>>>> data
>> >> >>> >>>>>>>> cache on each page mapping. It looks overkill.
>> >> >>> >>>>>>>>
>> >> >>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>> >> >>> >>>>>>>> kvmtools
>> >> >>> >>>>>>>> knows which bits of the guest memory have been touched, and
>> >> >>> >>>>>>>> can do a
>> >> >>> >>>>>>>> "DC
>> >> >>> >>>>>>>> DVAC" on this region.
>> >> >>> >>>>>>>
>> >> >>> >>>>>>>
>> >> >>> >>>>>>>
>> >> >>> >>>>>>>
>> >> >>> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
>> >> >>> >>>>>>> I am
>> >> >>> >>>>>>> sure
>> >> >>> >>>>>>> other architectures don't have such cache cleaning in
>> >> >>> >>>>>>> user-space for
>> >> >>> >>>>>>> KVM.
>> >> >>> >>>>>>>
>> >> >>> >>>>>>>>
>> >> >>> >>>>>>>> The alternative is do it in the kernel before running any vcpu
>> >> >>> >>>>>>>> - but
>> >> >>> >>>>>>>> that's not very nice either (have to clean the whole of the
>> >> >>> >>>>>>>> guest
>> >> >>> >>>>>>>> memory, which makes a full dcache clean more appealing).
>> >> >>> >>>>>>>
>> >> >>> >>>>>>>
>> >> >>> >>>>>>>
>> >> >>> >>>>>>>
>> >> >>> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
>> >> >>> >>>>>>> VCPU was
>> >> >>> >>>>>>> our second option but current approach seemed very simple hence
>> >> >>> >>>>>>> we went for this.
>> >> >>> >>>>>>>
>> >> >>> >>>>>>> If more people vote for full d-cache clean upon first run of
>> >> >>> >>>>>>> VCPU then
>> >> >>> >>>>>>> we should revise this patch.
>> >> >>> >>>>>>
>> >> >>> >>>>>>
>> >> >>> >>>>>>
>> >> >>> >>>>>>
>> >> >>> >>>>>> Can you please give the attached patch a spin on your HW? I've
>> >> >>> >>>>>> boot-tested
>> >> >>> >>>>>> it on a model, but of course I can't really verify that it fixes
>> >> >>> >>>>>> your
>> >> >>> >>>>>> issue.
>> >> >>> >>>>>>
>> >> >>> >>>>>> As far as I can see, it should fix it without any additional
>> >> >>> >>>>>> flushing.
>> >> >>> >>>>>>
>> >> >>> >>>>>> Please let me know how it goes.
>> >> >>> >>>>>
>> >> >>> >>>>>
>> >> >>> >>>>>
>> >> >>> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
>> >> >>> >>>>> treated as "Normal Non-shareable, Inner Write-Back
>> >> >>> >>>>> Write-Allocate,
>> >> >>> >>>>> Outer Write-Back Write-Allocate memory"
>> >> >>> >>>>>
>> >> >>> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
>> >> >>> >>>>> treated as "Strongly-ordered device memory"
>> >> >>> >>>>>
>> >> >>> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
>> >> >>> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
>> >> >>> >>>>> accessed as normal memory when HCR_EL2.DC=1.
>> >> >>> >>>>
>> >> >>> >>>>
>> >> >>> >>>>
>> >> >>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
>> >> >>> >>>> be
>> >> >>> >>>> accessed as device memory.
>> >> >>> >>>>
>> >> >>> >>>>
>> >> >>> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
>> >> >>> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
>> >> >>> >>>>> KVM ARM64 because we use VGIC.
>> >> >>> >>>>>
>> >> >>> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>> >> >>> >>>>> when Stage1 MMU is off.
>> >> >>> >>>>
>> >> >>> >>>>
>> >> >>> >>>>
>> >> >>> >>>> See above. My understanding is that HCR.DC controls the default
>> >> >>> >>>> output of
>> >> >>> >>>> Stage-1, and Stage-2 overrides still apply.
>> >> >>> >>>
>> >> >>> >>>
>> >> >>> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
>> >> >>> >>> and wanted guest to decide the memory attribute. In other words,
>> >> >>> >>> you
>> >> >>> >>> did not want to enforce any memory attribute in Stage2.
>> >> >>> >>>
>> >> >>> >>> Please refer to this patch
>> >> >>> >>> https://patchwork.kernel.org/patch/2543201/
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> This patch has never been merged. If you carry on following the
>> >> >>> >> discussion,
>> >> >>> >> you will certainly notice it was dropped for a very good reason:
>> >> >>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>> >> >>> >>
>> >> >>> >> So Stage-2 memory attributes are used, they are not going away, and
>> >> >>> >> they are
>> >> >>> >> essential to the patch I sent this morning.
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>         M.
>> >> >>> >> --
>> >> >>> >> Fast, cheap, reliable. Pick two.
>> >> >>> >
>> >> >>> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
>> >> >>> > is
>> >> >>> > provided a DMA-capable device in pass-through mode. The reason being
>> >> >>> > bootloader/firmware typically don't enable MMU and such
>> >> >>> > bootloader/firmware
>> >> >>> > will programme a pass-through DMA-capable device without any flushes
>> >> >>> > to
>> >> >>> > guest RAM (because it has not enabled MMU).
>> >> >>> >
>> >> >>> > A good example of such a device would be SATA AHCI controller given
>> >> >>> > to a
>> >> >>> > Guest/VM as direct access (using SystemMMU) and Guest
>> >> >>> > bootloader/firmware
>> >> >>> > accessing this SATA AHCI controller to load kernel images from SATA
>> >> >>> > disk.
>> >> >>> > In this situation, we will have to hack Guest bootloader/firmware
>> >> >>> > AHCI driver to
>> >> >>> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
>> >> >>>
>> >> >>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
>> >> >>> curiosity: is that a made up example or something you actually have?
>> >> >>>
>> >> >>> Back to square one:
>> >> >>> Can you please benchmark the various cache cleaning options (global at
>> >> >>> startup time, per-page on S2 translation fault, and user-space)?
>> >> >>>
>> >> >> Eh, why is this a more valid argument than the vgic?  The device
>> >> >> passthrough Stage-2 mappings would still have the Stage-2 memory
>> >> >> attributes to configure that memory region as device memory.  Why is it
>> >> >> relevant if the device is DMA-capable in this context?
>> >> >>
>> >> >> -Christoffer
>> >> >
>> >> > Most ARM bootloader/firmware run with MMU off hence, they will not do
>> >> > explicit cache flushes when programming DMA-capable device. Now If
>> >> > HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
>> >> > will not be visible to DMA-capable device.
>> >> >
>> >> > --Anup
>> >>
>> >> The approach of flushing d-cache by set/way upon first run of VCPU will
>> >> not work because for set/way operations ARM ARM says: "For set/way
>> >> operations, and for All (entire cache) operations, the point is defined to be
>> >> to the next level of caching". In other words, set/way operations work upto
>> >> point of unification.
>> >>
>> >> Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
>> >> at bootup time which does not work for us when L3-cache is enabled so,
>> >> there is no point is adding one more __flush_dcache_all() upon first run of
>> >> VCPU in KVM ARM64.
>> >
>> > When did anyone suggest using a cache cleaning method that doesn't apply
>> > to this case.  I'm a little confused about your comment here.
>>
>> Please refer last reply from Marc Z where he says:
>> "Can you please benchmark the various cache cleaning options (global at
>> startup time, per-page on S2 translation fault, and user-space)?".
>>
>> Rather doing __flush_dcache_range() on each page in
>> coherent_icache_guest_page() we could also flush entire d-cache upon
>> first VCPU run. This only issue in flushing d-cache upon first VCPU run
>> is that we cannot flush d-cache by set/way because as per ARM ARM
>> all operations by set/way are upto PoU and not PoC. In presence of
>> L3-Cache we need to flush d-cache upto PoC so we have to use
>> __flush_dache_range()
>>
>> >
>> >>
>> >> IMHO, we are left with following options:
>> >> 1. Flush all RAM regions of VCPU using __flush_dcache_range()
>> >>     upon first run of VCPU
>> >
>> > We could do that, but we have to ensure that no other memory regions can
>> > be added later.  Is this the case?  I don't remember.
>>
>> Yes, memory regions being added later be problem for this option.
>>
>> >
>> >> 2. Implement outer-cache framework for ARM64 and flush all
>> >>     caches + outer cache (i.e. L3-cache) upon first run of VCPU
>> >
>> > What's the difference between (2) and (1)?
>>
>> Linux ARM (i.e. 32-bit port) has a framework for having outer
>> caches (i.e. caches that are not part of the CPU but exist as
>> separate entity between CPU and Memory for performance
>> improvement). Using this framework we can flush all CPU caches
>> and outer cache.
>>
>
> And we don't have such a framework in arm64?  But __flush_dcache_range

Yes, for now we don't have outer-cache framework in arm64.

> does nevertheless flush outer caches as well?

Yes, __flush_dcache_range() flushes the outer cache too.

The __flush_dcache_range() works for us because it flushes (i.e.
clean & invalidate) by VA upto PoC. All operation upto PoC on a
cache line will force eviction of the cache line from all CPU caches
(i.e. both L1 & L2) and also from external L3-cache to ensure that
RAM has updated contents of cache line.

Now, the outer caches (such as L3-cache) typically provide a
mechanism to flush entire L3-cache using L3-cache registers. If
we have an outer-cache framework then we can flush all caches
using __flush_dcache_all() + outer cache flush all.

>
>> >
>> >> 3. Use an alternate version of flush_icache_range() which will
>> >>     flush d-cache by PoC instead of PoU. We can also ensure
>> >>     that coherent_icache_guest_page() function will be called
>> >>     upon Stage2 prefetch aborts only.
>> >>
>> >
>> > I'm sorry, but I'm not following this discussion.  Are we not mixing a
>> > discussion along two different axis here?  As I see it there are two
>> > separate issues:
>> >
>> >  (A) When/Where to flush the memory regions
>> >  (B) How to flush the memory regions, including the hardware method and
>> >      the kernel software engineering aspect.
>>
>> Discussion here is about getting KVM ARM64 working in-presence
>> of an external L3-cache (i.e. not part of CPU). Before starting a VCPU
>> user-space typically loads images to guest RAM so, in-presence of
>> huge L3-cache (few MBs). When the VCPU starts running some of the
>> contents guest RAM will be still in L3-cache and VCPU runs with
>> MMU off (i.e. cacheing off) hence VCPU will bypass L3-cache and
>> see incorrect contents. To solve this problem we need to flush the
>> guest RAM contents before they are accessed by first time by VCPU.
>>
> ok, I'm with you that far.
>
> But is it also not true that we need to decide between:
>
>   A.1: Flush the entire guest RAM before running the VCPU
>   A.2: Flush the pages as we fault them in

Yes, thats the decision we have to make.

>
> And (independently):
>
>   B.1: Use __flush_dcache_range
>   B.2: Use something else + outer cache framework for arm64

This would be __flush_dcache_all() + outer cache flush all.

>   B.3: Use flush_icache_range

Actually modified version of flush_icache_range() which uses
"dc cvac" and not "dc cvau".

>
> Or do these all interleave somehow?  If so, I don't understand why.  Can
> you help?
>
> -Christoffer

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16 18:11                               ` Anup Patel
@ 2013-08-16 18:20                                 ` Christoffer Dall
  2013-08-30  9:52                                 ` Catalin Marinas
  1 sibling, 0 replies; 58+ messages in thread
From: Christoffer Dall @ 2013-08-16 18:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 11:41:51PM +0530, Anup Patel wrote:
> On Fri, Aug 16, 2013 at 11:20 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Fri, Aug 16, 2013 at 11:12:08PM +0530, Anup Patel wrote:
> >> On Fri, Aug 16, 2013 at 10:49 PM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Fri, Aug 16, 2013 at 12:27:55PM +0530, Anup Patel wrote:
> >> >> On Fri, Aug 16, 2013 at 10:32 AM, Anup Patel <anup@brainfault.org> wrote:
> >> >> > On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> >> >> > <christoffer.dall@linaro.org> wrote:
> >> >> >> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
> >> >> >>> On 2013-08-15 16:13, Anup Patel wrote:
> >> >> >>> > Hi Marc,
> >> >> >>> >
> >> >> >>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
> >> >> >>> > wrote:
> >> >> >>> >> On 2013-08-15 14:31, Anup Patel wrote:
> >> >> >>> >>>
> >> >> >>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
> >> >> >>> >>> <marc.zyngier@arm.com>
> >> >> >>> >>> wrote:
> >> >> >>> >>>>
> >> >> >>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
> >> >> >>> >>>>>
> >> >> >>> >>>>>
> >> >> >>> >>>>> Hi Marc,
> >> >> >>> >>>>>
> >> >> >>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
> >> >> >>> >>>>> <marc.zyngier@arm.com>
> >> >> >>> >>>>> wrote:
> >> >> >>> >>>>>>
> >> >> >>> >>>>>>
> >> >> >>> >>>>>> Hi Anup,
> >> >> >>> >>>>>>
> >> >> >>> >>>>>>
> >> >> >>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
> >> >> >>> >>>>>>> <marc.zyngier@arm.com>
> >> >> >>> >>>>>>> wrote:
> >> >> >>> >>>>>>>>
> >> >> >>> >>>>>>>>
> >> >> >>> >>>>>>>>
> >> >> >>> >>>>>>>> Hi Pranav,
> >> >> >>> >>>>>>>>
> >> >> >>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> >> >> >>> >>>>>>>>>
> >> >> >>> >>>>>>>>>
> >> >> >>> >>>>>>>>>
> >> >> >>> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
> >> >> >>> >>>>>>>>> dirty
> >> >> >>> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
> >> >> >>> >>>>>>>>> this,
> >> >> >>> >>>>>>>>> we need to flush such dirty content from d-cache so that
> >> >> >>> >>>>>>>>> guest
> >> >> >>> >>>>>>>>> will see correct contents of guest page when guest MMU is
> >> >> >>> >>>>>>>>> disabled.
> >> >> >>> >>>>>>>>>
> >> >> >>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
> >> >> >>> >>>>>>>>> L3-cache.
> >> >> >>> >>>>>>>>>
> >> >> >>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
> >> >> >>> >>>>>>>>> <pranavkumar@linaro.org>
> >> >> >>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> >> >> >>> >>>>>>>>> ---
> >> >> >>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
> >> >> >>> >>>>>>>>>  1 file changed, 2 insertions(+)
> >> >> >>> >>>>>>>>>
> >> >> >>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> >> >> >>> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
> >> >> >>> >>>>>>>>> index efe609c..5129038 100644
> >> >> >>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
> >> >> >>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >> >> >>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
> >> >> >>> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> >> >> >>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
> >> >> >>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
> >> >> >>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
> >> >> >>> >>>>>>>>> +             /* Flush d-cache for systems with external
> >> >> >>> >>>>>>>>> caches. */
> >> >> >>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
> >> >> >>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
> >> >> >>> >>>>>>>>> ASID-tagged
> >> >> >>> >>>>>>>>> VIVT
> >> >> >>> >>>>>>>>> */
> >> >> >>> >>>>>>>>>               /* any kind of VIPT cache */
> >> >> >>> >>>>>>>>>               __flush_icache_all();
> >> >> >>> >>>>>>>>
> >> >> >>> >>>>>>>>
> >> >> >>> >>>>>>>>
> >> >> >>> >>>>>>>>
> >> >> >>> >>>>>>>> [adding Will to the discussion as we talked about this in the
> >> >> >>> >>>>>>>> past]
> >> >> >>> >>>>>>>>
> >> >> >>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
> >> >> >>> >>>>>>>> the
> >> >> >>> >>>>>>>> data
> >> >> >>> >>>>>>>> cache on each page mapping. It looks overkill.
> >> >> >>> >>>>>>>>
> >> >> >>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
> >> >> >>> >>>>>>>> kvmtools
> >> >> >>> >>>>>>>> knows which bits of the guest memory have been touched, and
> >> >> >>> >>>>>>>> can do a
> >> >> >>> >>>>>>>> "DC
> >> >> >>> >>>>>>>> DVAC" on this region.
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
> >> >> >>> >>>>>>> I am
> >> >> >>> >>>>>>> sure
> >> >> >>> >>>>>>> other architectures don't have such cache cleaning in
> >> >> >>> >>>>>>> user-space for
> >> >> >>> >>>>>>> KVM.
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>>>
> >> >> >>> >>>>>>>> The alternative is do it in the kernel before running any vcpu
> >> >> >>> >>>>>>>> - but
> >> >> >>> >>>>>>>> that's not very nice either (have to clean the whole of the
> >> >> >>> >>>>>>>> guest
> >> >> >>> >>>>>>>> memory, which makes a full dcache clean more appealing).
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
> >> >> >>> >>>>>>> VCPU was
> >> >> >>> >>>>>>> our second option but current approach seemed very simple hence
> >> >> >>> >>>>>>> we went for this.
> >> >> >>> >>>>>>>
> >> >> >>> >>>>>>> If more people vote for full d-cache clean upon first run of
> >> >> >>> >>>>>>> VCPU then
> >> >> >>> >>>>>>> we should revise this patch.
> >> >> >>> >>>>>>
> >> >> >>> >>>>>>
> >> >> >>> >>>>>>
> >> >> >>> >>>>>>
> >> >> >>> >>>>>> Can you please give the attached patch a spin on your HW? I've
> >> >> >>> >>>>>> boot-tested
> >> >> >>> >>>>>> it on a model, but of course I can't really verify that it fixes
> >> >> >>> >>>>>> your
> >> >> >>> >>>>>> issue.
> >> >> >>> >>>>>>
> >> >> >>> >>>>>> As far as I can see, it should fix it without any additional
> >> >> >>> >>>>>> flushing.
> >> >> >>> >>>>>>
> >> >> >>> >>>>>> Please let me know how it goes.
> >> >> >>> >>>>>
> >> >> >>> >>>>>
> >> >> >>> >>>>>
> >> >> >>> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
> >> >> >>> >>>>> treated as "Normal Non-shareable, Inner Write-Back
> >> >> >>> >>>>> Write-Allocate,
> >> >> >>> >>>>> Outer Write-Back Write-Allocate memory"
> >> >> >>> >>>>>
> >> >> >>> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
> >> >> >>> >>>>> treated as "Strongly-ordered device memory"
> >> >> >>> >>>>>
> >> >> >>> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
> >> >> >>> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
> >> >> >>> >>>>> accessed as normal memory when HCR_EL2.DC=1.
> >> >> >>> >>>>
> >> >> >>> >>>>
> >> >> >>> >>>>
> >> >> >>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
> >> >> >>> >>>> be
> >> >> >>> >>>> accessed as device memory.
> >> >> >>> >>>>
> >> >> >>> >>>>
> >> >> >>> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
> >> >> >>> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
> >> >> >>> >>>>> KVM ARM64 because we use VGIC.
> >> >> >>> >>>>>
> >> >> >>> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
> >> >> >>> >>>>> when Stage1 MMU is off.
> >> >> >>> >>>>
> >> >> >>> >>>>
> >> >> >>> >>>>
> >> >> >>> >>>> See above. My understanding is that HCR.DC controls the default
> >> >> >>> >>>> output of
> >> >> >>> >>>> Stage-1, and Stage-2 overrides still apply.
> >> >> >>> >>>
> >> >> >>> >>>
> >> >> >>> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
> >> >> >>> >>> and wanted guest to decide the memory attribute. In other words,
> >> >> >>> >>> you
> >> >> >>> >>> did not want to enforce any memory attribute in Stage2.
> >> >> >>> >>>
> >> >> >>> >>> Please refer to this patch
> >> >> >>> >>> https://patchwork.kernel.org/patch/2543201/
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> This patch has never been merged. If you carry on following the
> >> >> >>> >> discussion,
> >> >> >>> >> you will certainly notice it was dropped for a very good reason:
> >> >> >>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
> >> >> >>> >>
> >> >> >>> >> So Stage-2 memory attributes are used, they are not going away, and
> >> >> >>> >> they are
> >> >> >>> >> essential to the patch I sent this morning.
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >>         M.
> >> >> >>> >> --
> >> >> >>> >> Fast, cheap, reliable. Pick two.
> >> >> >>> >
> >> >> >>> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
> >> >> >>> > is
> >> >> >>> > provided a DMA-capable device in pass-through mode. The reason being
> >> >> >>> > bootloader/firmware typically don't enable MMU and such
> >> >> >>> > bootloader/firmware
> >> >> >>> > will programme a pass-through DMA-capable device without any flushes
> >> >> >>> > to
> >> >> >>> > guest RAM (because it has not enabled MMU).
> >> >> >>> >
> >> >> >>> > A good example of such a device would be SATA AHCI controller given
> >> >> >>> > to a
> >> >> >>> > Guest/VM as direct access (using SystemMMU) and Guest
> >> >> >>> > bootloader/firmware
> >> >> >>> > accessing this SATA AHCI controller to load kernel images from SATA
> >> >> >>> > disk.
> >> >> >>> > In this situation, we will have to hack Guest bootloader/firmware
> >> >> >>> > AHCI driver to
> >> >> >>> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
> >> >> >>>
> >> >> >>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
> >> >> >>> curiosity: is that a made up example or something you actually have?
> >> >> >>>
> >> >> >>> Back to square one:
> >> >> >>> Can you please benchmark the various cache cleaning options (global at
> >> >> >>> startup time, per-page on S2 translation fault, and user-space)?
> >> >> >>>
> >> >> >> Eh, why is this a more valid argument than the vgic?  The device
> >> >> >> passthrough Stage-2 mappings would still have the Stage-2 memory
> >> >> >> attributes to configure that memory region as device memory.  Why is it
> >> >> >> relevant if the device is DMA-capable in this context?
> >> >> >>
> >> >> >> -Christoffer
> >> >> >
> >> >> > Most ARM bootloader/firmware run with MMU off hence, they will not do
> >> >> > explicit cache flushes when programming DMA-capable device. Now If
> >> >> > HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
> >> >> > will not be visible to DMA-capable device.
> >> >> >
> >> >> > --Anup
> >> >>
> >> >> The approach of flushing d-cache by set/way upon first run of VCPU will
> >> >> not work because for set/way operations ARM ARM says: "For set/way
> >> >> operations, and for All (entire cache) operations, the point is defined to be
> >> >> to the next level of caching". In other words, set/way operations work upto
> >> >> point of unification.
> >> >>
> >> >> Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
> >> >> at bootup time which does not work for us when L3-cache is enabled so,
> >> >> there is no point is adding one more __flush_dcache_all() upon first run of
> >> >> VCPU in KVM ARM64.
> >> >
> >> > When did anyone suggest using a cache cleaning method that doesn't apply
> >> > to this case.  I'm a little confused about your comment here.
> >>
> >> Please refer last reply from Marc Z where he says:
> >> "Can you please benchmark the various cache cleaning options (global at
> >> startup time, per-page on S2 translation fault, and user-space)?".
> >>
> >> Rather doing __flush_dcache_range() on each page in
> >> coherent_icache_guest_page() we could also flush entire d-cache upon
> >> first VCPU run. This only issue in flushing d-cache upon first VCPU run
> >> is that we cannot flush d-cache by set/way because as per ARM ARM
> >> all operations by set/way are upto PoU and not PoC. In presence of
> >> L3-Cache we need to flush d-cache upto PoC so we have to use
> >> __flush_dache_range()
> >>
> >> >
> >> >>
> >> >> IMHO, we are left with following options:
> >> >> 1. Flush all RAM regions of VCPU using __flush_dcache_range()
> >> >>     upon first run of VCPU
> >> >
> >> > We could do that, but we have to ensure that no other memory regions can
> >> > be added later.  Is this the case?  I don't remember.
> >>
> >> Yes, memory regions being added later be problem for this option.
> >>
> >> >
> >> >> 2. Implement outer-cache framework for ARM64 and flush all
> >> >>     caches + outer cache (i.e. L3-cache) upon first run of VCPU
> >> >
> >> > What's the difference between (2) and (1)?
> >>
> >> Linux ARM (i.e. 32-bit port) has a framework for having outer
> >> caches (i.e. caches that are not part of the CPU but exist as
> >> separate entity between CPU and Memory for performance
> >> improvement). Using this framework we can flush all CPU caches
> >> and outer cache.
> >>
> >
> > And we don't have such a framework in arm64?  But __flush_dcache_range
> 
> Yes, for now we don't have outer-cache framework in arm64.
> 
> > does nevertheless flush outer caches as well?
> 
> Yes, __flush_dcache_range() flushes the outer cache too.
> 
> The __flush_dcache_range() works for us because it flushes (i.e.
> clean & invalidate) by VA upto PoC. All operation upto PoC on a
> cache line will force eviction of the cache line from all CPU caches
> (i.e. both L1 & L2) and also from external L3-cache to ensure that
> RAM has updated contents of cache line.
> 
> Now, the outer caches (such as L3-cache) typically provide a
> mechanism to flush entire L3-cache using L3-cache registers. If
> we have an outer-cache framework then we can flush all caches
> using __flush_dcache_all() + outer cache flush all.
> 
> >
> >> >
> >> >> 3. Use an alternate version of flush_icache_range() which will
> >> >>     flush d-cache by PoC instead of PoU. We can also ensure
> >> >>     that coherent_icache_guest_page() function will be called
> >> >>     upon Stage2 prefetch aborts only.
> >> >>
> >> >
> >> > I'm sorry, but I'm not following this discussion.  Are we not mixing a
> >> > discussion along two different axis here?  As I see it there are two
> >> > separate issues:
> >> >
> >> >  (A) When/Where to flush the memory regions
> >> >  (B) How to flush the memory regions, including the hardware method and
> >> >      the kernel software engineering aspect.
> >>
> >> Discussion here is about getting KVM ARM64 working in-presence
> >> of an external L3-cache (i.e. not part of CPU). Before starting a VCPU
> >> user-space typically loads images to guest RAM so, in-presence of
> >> huge L3-cache (few MBs). When the VCPU starts running some of the
> >> contents guest RAM will be still in L3-cache and VCPU runs with
> >> MMU off (i.e. cacheing off) hence VCPU will bypass L3-cache and
> >> see incorrect contents. To solve this problem we need to flush the
> >> guest RAM contents before they are accessed by first time by VCPU.
> >>
> > ok, I'm with you that far.
> >
> > But is it also not true that we need to decide between:
> >
> >   A.1: Flush the entire guest RAM before running the VCPU
> >   A.2: Flush the pages as we fault them in
> 
> Yes, thats the decision we have to make.
> 
> >
> > And (independently):
> >
> >   B.1: Use __flush_dcache_range
> >   B.2: Use something else + outer cache framework for arm64
> 
> This would be __flush_dcache_all() + outer cache flush all.
> 
> >   B.3: Use flush_icache_range
> 
> Actually modified version of flush_icache_range() which uses
> "dc cvac" and not "dc cvau".
> 
> >
> > Or do these all interleave somehow?  If so, I don't understand why.  Can
> > you help?
> >

Hi Anup,

Thanks for the explanation, it's crystal now.

-Christoffer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16 18:06                               ` Christoffer Dall
@ 2013-08-16 18:20                                 ` Anup Patel
  0 siblings, 0 replies; 58+ messages in thread
From: Anup Patel @ 2013-08-16 18:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 11:36 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Fri, Aug 16, 2013 at 10:50:34AM -0700, Christoffer Dall wrote:
>> On Fri, Aug 16, 2013 at 11:12:08PM +0530, Anup Patel wrote:
>> > On Fri, Aug 16, 2013 at 10:49 PM, Christoffer Dall
>> > <christoffer.dall@linaro.org> wrote:
>> > > On Fri, Aug 16, 2013 at 12:27:55PM +0530, Anup Patel wrote:
>> > >> On Fri, Aug 16, 2013 at 10:32 AM, Anup Patel <anup@brainfault.org> wrote:
>> > >> > On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
>> > >> > <christoffer.dall@linaro.org> wrote:
>> > >> >> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
>> > >> >>> On 2013-08-15 16:13, Anup Patel wrote:
>> > >> >>> > Hi Marc,
>> > >> >>> >
>> > >> >>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
>> > >> >>> > wrote:
>> > >> >>> >> On 2013-08-15 14:31, Anup Patel wrote:
>> > >> >>> >>>
>> > >> >>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
>> > >> >>> >>> <marc.zyngier@arm.com>
>> > >> >>> >>> wrote:
>> > >> >>> >>>>
>> > >> >>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
>> > >> >>> >>>>>
>> > >> >>> >>>>>
>> > >> >>> >>>>> Hi Marc,
>> > >> >>> >>>>>
>> > >> >>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
>> > >> >>> >>>>> <marc.zyngier@arm.com>
>> > >> >>> >>>>> wrote:
>> > >> >>> >>>>>>
>> > >> >>> >>>>>>
>> > >> >>> >>>>>> Hi Anup,
>> > >> >>> >>>>>>
>> > >> >>> >>>>>>
>> > >> >>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
>> > >> >>> >>>>>>> <marc.zyngier@arm.com>
>> > >> >>> >>>>>>> wrote:
>> > >> >>> >>>>>>>>
>> > >> >>> >>>>>>>>
>> > >> >>> >>>>>>>>
>> > >> >>> >>>>>>>> Hi Pranav,
>> > >> >>> >>>>>>>>
>> > >> >>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>> > >> >>> >>>>>>>>>
>> > >> >>> >>>>>>>>>
>> > >> >>> >>>>>>>>>
>> > >> >>> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
>> > >> >>> >>>>>>>>> dirty
>> > >> >>> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
>> > >> >>> >>>>>>>>> this,
>> > >> >>> >>>>>>>>> we need to flush such dirty content from d-cache so that
>> > >> >>> >>>>>>>>> guest
>> > >> >>> >>>>>>>>> will see correct contents of guest page when guest MMU is
>> > >> >>> >>>>>>>>> disabled.
>> > >> >>> >>>>>>>>>
>> > >> >>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
>> > >> >>> >>>>>>>>> L3-cache.
>> > >> >>> >>>>>>>>>
>> > >> >>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
>> > >> >>> >>>>>>>>> <pranavkumar@linaro.org>
>> > >> >>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>> > >> >>> >>>>>>>>> ---
>> > >> >>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>> > >> >>> >>>>>>>>>  1 file changed, 2 insertions(+)
>> > >> >>> >>>>>>>>>
>> > >> >>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>> > >> >>> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>> > >> >>> >>>>>>>>> index efe609c..5129038 100644
>> > >> >>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> > >> >>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> > >> >>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
>> > >> >>> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>> > >> >>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>> > >> >>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>> > >> >>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>> > >> >>> >>>>>>>>> +             /* Flush d-cache for systems with external
>> > >> >>> >>>>>>>>> caches. */
>> > >> >>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>> > >> >>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
>> > >> >>> >>>>>>>>> ASID-tagged
>> > >> >>> >>>>>>>>> VIVT
>> > >> >>> >>>>>>>>> */
>> > >> >>> >>>>>>>>>               /* any kind of VIPT cache */
>> > >> >>> >>>>>>>>>               __flush_icache_all();
>> > >> >>> >>>>>>>>
>> > >> >>> >>>>>>>>
>> > >> >>> >>>>>>>>
>> > >> >>> >>>>>>>>
>> > >> >>> >>>>>>>> [adding Will to the discussion as we talked about this in the
>> > >> >>> >>>>>>>> past]
>> > >> >>> >>>>>>>>
>> > >> >>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
>> > >> >>> >>>>>>>> the
>> > >> >>> >>>>>>>> data
>> > >> >>> >>>>>>>> cache on each page mapping. It looks overkill.
>> > >> >>> >>>>>>>>
>> > >> >>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>> > >> >>> >>>>>>>> kvmtools
>> > >> >>> >>>>>>>> knows which bits of the guest memory have been touched, and
>> > >> >>> >>>>>>>> can do a
>> > >> >>> >>>>>>>> "DC
>> > >> >>> >>>>>>>> DVAC" on this region.
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
>> > >> >>> >>>>>>> I am
>> > >> >>> >>>>>>> sure
>> > >> >>> >>>>>>> other architectures don't have such cache cleaning in
>> > >> >>> >>>>>>> user-space for
>> > >> >>> >>>>>>> KVM.
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>>>
>> > >> >>> >>>>>>>> The alternative is do it in the kernel before running any vcpu
>> > >> >>> >>>>>>>> - but
>> > >> >>> >>>>>>>> that's not very nice either (have to clean the whole of the
>> > >> >>> >>>>>>>> guest
>> > >> >>> >>>>>>>> memory, which makes a full dcache clean more appealing).
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
>> > >> >>> >>>>>>> VCPU was
>> > >> >>> >>>>>>> our second option but current approach seemed very simple hence
>> > >> >>> >>>>>>> we went for this.
>> > >> >>> >>>>>>>
>> > >> >>> >>>>>>> If more people vote for full d-cache clean upon first run of
>> > >> >>> >>>>>>> VCPU then
>> > >> >>> >>>>>>> we should revise this patch.
>> > >> >>> >>>>>>
>> > >> >>> >>>>>>
>> > >> >>> >>>>>>
>> > >> >>> >>>>>>
>> > >> >>> >>>>>> Can you please give the attached patch a spin on your HW? I've
>> > >> >>> >>>>>> boot-tested
>> > >> >>> >>>>>> it on a model, but of course I can't really verify that it fixes
>> > >> >>> >>>>>> your
>> > >> >>> >>>>>> issue.
>> > >> >>> >>>>>>
>> > >> >>> >>>>>> As far as I can see, it should fix it without any additional
>> > >> >>> >>>>>> flushing.
>> > >> >>> >>>>>>
>> > >> >>> >>>>>> Please let me know how it goes.
>> > >> >>> >>>>>
>> > >> >>> >>>>>
>> > >> >>> >>>>>
>> > >> >>> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
>> > >> >>> >>>>> treated as "Normal Non-shareable, Inner Write-Back
>> > >> >>> >>>>> Write-Allocate,
>> > >> >>> >>>>> Outer Write-Back Write-Allocate memory"
>> > >> >>> >>>>>
>> > >> >>> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
>> > >> >>> >>>>> treated as "Strongly-ordered device memory"
>> > >> >>> >>>>>
>> > >> >>> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
>> > >> >>> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
>> > >> >>> >>>>> accessed as normal memory when HCR_EL2.DC=1.
>> > >> >>> >>>>
>> > >> >>> >>>>
>> > >> >>> >>>>
>> > >> >>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
>> > >> >>> >>>> be
>> > >> >>> >>>> accessed as device memory.
>> > >> >>> >>>>
>> > >> >>> >>>>
>> > >> >>> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
>> > >> >>> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
>> > >> >>> >>>>> KVM ARM64 because we use VGIC.
>> > >> >>> >>>>>
>> > >> >>> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>> > >> >>> >>>>> when Stage1 MMU is off.
>> > >> >>> >>>>
>> > >> >>> >>>>
>> > >> >>> >>>>
>> > >> >>> >>>> See above. My understanding is that HCR.DC controls the default
>> > >> >>> >>>> output of
>> > >> >>> >>>> Stage-1, and Stage-2 overrides still apply.
>> > >> >>> >>>
>> > >> >>> >>>
>> > >> >>> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
>> > >> >>> >>> and wanted guest to decide the memory attribute. In other words,
>> > >> >>> >>> you
>> > >> >>> >>> did not want to enforce any memory attribute in Stage2.
>> > >> >>> >>>
>> > >> >>> >>> Please refer to this patch
>> > >> >>> >>> https://patchwork.kernel.org/patch/2543201/
>> > >> >>> >>
>> > >> >>> >>
>> > >> >>> >> This patch has never been merged. If you carry on following the
>> > >> >>> >> discussion,
>> > >> >>> >> you will certainly notice it was dropped for a very good reason:
>> > >> >>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>> > >> >>> >>
>> > >> >>> >> So Stage-2 memory attributes are used, they are not going away, and
>> > >> >>> >> they are
>> > >> >>> >> essential to the patch I sent this morning.
>> > >> >>> >>
>> > >> >>> >>
>> > >> >>> >>         M.
>> > >> >>> >> --
>> > >> >>> >> Fast, cheap, reliable. Pick two.
>> > >> >>> >
>> > >> >>> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
>> > >> >>> > is
>> > >> >>> > provided a DMA-capable device in pass-through mode. The reason being
>> > >> >>> > bootloader/firmware typically don't enable MMU and such
>> > >> >>> > bootloader/firmware
>> > >> >>> > will programme a pass-through DMA-capable device without any flushes
>> > >> >>> > to
>> > >> >>> > guest RAM (because it has not enabled MMU).
>> > >> >>> >
>> > >> >>> > A good example of such a device would be SATA AHCI controller given
>> > >> >>> > to a
>> > >> >>> > Guest/VM as direct access (using SystemMMU) and Guest
>> > >> >>> > bootloader/firmware
>> > >> >>> > accessing this SATA AHCI controller to load kernel images from SATA
>> > >> >>> > disk.
>> > >> >>> > In this situation, we will have to hack Guest bootloader/firmware
>> > >> >>> > AHCI driver to
>> > >> >>> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
>> > >> >>>
>> > >> >>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
>> > >> >>> curiosity: is that a made up example or something you actually have?
>> > >> >>>
>> > >> >>> Back to square one:
>> > >> >>> Can you please benchmark the various cache cleaning options (global at
>> > >> >>> startup time, per-page on S2 translation fault, and user-space)?
>> > >> >>>
>> > >> >> Eh, why is this a more valid argument than the vgic?  The device
>> > >> >> passthrough Stage-2 mappings would still have the Stage-2 memory
>> > >> >> attributes to configure that memory region as device memory.  Why is it
>> > >> >> relevant if the device is DMA-capable in this context?
>> > >> >>
>> > >> >> -Christoffer
>> > >> >
>> > >> > Most ARM bootloader/firmware run with MMU off hence, they will not do
>> > >> > explicit cache flushes when programming DMA-capable device. Now If
>> > >> > HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
>> > >> > will not be visible to DMA-capable device.
>> > >> >
>> > >> > --Anup
>> > >>
>> > >> The approach of flushing d-cache by set/way upon first run of VCPU will
>> > >> not work because for set/way operations ARM ARM says: "For set/way
>> > >> operations, and for All (entire cache) operations, the point is defined to be
>> > >> to the next level of caching". In other words, set/way operations work upto
>> > >> point of unification.
>> > >>
>> > >> Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
>> > >> at bootup time which does not work for us when L3-cache is enabled so,
>> > >> there is no point is adding one more __flush_dcache_all() upon first run of
>> > >> VCPU in KVM ARM64.
>> > >
>> > > When did anyone suggest using a cache cleaning method that doesn't apply
>> > > to this case.  I'm a little confused about your comment here.
>> >
>> > Please refer last reply from Marc Z where he says:
>> > "Can you please benchmark the various cache cleaning options (global at
>> > startup time, per-page on S2 translation fault, and user-space)?".
>> >
>> > Rather doing __flush_dcache_range() on each page in
>> > coherent_icache_guest_page() we could also flush entire d-cache upon
>> > first VCPU run. This only issue in flushing d-cache upon first VCPU run
>> > is that we cannot flush d-cache by set/way because as per ARM ARM
>> > all operations by set/way are upto PoU and not PoC. In presence of
>> > L3-Cache we need to flush d-cache upto PoC so we have to use
>> > __flush_dache_range()
>> >
>> > >
>> > >>
>> > >> IMHO, we are left with following options:
>> > >> 1. Flush all RAM regions of VCPU using __flush_dcache_range()
>> > >>     upon first run of VCPU
>> > >
>> > > We could do that, but we have to ensure that no other memory regions can
>> > > be added later.  Is this the case?  I don't remember.
>> >
>> > Yes, memory regions being added later be problem for this option.
>> >
>> > >
>> > >> 2. Implement outer-cache framework for ARM64 and flush all
>> > >>     caches + outer cache (i.e. L3-cache) upon first run of VCPU
>> > >
>> > > What's the difference between (2) and (1)?
>> >
>> > Linux ARM (i.e. 32-bit port) has a framework for having outer
>> > caches (i.e. caches that are not part of the CPU but exist as
>> > separate entity between CPU and Memory for performance
>> > improvement). Using this framework we can flush all CPU caches
>> > and outer cache.
>> >
>>
>> And we don't have such a framework in arm64?  But __flush_dcache_range
>> does nevertheless flush outer caches as well?
>>
>> > >
>> > >> 3. Use an alternate version of flush_icache_range() which will
>> > >>     flush d-cache by PoC instead of PoU. We can also ensure
>> > >>     that coherent_icache_guest_page() function will be called
>> > >>     upon Stage2 prefetch aborts only.
>> > >>
>> > >
>> > > I'm sorry, but I'm not following this discussion.  Are we not mixing a
>> > > discussion along two different axis here?  As I see it there are two
>> > > separate issues:
>> > >
>> > >  (A) When/Where to flush the memory regions
>> > >  (B) How to flush the memory regions, including the hardware method and
>> > >      the kernel software engineering aspect.
>> >
>> > Discussion here is about getting KVM ARM64 working in-presence
>> > of an external L3-cache (i.e. not part of CPU). Before starting a VCPU
>> > user-space typically loads images to guest RAM so, in-presence of
>> > huge L3-cache (few MBs). When the VCPU starts running some of the
>> > contents guest RAM will be still in L3-cache and VCPU runs with
>> > MMU off (i.e. cacheing off) hence VCPU will bypass L3-cache and
>> > see incorrect contents. To solve this problem we need to flush the
>> > guest RAM contents before they are accessed by first time by VCPU.
>> >
>> ok, I'm with you that far.
>>
>> But is it also not true that we need to decide between:
>>
>>   A.1: Flush the entire guest RAM before running the VCPU
>>   A.2: Flush the pages as we fault them in
>>
>> And (independently):
>>
>>   B.1: Use __flush_dcache_range
>>   B.2: Use something else + outer cache framework for arm64
>>   B.3: Use flush_icache_range
>>
>> Or do these all interleave somehow?  If so, I don't understand why.  Can
>> you help?
>>
> Oh, I think I understand your point now.  You mean if we use
> flush_cache_all before we run the vcpu, then it's not sufficient?  I
> assumed we would simply be using __flush_dcache_area for the guest RAM
> regions which would flush to PoC.
>
> For the record, I think we need a solution that also covers the case
> where a memory region is registered later, and I therefore prefer having
> this functionality in the stage-2 fault handler, where we are already
> taking care of similar issues (like your patch suggested).
>
> Especially if we can limit ourselves to doing so when the guest MMU is
> disabled, then I really think this is going to be the least overhead and
> measuring the performance of this penalty vs. at first CPU execution is
> a bit overkill IMHO, since we are only talking about boot time for any
> reasonable guest (which would run with the MMU enabled for real
> workloads presumeably).

Yes, currently we call coherent_icache_guest_page() upon all
Stage2 translation faults but, we can be improve a bit here just like
your suggestion. May be we can call coherent_icache_guest_page()
only when VCPU MMU is off and we get Stage2 instruction prefetch
translation fault.

>
> The only caveat is the previously discussed issue if user space loads
> code after the first VCPU execution, and only user space would know if
> that happens, which would argue for user space doing the cleaning...
> Hmmm.
>
> I also still have my worries about swapping, since user space is free to
> map guest RAM as non-executable.
>
> Did I miss anything here?

I don't know much about Linux swapping but for the rest part we are in
sync.

>
> -Christoffer

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16  6:57                       ` Anup Patel
  2013-08-16 17:19                         ` Christoffer Dall
@ 2013-08-29 10:52                         ` Catalin Marinas
  2013-08-29 12:31                           ` Anup Patel
  1 sibling, 1 reply; 58+ messages in thread
From: Catalin Marinas @ 2013-08-29 10:52 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Anup,

Jumping late into this thread (holidays).

On Fri, Aug 16, 2013 at 07:57:55AM +0100, Anup Patel wrote:
> The approach of flushing d-cache by set/way upon first run of VCPU will
> not work because for set/way operations ARM ARM says: "For set/way
> operations, and for All (entire cache) operations, the point is defined to be
> to the next level of caching". In other words, set/way operations work upto
> point of unification.

I don't understand where you got the idea that set/way operations work
up to the point of unification. This is incorrect, the set/way
operations work on the level of cache specified by bits 3:1 in the
register passed to the DC CISW instruction. For your L3 cache, those
bits would be 2 (and __flush_dcache_all() implementation does this
dynamically).

For the I-cache all operation, that's correct, it only invalidates to
the PoU but it doesn't need to go any further, you do the D-cache
maintenance for this (no point in duplicating functionality).

> Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
> at bootup time which does not work for us when L3-cache is enabled so,
> there is no point is adding one more __flush_dcache_all() upon first run of
> VCPU in KVM ARM64.

Do you mean the CPU is not aware of the L3 cache? Does CLIDR_EL1 report
an L3 cache on your implementation?

It's not clear to me whether your L3 cache is inner or outer (or a mix).
You say that D-cache maintenance to PoC flushes the L3 which looks to me
like an inner cache, in which cache it should be reported in the LoC
bits in CLIDR_EL1.

> IMHO, we are left with following options:
> 1. Flush all RAM regions of VCPU using __flush_dcache_range()
>     upon first run of VCPU
> 2. Implement outer-cache framework for ARM64 and flush all
>     caches + outer cache (i.e. L3-cache) upon first run of VCPU

Do you have specific instructions for flushing the L3 cache only? It's
not clear to me what an outer-cache framework would to on AArch64. It
was added on AArch32 for the L2x0/PL310 which need separate instructions
by physical address for flushing the cache. I really hope we don't get
these again on ARMv8 hardware.

> 3. Use an alternate version of flush_icache_range() which will
>     flush d-cache by PoC instead of PoU. We can also ensure
>     that coherent_icache_guest_page() function will be called
>     upon Stage2 prefetch aborts only.

flush_icache_range() is meant to flush only to the PoU to ensure the I-D
cache coherency. I don't think we should change this.

-- 
Catalin

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-29 10:52                         ` Catalin Marinas
@ 2013-08-29 12:31                           ` Anup Patel
  2013-08-29 12:53                             ` Catalin Marinas
  0 siblings, 1 reply; 58+ messages in thread
From: Anup Patel @ 2013-08-29 12:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 29, 2013 at 4:22 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> Hi Anup,
>
> Jumping late into this thread (holidays).
>
> On Fri, Aug 16, 2013 at 07:57:55AM +0100, Anup Patel wrote:
>> The approach of flushing d-cache by set/way upon first run of VCPU will
>> not work because for set/way operations ARM ARM says: "For set/way
>> operations, and for All (entire cache) operations, the point is defined to be
>> to the next level of caching". In other words, set/way operations work upto
>> point of unification.
>
> I don't understand where you got the idea that set/way operations work
> up to the point of unification. This is incorrect, the set/way
> operations work on the level of cache specified by bits 3:1 in the
> register passed to the DC CISW instruction. For your L3 cache, those
> bits would be 2 (and __flush_dcache_all() implementation does this
> dynamically).

The L3-cache is not visible to CPU. It is totally independent and transparent
to CPU.

>
> For the I-cache all operation, that's correct, it only invalidates to
> the PoU but it doesn't need to go any further, you do the D-cache
> maintenance for this (no point in duplicating functionality).
>
>> Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
>> at bootup time which does not work for us when L3-cache is enabled so,
>> there is no point is adding one more __flush_dcache_all() upon first run of
>> VCPU in KVM ARM64.
>
> Do you mean the CPU is not aware of the L3 cache? Does CLIDR_EL1 report
> an L3 cache on your implementation?

Yes, CPU is not aware of L3-cache and its state.

In our case, CLIDR_EL1 does not report L3-cache.

>
> It's not clear to me whether your L3 cache is inner or outer (or a mix).
> You say that D-cache maintenance to PoC flushes the L3 which looks to me
> like an inner cache, in which cache it should be reported in the LoC
> bits in CLIDR_EL1.
>
>> IMHO, we are left with following options:
>> 1. Flush all RAM regions of VCPU using __flush_dcache_range()
>>     upon first run of VCPU
>> 2. Implement outer-cache framework for ARM64 and flush all
>>     caches + outer cache (i.e. L3-cache) upon first run of VCPU
>
> Do you have specific instructions for flushing the L3 cache only? It's
> not clear to me what an outer-cache framework would to on AArch64. It
> was added on AArch32 for the L2x0/PL310 which need separate instructions
> by physical address for flushing the cache. I really hope we don't get
> these again on ARMv8 hardware.
>
>> 3. Use an alternate version of flush_icache_range() which will
>>     flush d-cache by PoC instead of PoU. We can also ensure
>>     that coherent_icache_guest_page() function will be called
>>     upon Stage2 prefetch aborts only.
>
> flush_icache_range() is meant to flush only to the PoU to ensure the I-D
> cache coherency. I don't think we should change this.

We did not mean to change flush_icache_range() instead we suggest
to have separate flush_icache_range_kvm() for KVM ARM64 which flushes
d-cache to PoC.

>
> --
> Catalin

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-29 12:31                           ` Anup Patel
@ 2013-08-29 12:53                             ` Catalin Marinas
  2013-08-29 16:02                               ` Anup Patel
  0 siblings, 1 reply; 58+ messages in thread
From: Catalin Marinas @ 2013-08-29 12:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 29, 2013 at 01:31:43PM +0100, Anup Patel wrote:
> On Thu, Aug 29, 2013 at 4:22 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On Fri, Aug 16, 2013 at 07:57:55AM +0100, Anup Patel wrote:
> >> The approach of flushing d-cache by set/way upon first run of VCPU will
> >> not work because for set/way operations ARM ARM says: "For set/way
> >> operations, and for All (entire cache) operations, the point is defined to be
> >> to the next level of caching". In other words, set/way operations work upto
> >> point of unification.
> >
> > I don't understand where you got the idea that set/way operations work
> > up to the point of unification. This is incorrect, the set/way
> > operations work on the level of cache specified by bits 3:1 in the
> > register passed to the DC CISW instruction. For your L3 cache, those
> > bits would be 2 (and __flush_dcache_all() implementation does this
> > dynamically).
> 
> The L3-cache is not visible to CPU. It is totally independent and transparent
> to CPU.

OK. But you say that operations like DC CIVAC actually flush the L3? So
I don't see it as completely transparent to the CPU.

Do you have any configuration bits which would make the L3 completely
transparent like always caching even when accesses are non-cacheable and
DC ops to PoC ignoring it?

Now, back to the idea of outer_cache framework for arm64. Does your CPU
have separate instructions for flushing this L3 cache?

-- 
Catalin

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-29 12:53                             ` Catalin Marinas
@ 2013-08-29 16:02                               ` Anup Patel
  2013-08-30  9:44                                 ` Catalin Marinas
  0 siblings, 1 reply; 58+ messages in thread
From: Anup Patel @ 2013-08-29 16:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 29, 2013 at 6:23 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Thu, Aug 29, 2013 at 01:31:43PM +0100, Anup Patel wrote:
>> On Thu, Aug 29, 2013 at 4:22 PM, Catalin Marinas
>> <catalin.marinas@arm.com> wrote:
>> > On Fri, Aug 16, 2013 at 07:57:55AM +0100, Anup Patel wrote:
>> >> The approach of flushing d-cache by set/way upon first run of VCPU will
>> >> not work because for set/way operations ARM ARM says: "For set/way
>> >> operations, and for All (entire cache) operations, the point is defined to be
>> >> to the next level of caching". In other words, set/way operations work upto
>> >> point of unification.
>> >
>> > I don't understand where you got the idea that set/way operations work
>> > up to the point of unification. This is incorrect, the set/way
>> > operations work on the level of cache specified by bits 3:1 in the
>> > register passed to the DC CISW instruction. For your L3 cache, those
>> > bits would be 2 (and __flush_dcache_all() implementation does this
>> > dynamically).
>>
>> The L3-cache is not visible to CPU. It is totally independent and transparent
>> to CPU.
>
> OK. But you say that operations like DC CIVAC actually flush the L3? So
> I don't see it as completely transparent to the CPU.

It is transparent from CPU perspective. In other words, there is nothing in
CPU for controlling/monitoring L3-cache.

>
> Do you have any configuration bits which would make the L3 completely
> transparent like always caching even when accesses are non-cacheable and
> DC ops to PoC ignoring it?

Actually, L3-cache monitors the types of read/write generated by CPU (i.e.
whether the request is cacheable/non-cacheable or whether the request is
due to DC ops to PoC, or ...).

To answer your query, there is no configuration to have L3 caching when
accesses are non-cacheable and DC ops to PoC.

>
> Now, back to the idea of outer_cache framework for arm64. Does your CPU
> have separate instructions for flushing this L3 cache?

No, CPU does not have separate instruction for flushing L3-cache. On the
other hand, L3-cache has MMIO registers which can be use to explicitly
flush L3-cache.

>
> --
> Catalin

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-29 16:02                               ` Anup Patel
@ 2013-08-30  9:44                                 ` Catalin Marinas
  2013-08-30 10:36                                   ` Anup Patel
  0 siblings, 1 reply; 58+ messages in thread
From: Catalin Marinas @ 2013-08-30  9:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 29, 2013 at 05:02:50PM +0100, Anup Patel wrote:
> On Thu, Aug 29, 2013 at 6:23 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On Thu, Aug 29, 2013 at 01:31:43PM +0100, Anup Patel wrote:
> >> On Thu, Aug 29, 2013 at 4:22 PM, Catalin Marinas
> >> <catalin.marinas@arm.com> wrote:
> >> > On Fri, Aug 16, 2013 at 07:57:55AM +0100, Anup Patel wrote:
> >> >> The approach of flushing d-cache by set/way upon first run of VCPU will
> >> >> not work because for set/way operations ARM ARM says: "For set/way
> >> >> operations, and for All (entire cache) operations, the point is defined to be
> >> >> to the next level of caching". In other words, set/way operations work upto
> >> >> point of unification.
> >> >
> >> > I don't understand where you got the idea that set/way operations work
> >> > up to the point of unification. This is incorrect, the set/way
> >> > operations work on the level of cache specified by bits 3:1 in the
> >> > register passed to the DC CISW instruction. For your L3 cache, those
> >> > bits would be 2 (and __flush_dcache_all() implementation does this
> >> > dynamically).
> >>
> >> The L3-cache is not visible to CPU. It is totally independent and transparent
> >> to CPU.
> >
> > OK. But you say that operations like DC CIVAC actually flush the L3? So
> > I don't see it as completely transparent to the CPU.
> 
> It is transparent from CPU perspective. In other words, there is nothing in
> CPU for controlling/monitoring L3-cache.

We probably have a different understanding of "transparent". It doesn't
look to me like any more transparent than the L1 or L2 cache. Basically,
from a software perspective, it needs maintenance. Whether the CPU
explicitly asks the L3 cache for this or the L3 cache figures it on its
own based on the L1/L2 operations is irrelevant.

It would have been transparent if the software didn't need to know about
it at all, but it's not the case.

> > Do you have any configuration bits which would make the L3 completely
> > transparent like always caching even when accesses are non-cacheable and
> > DC ops to PoC ignoring it?
> 
> Actually, L3-cache monitors the types of read/write generated by CPU (i.e.
> whether the request is cacheable/non-cacheable or whether the request is
> due to DC ops to PoC, or ...).
> 
> To answer your query, there is no configuration to have L3 caching when
> accesses are non-cacheable and DC ops to PoC.

So it's an outer cache with some "improvements" to handle DC ops to PoC.
I think it was a pretty bad decision on the hardware side as we really
try to get rid of outer caches for many reasons:

1. Non-standard cache flushing operations (MMIO-based)
2. It may require cache maintenance by physical address - something
   hard to get in a guest OS (unless you virtualise L3 cache
   maintenance)
3. Are barriers like DSB propagated correctly? Does a DC op to PoC
   followed by DSB ensure that the L3 drained the cachelines to RAM?

I think point 2 isn't required because your L3 detects DC ops to PoC. I
hope point 3 is handled correctly (otherwise look how "nice" the mb()
macro on arm is to cope with L2x0).

If only 1 is left, we don't need the full outer_cache framework but it
still needs to be addressed since the assumption is that flush_cache_all
(or __flush_dcache_all) flushes all cache levels. These are not used in
generic code but are used during kernel booting, KVM and cpuidle
drivers.

> > Now, back to the idea of outer_cache framework for arm64. Does your CPU
> > have separate instructions for flushing this L3 cache?
> 
> No, CPU does not have separate instruction for flushing L3-cache. On the
> other hand, L3-cache has MMIO registers which can be use to explicitly
> flush L3-cache.

I guess you use those in your firmware or boot loader since Linux
requires clean/invalidated caches at boot (and I plan to push a patch
which removes kernel D-cache cleaning during boot to spot such problems
early). A cpuidle driver would probably need this as well.

-- 
Catalin

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-16 18:11                               ` Anup Patel
  2013-08-16 18:20                                 ` Christoffer Dall
@ 2013-08-30  9:52                                 ` Catalin Marinas
  2013-08-30 10:44                                   ` Anup Patel
  1 sibling, 1 reply; 58+ messages in thread
From: Catalin Marinas @ 2013-08-30  9:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 16, 2013 at 07:11:51PM +0100, Anup Patel wrote:
> On Fri, Aug 16, 2013 at 11:20 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Fri, Aug 16, 2013 at 11:12:08PM +0530, Anup Patel wrote:
> >> Discussion here is about getting KVM ARM64 working in-presence
> >> of an external L3-cache (i.e. not part of CPU). Before starting a VCPU
> >> user-space typically loads images to guest RAM so, in-presence of
> >> huge L3-cache (few MBs). When the VCPU starts running some of the
> >> contents guest RAM will be still in L3-cache and VCPU runs with
> >> MMU off (i.e. cacheing off) hence VCPU will bypass L3-cache and
> >> see incorrect contents. To solve this problem we need to flush the
> >> guest RAM contents before they are accessed by first time by VCPU.
> >>
> > ok, I'm with you that far.
> >
> > But is it also not true that we need to decide between:
> >
> >   A.1: Flush the entire guest RAM before running the VCPU
> >   A.2: Flush the pages as we fault them in
> 
> Yes, thats the decision we have to make.
> 
> >
> > And (independently):
> >
> >   B.1: Use __flush_dcache_range
> >   B.2: Use something else + outer cache framework for arm64
> 
> This would be __flush_dcache_all() + outer cache flush all.

We need to be careful here since the __flush_dcache_all() operation uses
cache maintenance by set/way and these are *local* to a CPU (IOW not
broadcast). Do you have any guarantee that dirty cache lines don't
migrate between CPUs and __flush_dcache_all() wouldn't miss them?
Architecturally we don't, so this is not a safe operation that would
guarantee L1 cache flushing (we probably need to revisit some of the
__flush_dcache_all() calls in KVM, I haven't looked into this).

So I think we are left to the range operation where the DC ops to PoC
would be enough for your L3.

An outer cache flush all is probably only needed for cpuidle/suspend
(the booting part should be handled by the boot loader).

-- 
Catalin

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-30  9:44                                 ` Catalin Marinas
@ 2013-08-30 10:36                                   ` Anup Patel
  2013-08-30 12:52                                     ` Catalin Marinas
  0 siblings, 1 reply; 58+ messages in thread
From: Anup Patel @ 2013-08-30 10:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 30, 2013 at 3:14 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Thu, Aug 29, 2013 at 05:02:50PM +0100, Anup Patel wrote:
>> On Thu, Aug 29, 2013 at 6:23 PM, Catalin Marinas
>> <catalin.marinas@arm.com> wrote:
>> > On Thu, Aug 29, 2013 at 01:31:43PM +0100, Anup Patel wrote:
>> >> On Thu, Aug 29, 2013 at 4:22 PM, Catalin Marinas
>> >> <catalin.marinas@arm.com> wrote:
>> >> > On Fri, Aug 16, 2013 at 07:57:55AM +0100, Anup Patel wrote:
>> >> >> The approach of flushing d-cache by set/way upon first run of VCPU will
>> >> >> not work because for set/way operations ARM ARM says: "For set/way
>> >> >> operations, and for All (entire cache) operations, the point is defined to be
>> >> >> to the next level of caching". In other words, set/way operations work upto
>> >> >> point of unification.
>> >> >
>> >> > I don't understand where you got the idea that set/way operations work
>> >> > up to the point of unification. This is incorrect, the set/way
>> >> > operations work on the level of cache specified by bits 3:1 in the
>> >> > register passed to the DC CISW instruction. For your L3 cache, those
>> >> > bits would be 2 (and __flush_dcache_all() implementation does this
>> >> > dynamically).
>> >>
>> >> The L3-cache is not visible to CPU. It is totally independent and transparent
>> >> to CPU.
>> >
>> > OK. But you say that operations like DC CIVAC actually flush the L3? So
>> > I don't see it as completely transparent to the CPU.
>>
>> It is transparent from CPU perspective. In other words, there is nothing in
>> CPU for controlling/monitoring L3-cache.
>
> We probably have a different understanding of "transparent". It doesn't
> look to me like any more transparent than the L1 or L2 cache. Basically,
> from a software perspective, it needs maintenance. Whether the CPU
> explicitly asks the L3 cache for this or the L3 cache figures it on its
> own based on the L1/L2 operations is irrelevant.

Yes, we have a different understanding regarding "transparent". The goal
behind our L3 is to have external cache such that CPU is not aware of it
and CPU would work fine even if L3 is turned-off on-the-fly. Further such
L3 has to be smarter than normal outer-caches because it will be
maintained automatically by observing CPU operations.

>
> It would have been transparent if the software didn't need to know about
> it at all, but it's not the case.

Currently, KVM is the only place where we need L3 maintenance because
Guest starts with MMU turned-off otherwise we don't need L3 maintenance
in kernel for any kind of IO or Subsystem.

>
>> > Do you have any configuration bits which would make the L3 completely
>> > transparent like always caching even when accesses are non-cacheable and
>> > DC ops to PoC ignoring it?
>>
>> Actually, L3-cache monitors the types of read/write generated by CPU (i.e.
>> whether the request is cacheable/non-cacheable or whether the request is
>> due to DC ops to PoC, or ...).
>>
>> To answer your query, there is no configuration to have L3 caching when
>> accesses are non-cacheable and DC ops to PoC.
>
> So it's an outer cache with some "improvements" to handle DC ops to PoC.
> I think it was a pretty bad decision on the hardware side as we really
> try to get rid of outer caches for many reasons:

Getting rid off outer-caches (such as in this context) may make sense for
Embedded/Low-end systems but for Servers L3-cache makes lot of sense.

Claiming this to be a bad decision all depends on what end application
you are looking at.

>
> 1. Non-standard cache flushing operations (MMIO-based)
> 2. It may require cache maintenance by physical address - something
>    hard to get in a guest OS (unless you virtualise L3 cache
>    maintenance)

We don't have cache maintenance by physical address in our L3-cache.

> 3. Are barriers like DSB propagated correctly? Does a DC op to PoC
>    followed by DSB ensure that the L3 drained the cachelines to RAM?

Yes, DSB works perfectly fine for us with L3.
Yes, DC op to PoC followed by DSB works fine with L3 too.

>
> I think point 2 isn't required because your L3 detects DC ops to PoC. I
> hope point 3 is handled correctly (otherwise look how "nice" the mb()
> macro on arm is to cope with L2x0).
>
> If only 1 is left, we don't need the full outer_cache framework but it
> still needs to be addressed since the assumption is that flush_cache_all
> (or __flush_dcache_all) flushes all cache levels. These are not used in
> generic code but are used during kernel booting, KVM and cpuidle
> drivers.
>
>> > Now, back to the idea of outer_cache framework for arm64. Does your CPU
>> > have separate instructions for flushing this L3 cache?
>>
>> No, CPU does not have separate instruction for flushing L3-cache. On the
>> other hand, L3-cache has MMIO registers which can be use to explicitly
>> flush L3-cache.
>
> I guess you use those in your firmware or boot loader since Linux
> requires clean/invalidated caches at boot (and I plan to push a patch
> which removes kernel D-cache cleaning during boot to spot such problems
> early). A cpuidle driver would probably need this as well.

Yes, cpuidle would be another place where we may need L3-cache
maintenance. In other words, if we need to power-off L3-cache from
kernel then we need to first flush it.

>
> --
> Catalin

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-30  9:52                                 ` Catalin Marinas
@ 2013-08-30 10:44                                   ` Anup Patel
  2013-08-30 13:02                                     ` Catalin Marinas
  2013-08-30 13:21                                     ` Marc Zyngier
  0 siblings, 2 replies; 58+ messages in thread
From: Anup Patel @ 2013-08-30 10:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 30, 2013 at 3:22 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Fri, Aug 16, 2013 at 07:11:51PM +0100, Anup Patel wrote:
>> On Fri, Aug 16, 2013 at 11:20 PM, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Fri, Aug 16, 2013 at 11:12:08PM +0530, Anup Patel wrote:
>> >> Discussion here is about getting KVM ARM64 working in-presence
>> >> of an external L3-cache (i.e. not part of CPU). Before starting a VCPU
>> >> user-space typically loads images to guest RAM so, in-presence of
>> >> huge L3-cache (few MBs). When the VCPU starts running some of the
>> >> contents guest RAM will be still in L3-cache and VCPU runs with
>> >> MMU off (i.e. cacheing off) hence VCPU will bypass L3-cache and
>> >> see incorrect contents. To solve this problem we need to flush the
>> >> guest RAM contents before they are accessed by first time by VCPU.
>> >>
>> > ok, I'm with you that far.
>> >
>> > But is it also not true that we need to decide between:
>> >
>> >   A.1: Flush the entire guest RAM before running the VCPU
>> >   A.2: Flush the pages as we fault them in
>>
>> Yes, thats the decision we have to make.
>>
>> >
>> > And (independently):
>> >
>> >   B.1: Use __flush_dcache_range
>> >   B.2: Use something else + outer cache framework for arm64
>>
>> This would be __flush_dcache_all() + outer cache flush all.
>
> We need to be careful here since the __flush_dcache_all() operation uses
> cache maintenance by set/way and these are *local* to a CPU (IOW not
> broadcast). Do you have any guarantee that dirty cache lines don't
> migrate between CPUs and __flush_dcache_all() wouldn't miss them?
> Architecturally we don't, so this is not a safe operation that would
> guarantee L1 cache flushing (we probably need to revisit some of the
> __flush_dcache_all() calls in KVM, I haven't looked into this).
>
> So I think we are left to the range operation where the DC ops to PoC
> would be enough for your L3.

If __flush_dcache_all() is *local" to a CPU then I guess DC ops to PoC
by range would be the only option.

>
> An outer cache flush all is probably only needed for cpuidle/suspend
> (the booting part should be handled by the boot loader).

Yes, cpuidle/suspend would definitely require outer cache maintenance.

For KVM, we can avoid flushing d-cache to PoC every time in
coherent_icache_guest_page() by only doing it when Guest MMU is
turned-off. This may reduce the performance penalty.

>
> --
> Catalin

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-30 10:36                                   ` Anup Patel
@ 2013-08-30 12:52                                     ` Catalin Marinas
  0 siblings, 0 replies; 58+ messages in thread
From: Catalin Marinas @ 2013-08-30 12:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 30, 2013 at 11:36:30AM +0100, Anup Patel wrote:
> On Fri, Aug 30, 2013 at 3:14 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On Thu, Aug 29, 2013 at 05:02:50PM +0100, Anup Patel wrote:
> >> Actually, L3-cache monitors the types of read/write generated by CPU (i.e.
> >> whether the request is cacheable/non-cacheable or whether the request is
> >> due to DC ops to PoC, or ...).
> >>
> >> To answer your query, there is no configuration to have L3 caching when
> >> accesses are non-cacheable and DC ops to PoC.
> >
> > So it's an outer cache with some "improvements" to handle DC ops to PoC.
> > I think it was a pretty bad decision on the hardware side as we really
> > try to get rid of outer caches for many reasons:
> 
> Getting rid off outer-caches (such as in this context) may make sense for
> Embedded/Low-end systems but for Servers L3-cache makes lot of sense.
> 
> Claiming this to be a bad decision all depends on what end application
> you are looking at.

It's not a bad decision to have big L3 cache, that's a very *good*
decision for server applications. The bad part is that it is not fully
integrated with the CPU (like allowing set/way operations to flush this
cache level).

> > 1. Non-standard cache flushing operations (MMIO-based)
> > 2. It may require cache maintenance by physical address - something
> >    hard to get in a guest OS (unless you virtualise L3 cache
> >    maintenance)
> 
> We don't have cache maintenance by physical address in our L3-cache.

Great.

> > 3. Are barriers like DSB propagated correctly? Does a DC op to PoC
> >    followed by DSB ensure that the L3 drained the cachelines to RAM?
> 
> Yes, DSB works perfectly fine for us with L3.
> Yes, DC op to PoC followed by DSB works fine with L3 too.

Even better.

See my other comment on flush_dcache_all() use in the kernel/KVM and why
I don't think it is suitable (which leaves us with DC ops to PoC in
KVM).

> > I think point 2 isn't required because your L3 detects DC ops to PoC. I
> > hope point 3 is handled correctly (otherwise look how "nice" the mb()
> > macro on arm is to cope with L2x0).
> >
> > If only 1 is left, we don't need the full outer_cache framework but it
> > still needs to be addressed since the assumption is that flush_cache_all
> > (or __flush_dcache_all) flushes all cache levels. These are not used in
> > generic code but are used during kernel booting, KVM and cpuidle
> > drivers.
> >
> >> > Now, back to the idea of outer_cache framework for arm64. Does your CPU
> >> > have separate instructions for flushing this L3 cache?
> >>
> >> No, CPU does not have separate instruction for flushing L3-cache. On the
> >> other hand, L3-cache has MMIO registers which can be use to explicitly
> >> flush L3-cache.
> >
> > I guess you use those in your firmware or boot loader since Linux
> > requires clean/invalidated caches at boot (and I plan to push a patch
> > which removes kernel D-cache cleaning during boot to spot such problems
> > early). A cpuidle driver would probably need this as well.
> 
> Yes, cpuidle would be another place where we may need L3-cache
> maintenance. In other words, if we need to power-off L3-cache from
> kernel then we need to first flush it.

The problem is if you need to disable the MMU in the kernel, you would
need to flush the L3 cache first. Normally this would be done in the
firmware with PSCI but most likely not the case for the Applied
hardware.

-- 
Catalin

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-30 10:44                                   ` Anup Patel
@ 2013-08-30 13:02                                     ` Catalin Marinas
  2013-08-30 13:21                                     ` Marc Zyngier
  1 sibling, 0 replies; 58+ messages in thread
From: Catalin Marinas @ 2013-08-30 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 30, 2013 at 11:44:30AM +0100, Anup Patel wrote:
> On Fri, Aug 30, 2013 at 3:22 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On Fri, Aug 16, 2013 at 07:11:51PM +0100, Anup Patel wrote:
> >> On Fri, Aug 16, 2013 at 11:20 PM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Fri, Aug 16, 2013 at 11:12:08PM +0530, Anup Patel wrote:
> >> >> Discussion here is about getting KVM ARM64 working in-presence
> >> >> of an external L3-cache (i.e. not part of CPU). Before starting a VCPU
> >> >> user-space typically loads images to guest RAM so, in-presence of
> >> >> huge L3-cache (few MBs). When the VCPU starts running some of the
> >> >> contents guest RAM will be still in L3-cache and VCPU runs with
> >> >> MMU off (i.e. cacheing off) hence VCPU will bypass L3-cache and
> >> >> see incorrect contents. To solve this problem we need to flush the
> >> >> guest RAM contents before they are accessed by first time by VCPU.
> >> >>
> >> > ok, I'm with you that far.
> >> >
> >> > But is it also not true that we need to decide between:
> >> >
> >> >   A.1: Flush the entire guest RAM before running the VCPU
> >> >   A.2: Flush the pages as we fault them in
> >>
> >> Yes, thats the decision we have to make.
> >>
> >> >
> >> > And (independently):
> >> >
> >> >   B.1: Use __flush_dcache_range
> >> >   B.2: Use something else + outer cache framework for arm64
> >>
> >> This would be __flush_dcache_all() + outer cache flush all.
> >
> > We need to be careful here since the __flush_dcache_all() operation uses
> > cache maintenance by set/way and these are *local* to a CPU (IOW not
> > broadcast). Do you have any guarantee that dirty cache lines don't
> > migrate between CPUs and __flush_dcache_all() wouldn't miss them?
> > Architecturally we don't, so this is not a safe operation that would
> > guarantee L1 cache flushing (we probably need to revisit some of the
> > __flush_dcache_all() calls in KVM, I haven't looked into this).
> >
> > So I think we are left to the range operation where the DC ops to PoC
> > would be enough for your L3.
> 
> If __flush_dcache_all() is *local" to a CPU then I guess DC ops to PoC
> by range would be the only option.

Yes. In the (upcoming) ARM ARMv8 there is a clear note that set/way
operations to flush the whole cache must not be used for the maintenance
of large buffer but only during power-down/power-up code sequences.

> > An outer cache flush all is probably only needed for cpuidle/suspend
> > (the booting part should be handled by the boot loader).
> 
> Yes, cpuidle/suspend would definitely require outer cache maintenance.
> 
> For KVM, we can avoid flushing d-cache to PoC every time in
> coherent_icache_guest_page() by only doing it when Guest MMU is
> turned-off. This may reduce the performance penalty.

That's for the KVM guys to decide ;)

-- 
Catalin

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-30 10:44                                   ` Anup Patel
  2013-08-30 13:02                                     ` Catalin Marinas
@ 2013-08-30 13:21                                     ` Marc Zyngier
  2013-08-30 14:04                                       ` Catalin Marinas
  1 sibling, 1 reply; 58+ messages in thread
From: Marc Zyngier @ 2013-08-30 13:21 UTC (permalink / raw)
  To: linux-arm-kernel

On 2013-08-30 11:44, Anup Patel wrote:
> On Fri, Aug 30, 2013 at 3:22 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
>> On Fri, Aug 16, 2013 at 07:11:51PM +0100, Anup Patel wrote:
>>> On Fri, Aug 16, 2013 at 11:20 PM, Christoffer Dall
>>> <christoffer.dall@linaro.org> wrote:
>>> > On Fri, Aug 16, 2013 at 11:12:08PM +0530, Anup Patel wrote:
>>> >> Discussion here is about getting KVM ARM64 working in-presence
>>> >> of an external L3-cache (i.e. not part of CPU). Before starting 
>>> a VCPU
>>> >> user-space typically loads images to guest RAM so, in-presence 
>>> of
>>> >> huge L3-cache (few MBs). When the VCPU starts running some of 
>>> the
>>> >> contents guest RAM will be still in L3-cache and VCPU runs with
>>> >> MMU off (i.e. cacheing off) hence VCPU will bypass L3-cache and
>>> >> see incorrect contents. To solve this problem we need to flush 
>>> the
>>> >> guest RAM contents before they are accessed by first time by 
>>> VCPU.
>>> >>
>>> > ok, I'm with you that far.
>>> >
>>> > But is it also not true that we need to decide between:
>>> >
>>> >   A.1: Flush the entire guest RAM before running the VCPU
>>> >   A.2: Flush the pages as we fault them in
>>>
>>> Yes, thats the decision we have to make.
>>>
>>> >
>>> > And (independently):
>>> >
>>> >   B.1: Use __flush_dcache_range
>>> >   B.2: Use something else + outer cache framework for arm64
>>>
>>> This would be __flush_dcache_all() + outer cache flush all.
>>
>> We need to be careful here since the __flush_dcache_all() operation 
>> uses
>> cache maintenance by set/way and these are *local* to a CPU (IOW not
>> broadcast). Do you have any guarantee that dirty cache lines don't
>> migrate between CPUs and __flush_dcache_all() wouldn't miss them?
>> Architecturally we don't, so this is not a safe operation that would
>> guarantee L1 cache flushing (we probably need to revisit some of the
>> __flush_dcache_all() calls in KVM, I haven't looked into this).
>>
>> So I think we are left to the range operation where the DC ops to 
>> PoC
>> would be enough for your L3.
>
> If __flush_dcache_all() is *local" to a CPU then I guess DC ops to 
> PoC
> by range would be the only option.
>
>>
>> An outer cache flush all is probably only needed for cpuidle/suspend
>> (the booting part should be handled by the boot loader).
>
> Yes, cpuidle/suspend would definitely require outer cache 
> maintenance.
>
> For KVM, we can avoid flushing d-cache to PoC every time in
> coherent_icache_guest_page() by only doing it when Guest MMU is
> turned-off. This may reduce the performance penalty.

What about the I and C bits in SCTLR_EL1? Does L3 also honour these 
bits?

         M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-30 13:21                                     ` Marc Zyngier
@ 2013-08-30 14:04                                       ` Catalin Marinas
  2013-08-30 14:22                                         ` Marc Zyngier
  0 siblings, 1 reply; 58+ messages in thread
From: Catalin Marinas @ 2013-08-30 14:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 30, 2013 at 02:21:57PM +0100, Marc Zyngier wrote:
> On 2013-08-30 11:44, Anup Patel wrote:
> > For KVM, we can avoid flushing d-cache to PoC every time in
> > coherent_icache_guest_page() by only doing it when Guest MMU is
> > turned-off. This may reduce the performance penalty.
> 
> What about the I and C bits in SCTLR_EL1? Does L3 also honour these 
> bits?

I would think so, it probably cares about how the transactions are
presented at the bus level by the CPU.

-- 
Catalin

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-30 14:04                                       ` Catalin Marinas
@ 2013-08-30 14:22                                         ` Marc Zyngier
  2013-08-30 14:30                                           ` Will Deacon
  0 siblings, 1 reply; 58+ messages in thread
From: Marc Zyngier @ 2013-08-30 14:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 2013-08-30 15:04, Catalin Marinas wrote:
> On Fri, Aug 30, 2013 at 02:21:57PM +0100, Marc Zyngier wrote:
>> On 2013-08-30 11:44, Anup Patel wrote:
>> > For KVM, we can avoid flushing d-cache to PoC every time in
>> > coherent_icache_guest_page() by only doing it when Guest MMU is
>> > turned-off. This may reduce the performance penalty.
>>
>> What about the I and C bits in SCTLR_EL1? Does L3 also honour these
>> bits?
>
> I would think so, it probably cares about how the transactions are
> presented at the bus level by the CPU.

That'd make sense indeed. So we need to track both SCTLR_EL1.{I,C,M} in 
order to find out whether or not we need to clean the cache.

Mumble mumble...

         M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-30 14:22                                         ` Marc Zyngier
@ 2013-08-30 14:30                                           ` Will Deacon
  2013-08-30 14:52                                             ` Anup Patel
  2013-08-30 15:12                                             ` Marc Zyngier
  0 siblings, 2 replies; 58+ messages in thread
From: Will Deacon @ 2013-08-30 14:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 30, 2013 at 03:22:11PM +0100, Marc Zyngier wrote:
> On 2013-08-30 15:04, Catalin Marinas wrote:
> > On Fri, Aug 30, 2013 at 02:21:57PM +0100, Marc Zyngier wrote:
> >> On 2013-08-30 11:44, Anup Patel wrote:
> >> > For KVM, we can avoid flushing d-cache to PoC every time in
> >> > coherent_icache_guest_page() by only doing it when Guest MMU is
> >> > turned-off. This may reduce the performance penalty.
> >>
> >> What about the I and C bits in SCTLR_EL1? Does L3 also honour these
> >> bits?
> >
> > I would think so, it probably cares about how the transactions are
> > presented at the bus level by the CPU.
> 
> That'd make sense indeed. So we need to track both SCTLR_EL1.{I,C,M} in 
> order to find out whether or not we need to clean the cache.

<horrible hack>
Could you use a dummy ATS_* operation, then read the cacheability attributes
out of the PAR?
</horrible hack>

Will

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-30 14:30                                           ` Will Deacon
@ 2013-08-30 14:52                                             ` Anup Patel
  2013-08-30 15:12                                             ` Marc Zyngier
  1 sibling, 0 replies; 58+ messages in thread
From: Anup Patel @ 2013-08-30 14:52 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On Fri, Aug 30, 2013 at 8:00 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Aug 30, 2013 at 03:22:11PM +0100, Marc Zyngier wrote:
>> On 2013-08-30 15:04, Catalin Marinas wrote:
>> > On Fri, Aug 30, 2013 at 02:21:57PM +0100, Marc Zyngier wrote:
>> >> On 2013-08-30 11:44, Anup Patel wrote:
>> >> > For KVM, we can avoid flushing d-cache to PoC every time in
>> >> > coherent_icache_guest_page() by only doing it when Guest MMU is
>> >> > turned-off. This may reduce the performance penalty.
>> >>
>> >> What about the I and C bits in SCTLR_EL1? Does L3 also honour these
>> >> bits?
>> >
>> > I would think so, it probably cares about how the transactions are
>> > presented at the bus level by the CPU.
>>
>> That'd make sense indeed. So we need to track both SCTLR_EL1.{I,C,M} in
>> order to find out whether or not we need to clean the cache.
>
> <horrible hack>
> Could you use a dummy ATS_* operation, then read the cacheability attributes
> out of the PAR?
> </horrible hack>

We will have to use ATS12xxx operation which can only be executed from EL2
or EL3. The host kernel runs at EL1 hence we will need to use HVC call to
execute ATS12xxx operation in EL2 mode which in-turn means a world-switch
overhead just to execute ATS12xxx operation.

>
> Will

--Anup

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
  2013-08-30 14:30                                           ` Will Deacon
  2013-08-30 14:52                                             ` Anup Patel
@ 2013-08-30 15:12                                             ` Marc Zyngier
  1 sibling, 0 replies; 58+ messages in thread
From: Marc Zyngier @ 2013-08-30 15:12 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On 2013-08-30 15:30, Will Deacon wrote:
> On Fri, Aug 30, 2013 at 03:22:11PM +0100, Marc Zyngier wrote:
>> On 2013-08-30 15:04, Catalin Marinas wrote:
>> > On Fri, Aug 30, 2013 at 02:21:57PM +0100, Marc Zyngier wrote:
>> >> On 2013-08-30 11:44, Anup Patel wrote:
>> >> > For KVM, we can avoid flushing d-cache to PoC every time in
>> >> > coherent_icache_guest_page() by only doing it when Guest MMU is
>> >> > turned-off. This may reduce the performance penalty.
>> >>
>> >> What about the I and C bits in SCTLR_EL1? Does L3 also honour 
>> these
>> >> bits?
>> >
>> > I would think so, it probably cares about how the transactions are
>> > presented at the bus level by the CPU.
>>
>> That'd make sense indeed. So we need to track both SCTLR_EL1.{I,C,M} 
>> in
>> order to find out whether or not we need to clean the cache.
>
> <horrible hack>
> Could you use a dummy ATS_* operation, then read the cacheability 
> attributes
> out of the PAR?
> </horrible hack>

That'd be doable, and maybe not that expensive (using ATS1, the Stage-1 
translation is probably still cached after the translation fault). 
Actually, I suspect this is more correct than just looking at SCTLR_EL1, 
as it doesn't assume that the guest uses always cacheable memory when 
enables the caches.

Shame HPFAR_EL2 doesn't report this information though...

         M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
@ 2013-08-14 11:45 Pranavkumar Sawargaonkar
  0 siblings, 0 replies; 58+ messages in thread
From: Pranavkumar Sawargaonkar @ 2013-08-14 11:45 UTC (permalink / raw)
  To: linux-arm-kernel

Systems with large external L3-cache (few MBs), might have dirty
content belonging to the guest page in L3-cache. To tackle this,
we need to flush such dirty content from d-cache so that guest
will see correct contents of guest page when guest MMU is disabled.

The patch fixes coherent_icache_guest_page() for external L3-cache.

Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
---
 arch/arm64/include/asm/kvm_mmu.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index efe609c..5129038 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
 	if (!icache_is_aliasing()) {		/* PIPT */
 		unsigned long hva = gfn_to_hva(kvm, gfn);
 		flush_icache_range(hva, hva + PAGE_SIZE);
+		/* Flush d-cache for systems with external caches. */
+		__flush_dcache_area((void *) hva, PAGE_SIZE);
 	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
 		/* any kind of VIPT cache */
 		__flush_icache_all();
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2013-08-30 15:12 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-14 11:47 [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache Pranavkumar Sawargaonkar
2013-08-14 12:04 ` Marc Zyngier
2013-08-14 14:22   ` Anup Patel
2013-08-14 15:06     ` Alexander Graf
2013-08-14 15:34       ` Marc Zyngier
2013-08-14 15:41         ` Peter Maydell
2013-08-14 15:57           ` Marc Zyngier
2013-08-14 16:36             ` Anup Patel
2013-08-14 15:23     ` Marc Zyngier
2013-08-14 15:35       ` Peter Maydell
2013-08-14 15:49         ` Marc Zyngier
2013-08-14 17:34           ` Christoffer Dall
2013-08-15  4:44             ` Marc Zyngier
2013-08-15 16:58               ` Christoffer Dall
2013-08-14 15:36       ` Anup Patel
2013-08-15  4:52     ` Marc Zyngier
2013-08-15  6:26       ` Anup Patel
2013-08-15  8:31         ` Marc Zyngier
2013-08-15 13:31           ` Anup Patel
2013-08-15 14:47             ` Marc Zyngier
2013-08-15 15:13               ` Anup Patel
2013-08-15 15:37                 ` Marc Zyngier
2013-08-15 15:45                   ` Anup Patel
2013-08-15 16:53                   ` Christoffer Dall
2013-08-16  5:02                     ` Anup Patel
2013-08-16  6:57                       ` Anup Patel
2013-08-16 17:19                         ` Christoffer Dall
2013-08-16 17:42                           ` Anup Patel
2013-08-16 17:50                             ` Christoffer Dall
2013-08-16 18:06                               ` Christoffer Dall
2013-08-16 18:20                                 ` Anup Patel
2013-08-16 18:11                               ` Anup Patel
2013-08-16 18:20                                 ` Christoffer Dall
2013-08-30  9:52                                 ` Catalin Marinas
2013-08-30 10:44                                   ` Anup Patel
2013-08-30 13:02                                     ` Catalin Marinas
2013-08-30 13:21                                     ` Marc Zyngier
2013-08-30 14:04                                       ` Catalin Marinas
2013-08-30 14:22                                         ` Marc Zyngier
2013-08-30 14:30                                           ` Will Deacon
2013-08-30 14:52                                             ` Anup Patel
2013-08-30 15:12                                             ` Marc Zyngier
2013-08-29 10:52                         ` Catalin Marinas
2013-08-29 12:31                           ` Anup Patel
2013-08-29 12:53                             ` Catalin Marinas
2013-08-29 16:02                               ` Anup Patel
2013-08-30  9:44                                 ` Catalin Marinas
2013-08-30 10:36                                   ` Anup Patel
2013-08-30 12:52                                     ` Catalin Marinas
2013-08-16 17:14                       ` Christoffer Dall
     [not found]                         ` <CALrVBkvEP1Q0mKpv8ViOTLRvW2ks18MQXgmurSBHn+aJcz+=gw@mail.gmail.com>
2013-08-16 17:28                           ` Christoffer Dall
2013-08-16 17:42                             ` Christoffer Dall
2013-08-15  8:39       ` Pranavkumar Sawargaonkar
2013-08-15 15:42         ` Marc Zyngier
2013-08-14 14:23 ` Sudeep KarkadaNagesha
2013-08-14 14:35   ` Anup Patel
2013-08-14 17:37 ` Christoffer Dall
  -- strict thread matches above, loose matches on Subject: below --
2013-08-14 11:45 Pranavkumar Sawargaonkar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.