* [PATCH] ARM: mm: skip cleaning of idmap page tables on LPAE capable cores
@ 2018-12-10 15:28 Ard Biesheuvel
  2018-12-10 15:48 ` Russell King - ARM Linux
  0 siblings, 1 reply; 8+ messages in thread
From: Ard Biesheuvel @ 2018-12-10 15:28 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: Marc Zyngier, Will Deacon, Russell King, Ard Biesheuvel

Currently, init_static_idmap() installs some page table entries to
cover the identity mapped part of the kernel image (which is only
about 160 bytes in size in a multi_v7_defconfig Thumb2 build), and
calls flush_cache_louis() to ensure that the updates are visible
to the page table walker on the same core.

When running under virtualization, flush_cache_louis() may take more
than 10 seconds to complete:

[    0.108192] Setting up static identity map for 0x40300000 - 0x403000a0
[   13.078127] rcu: Hierarchical SRCU implementation.

This is due to the fact that set/way ops are not virtualizable, and so
KVM may trap each one, resulting in a substantial delay.
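
To give a feel for why trapping each operation adds up, here is a minimal
sketch. A set/way operation acts on one cache line selected by set and way,
so covering a whole cache level takes sets x ways operations no matter how
little data was actually modified. The structure, the function name and the
geometry used in the usage note are hypothetical; this is not how
flush_cache_louis() is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one cache level; not a kernel structure. */
struct cache_geometry {
	uint32_t size_bytes;	/* total capacity of this level */
	uint32_t line_bytes;	/* cache line length */
	uint32_t ways;		/* associativity */
};

/*
 * Each set/way operation handles a single (set, way) pair, so a full
 * pass over one cache level needs sets * ways operations. If a
 * hypervisor traps every one of them, this is also the trap count
 * for that level.
 */
static inline uint32_t setway_op_count(const struct cache_geometry *g)
{
	assert(g->line_bytes != 0 && g->ways != 0);

	uint32_t sets = g->size_bytes / (g->line_bytes * g->ways);

	return sets * g->ways;
}
```

For an assumed 32 KiB, 2-way cache with 64-byte lines this gives 256 sets
and therefore 512 operations for that one level.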

Since only LPAE capable CPUs may execute under virtualization, and
considering that LPAE capable CPUs are guaranteed to have cache
coherent page table walkers (per the architecture), let's only
perform this cache maintenance on non-LPAE cores.

Cc: Russell King <linux@armlinux.org.uk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/mm/idmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 1d1edd064199..a033f6134a64 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -6,6 +6,7 @@
 
 #include <asm/cputype.h>
 #include <asm/idmap.h>
+#include <asm/hwcap.h>
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
 #include <asm/sections.h>
@@ -110,7 +111,8 @@ static int __init init_static_idmap(void)
 			     __idmap_text_end, 0);
 
 	/* Flush L1 for the hardware to see this page table content */
-	flush_cache_louis();
+	if (!(elf_hwcap & HWCAP_LPAE))
+		flush_cache_louis();
 
 	return 0;
 }
-- 
2.19.2
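
The condition added by the hunk above can be tried out in isolation with a
small userspace sketch. SKETCH_HWCAP_LPAE and idmap_needs_cache_flush() are
names invented for this illustration; the bit position is assumed to match
HWCAP_LPAE in the ARM uapi hwcap header and should be checked against the
tree you build.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-in for the kernel's HWCAP_LPAE bit. The value is an assumption
 * made for this sketch; the authoritative definition lives in the
 * kernel headers.
 */
#define SKETCH_HWCAP_LPAE	(1UL << 20)

/* Mirrors the patched condition: flush only when LPAE is not advertised. */
static inline bool idmap_needs_cache_flush(unsigned long hwcap)
{
	return !(hwcap & SKETCH_HWCAP_LPAE);
}
```

With this model, a hwcap word without the LPAE bit still requests the flush,
while any word with the bit set skips it regardless of the other bits.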


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH] ARM: mm: skip cleaning of idmap page tables on LPAE capable cores
  2018-12-10 15:28 [PATCH] ARM: mm: skip cleaning of idmap page tables on LPAE capable cores Ard Biesheuvel
@ 2018-12-10 15:48 ` Russell King - ARM Linux
  2018-12-10 16:13   ` Marc Zyngier
  0 siblings, 1 reply; 8+ messages in thread
From: Russell King - ARM Linux @ 2018-12-10 15:48 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Marc Zyngier, Will Deacon, linux-arm-kernel

On Mon, Dec 10, 2018 at 04:28:55PM +0100, Ard Biesheuvel wrote:
> Since only LPAE capable CPUs may execute under virtualization, and
> considering that LPAE capable CPUs are guaranteed to have cache
> coherent page table walkers (per the architecture), let's only
> perform this cache maintenance on non-LPAE cores.

That statement doesn't stack up.  What about Cortex A15, which is a
32-bit core with LPAE support?  TI Keystone2 SoCs fall into this
category.

Sorry, but no, I don't think we can omit this cache flush on LPAE.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


* Re: [PATCH] ARM: mm: skip cleaning of idmap page tables on LPAE capable cores
  2018-12-10 15:48 ` Russell King - ARM Linux
@ 2018-12-10 16:13   ` Marc Zyngier
  2018-12-13 10:57     ` Ard Biesheuvel
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2018-12-10 16:13 UTC (permalink / raw)
  To: Russell King - ARM Linux, Ard Biesheuvel; +Cc: Will Deacon, linux-arm-kernel

On 10/12/2018 15:48, Russell King - ARM Linux wrote:
> On Mon, Dec 10, 2018 at 04:28:55PM +0100, Ard Biesheuvel wrote:
>> Since only LPAE capable CPUs may execute under virtualization, and
>> considering that LPAE capable CPUs are guaranteed to have cache
>> coherent page table walkers (per the architecture), let's only
>> perform this cache maintenance on non-LPAE cores.
> 
> That statement doesn't stack up.  What about Cortex A15, which is a
> 32-bit core with LPAE support?  TI Keystone2 SoCs fall into this
> category.

As already documented in dcadda146f4fd25a732382747f306465d337cda6
("arm/kvm: excise redundant cache maintenance"):

<quote>
Per ARM DDI 0406C.c, section B1.7 ("The Virtualization Extensions"), the
virtualization extensions mandate the multiprocessing extensions.

Per ARM DDI 0406C.c, section B3.10.1 ("General TLB maintenance
requirements"), as described in the sub-section titled "TLB maintenance
operations and the memory order model", this maintenance is not required
in the presence of the multiprocessing extensions.
</quote>

Furthermore, as per B1.6 ("The Large Physical Address Extension") from
the same document:

<quote>
An implementation that includes the Large Physical Address Extension
must implement the Multiprocessing Extensions and therefore cannot
include the FCSE, see Use of the Fast Context Switch Extension on page
AppxI-2475.
</quote>

So on a core like Cortex A15 where we have both MP, VE and LPAE, we
should be able to assume a coherent page table walker.
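
The chain of implications quoted above (LPAE mandates MP, VE mandates MP, and
MP makes the extra maintenance unnecessary) can be written down as a small
model. This is only a sketch of the reasoning: the structure and both
function names are hypothetical and nothing like them exists in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature flags; the kernel does not use this structure. */
struct cpu_features {
	bool lpae;	/* Large Physical Address Extension */
	bool virt;	/* Virtualization Extensions */
	bool mp;	/* Multiprocessing Extensions */
};

/* Per the quoted B1.6 and B1.7, LPAE and VE each require MP. */
static inline bool features_are_consistent(const struct cpu_features *f)
{
	return (!f->lpae || f->mp) && (!f->virt || f->mp);
}

/*
 * Per the quoted B3.10.1, the maintenance is not required when MP is
 * present, so this model treats the walker as coherent exactly then.
 */
static inline bool walker_assumed_coherent(const struct cpu_features *f)
{
	assert(features_are_consistent(f));
	return f->mp;
}
```

A Cortex-A15 style configuration with all three flags set is consistent and
yields a coherent walker in this model; a configuration claiming LPAE without
MP is rejected as inconsistent with the architecture text.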

Thanks,

	M.

> Sorry, but no, I don't think we can omit this cache flush on LPAE.


-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH] ARM: mm: skip cleaning of idmap page tables on LPAE capable cores
  2018-12-10 16:13   ` Marc Zyngier
@ 2018-12-13 10:57     ` Ard Biesheuvel
  2018-12-13 11:40       ` Marc Zyngier
  2018-12-13 12:03       ` Marc Zyngier
  0 siblings, 2 replies; 8+ messages in thread
From: Ard Biesheuvel @ 2018-12-13 10:57 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Will Deacon, Russell King, linux-arm-kernel

On Mon, 10 Dec 2018 at 17:13, Marc Zyngier <marc.zyngier@arm.com> wrote:
>
> On 10/12/2018 15:48, Russell King - ARM Linux wrote:
> > On Mon, Dec 10, 2018 at 04:28:55PM +0100, Ard Biesheuvel wrote:
> >> Since only LPAE capable CPUs may execute under virtualization, and
> >> considering that LPAE capable CPUs are guaranteed to have cache
> >> coherent page table walkers (per the architecture), let's only
> >> perform this cache maintenance on non-LPAE cores.
> >
> > That statement doesn't stack up.  What about Cortex A15, which is a
> > 32-bit core with LPAE support?  TI Keystone2 SoCs fall into this
> > category.
>
> As already documented in dcadda146f4fd25a732382747f306465d337cda6
> ("arm/kvm: excise redundant cache maintenance"):
>
> <quote>
> Per ARM DDI 0406C.c, section B1.7 ("The Virtualization Extensions"), the
> virtualization extensions mandate the multiprocessing extensions.
>
> Per ARM DDI 0406C.c, section B3.10.1 ("General TLB maintenance
> requirements"), as described in the sub-section titled "TLB maintenance
> operations and the memory order model", this maintenance is not required
> in the presence of the multiprocessing extensions.
> </quote>
>
> Furthermore, as per B1.6 ("The Large Physical Address Extension") from
> the same document:
>
> <quote>
> An implementation that includes the Large Physical Address Extension
> must implement the Multiprocessing Extensions and therefore cannot
> include the FCSE, see Use of the Fast Context Switch Extension on page
> AppxI-2475.
> </quote>
>
> So on a core like Cortex A15 where we have both MP, VE and LPAE, we
> should be able to assume a coherent page table walker.
>

Thanks Marc

I'll drop this into the patch system.


* Re: [PATCH] ARM: mm: skip cleaning of idmap page tables on LPAE capable cores
  2018-12-13 10:57     ` Ard Biesheuvel
@ 2018-12-13 11:40       ` Marc Zyngier
  2018-12-13 11:42         ` Ard Biesheuvel
  2018-12-13 12:03       ` Marc Zyngier
  1 sibling, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2018-12-13 11:40 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Will Deacon, Russell King, linux-arm-kernel

On 13/12/2018 10:57, Ard Biesheuvel wrote:
> On Mon, 10 Dec 2018 at 17:13, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>
>> On 10/12/2018 15:48, Russell King - ARM Linux wrote:
>>> On Mon, Dec 10, 2018 at 04:28:55PM +0100, Ard Biesheuvel wrote:
>>>> Since only LPAE capable CPUs may execute under virtualization, and
>>>> considering that LPAE capable CPUs are guaranteed to have cache
>>>> coherent page table walkers (per the architecture), let's only
>>>> perform this cache maintenance on non-LPAE cores.
>>>
>>> That statement doesn't stack up.  What about Cortex A15, which is a
>>> 32-bit core with LPAE support?  TI Keystone2 SoCs fall into this
>>> category.
>>
>> As already documented in dcadda146f4fd25a732382747f306465d337cda6
>> ("arm/kvm: excise redundant cache maintenance"):
>>
>> <quote>
>> Per ARM DDI 0406C.c, section B1.7 ("The Virtualization Extensions"), the
>> virtualization extensions mandate the multiprocessing extensions.
>>
>> Per ARM DDI 0406C.c, section B3.10.1 ("General TLB maintenance
>> requirements"), as described in the sub-section titled "TLB maintenance
>> operations and the memory order model", this maintenance is not required
>> in the presence of the multiprocessing extensions.
>> </quote>
>>
>> Furthermore, as per B1.6 ("The Large Physical Address Extension") from
>> the same document:
>>
>> <quote>
>> An implementation that includes the Large Physical Address Extension
>> must implement the Multiprocessing Extensions and therefore cannot
>> include the FCSE, see Use of the Fast Context Switch Extension on page
>> AppxI-2475.
>> </quote>
>>
>> So on a core like Cortex A15 where we have both MP, VE and LPAE, we
>> should be able to assume a coherent page table walker.
>>
> 
> Thanks Marc
> 
> I'll drop this into the patch system.

Cool, it'd be useful to have this from an architecture compliance point
of view.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH] ARM: mm: skip cleaning of idmap page tables on LPAE capable cores
  2018-12-13 11:40       ` Marc Zyngier
@ 2018-12-13 11:42         ` Ard Biesheuvel
  2018-12-13 12:00           ` Marc Zyngier
  0 siblings, 1 reply; 8+ messages in thread
From: Ard Biesheuvel @ 2018-12-13 11:42 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Will Deacon, Russell King, linux-arm-kernel

On Thu, 13 Dec 2018 at 12:40, Marc Zyngier <marc.zyngier@arm.com> wrote:
>
> On 13/12/2018 10:57, Ard Biesheuvel wrote:
> > On Mon, 10 Dec 2018 at 17:13, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >>
> >> On 10/12/2018 15:48, Russell King - ARM Linux wrote:
> >>> On Mon, Dec 10, 2018 at 04:28:55PM +0100, Ard Biesheuvel wrote:
> >>>> Since only LPAE capable CPUs may execute under virtualization, and
> >>>> considering that LPAE capable CPUs are guaranteed to have cache
> >>>> coherent page table walkers (per the architecture), let's only
> >>>> perform this cache maintenance on non-LPAE cores.
> >>>
> >>> That statement doesn't stack up.  What about Cortex A15, which is a
> >>> 32-bit core with LPAE support?  TI Keystone2 SoCs fall into this
> >>> category.
> >>
> >> As already documented in dcadda146f4fd25a732382747f306465d337cda6
> >> ("arm/kvm: excise redundant cache maintenance"):
> >>
> >> <quote>
> >> Per ARM DDI 0406C.c, section B1.7 ("The Virtualization Extensions"), the
> >> virtualization extensions mandate the multiprocessing extensions.
> >>
> >> Per ARM DDI 0406C.c, section B3.10.1 ("General TLB maintenance
> >> requirements"), as described in the sub-section titled "TLB maintenance
> >> operations and the memory order model", this maintenance is not required
> >> in the presence of the multiprocessing extensions.
> >> </quote>
> >>
> >> Furthermore, as per B1.6 ("The Large Physical Address Extension") from
> >> the same document:
> >>
> >> <quote>
> >> An implementation that includes the Large Physical Address Extension
> >> must implement the Multiprocessing Extensions and therefore cannot
> >> include the FCSE, see Use of the Fast Context Switch Extension on page
> >> AppxI-2475.
> >> </quote>
> >>
> >> So on a core like Cortex A15 where we have both MP, VE and LPAE, we
> >> should be able to assume a coherent page table walker.
> >>
> >
> > Thanks Marc
> >
> > I'll drop this into the patch system.
>
> Cool, it'd be useful to have this from an architecture compliance point
> of view.
>

As in, the avoidance of set/way ops invoked from the OS? I think there
are other places that would need to be fixed if that is what you are
after.


* Re: [PATCH] ARM: mm: skip cleaning of idmap page tables on LPAE capable cores
  2018-12-13 11:42         ` Ard Biesheuvel
@ 2018-12-13 12:00           ` Marc Zyngier
  0 siblings, 0 replies; 8+ messages in thread
From: Marc Zyngier @ 2018-12-13 12:00 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Will Deacon, Russell King, linux-arm-kernel

On 13/12/2018 11:42, Ard Biesheuvel wrote:
> On Thu, 13 Dec 2018 at 12:40, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>
>> On 13/12/2018 10:57, Ard Biesheuvel wrote:
>>> On Mon, 10 Dec 2018 at 17:13, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>>
>>>> On 10/12/2018 15:48, Russell King - ARM Linux wrote:
>>>>> On Mon, Dec 10, 2018 at 04:28:55PM +0100, Ard Biesheuvel wrote:
>>>>>> Since only LPAE capable CPUs may execute under virtualization, and
>>>>>> considering that LPAE capable CPUs are guaranteed to have cache
>>>>>> coherent page table walkers (per the architecture), let's only
>>>>>> perform this cache maintenance on non-LPAE cores.
>>>>>
>>>>> That statement doesn't stack up.  What about Cortex A15, which is a
>>>>> 32-bit core with LPAE support?  TI Keystone2 SoCs fall into this
>>>>> category.
>>>>
>>>> As already documented in dcadda146f4fd25a732382747f306465d337cda6
>>>> ("arm/kvm: excise redundant cache maintenance"):
>>>>
>>>> <quote>
>>>> Per ARM DDI 0406C.c, section B1.7 ("The Virtualization Extensions"), the
>>>> virtualization extensions mandate the multiprocessing extensions.
>>>>
>>>> Per ARM DDI 0406C.c, section B3.10.1 ("General TLB maintenance
>>>> requirements"), as described in the sub-section titled "TLB maintenance
>>>> operations and the memory order model", this maintenance is not required
>>>> in the presence of the multiprocessing extensions.
>>>> </quote>
>>>>
>>>> Furthermore, as per B1.6 ("The Large Physical Address Extension") from
>>>> the same document:
>>>>
>>>> <quote>
>>>> An implementation that includes the Large Physical Address Extension
>>>> must implement the Multiprocessing Extensions and therefore cannot
>>>> include the FCSE, see Use of the Fast Context Switch Extension on page
>>>> AppxI-2475.
>>>> </quote>
>>>>
>>>> So on a core like Cortex A15 where we have both MP, VE and LPAE, we
>>>> should be able to assume a coherent page table walker.
>>>>
>>>
>>> Thanks Marc
>>>
>>> I'll drop this into the patch system.
>>
>> Cool, it'd be useful to have this from an architecture compliance point
>> of view.
>>
> 
> As in, the avoidance of set/way ops invoked from the OS? I think there
> are other places that would need to be fixed if that is what you are
> after.

Set/way would definitely deserve to be eradicated, but at least
preventing the kernel from doing unnecessary work is a good start.

If memory serves well, at least the decompressor is making use of
set/way ops as well, and would need to be converted to cleaning by VA.
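
For contrast with set/way maintenance, the cost of cleaning by VA scales with
the size of the region rather than with the cache geometry. The helper below
is a minimal sketch of that arithmetic; the function name is hypothetical and
it only counts lines, it does not issue any maintenance instructions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of cache lines a clean by VA over [start, start + len) has to
 * touch. line_bytes must be a power of two, and start + len is assumed
 * not to wrap around the 32-bit address space.
 */
static inline uint32_t lines_to_clean_by_va(uint32_t start, uint32_t len,
					    uint32_t line_bytes)
{
	assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);

	if (len == 0)
		return 0;

	/* Round both ends of the range down to a line boundary. */
	uint32_t first = start & ~(line_bytes - 1);
	uint32_t last = (start + len - 1) & ~(line_bytes - 1);

	return (last - first) / line_bytes + 1;
}
```

An 8-byte update that sits inside one 64-byte line needs a single operation,
and two if it happens to straddle a line boundary.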

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH] ARM: mm: skip cleaning of idmap page tables on LPAE capable cores
  2018-12-13 10:57     ` Ard Biesheuvel
  2018-12-13 11:40       ` Marc Zyngier
@ 2018-12-13 12:03       ` Marc Zyngier
  1 sibling, 0 replies; 8+ messages in thread
From: Marc Zyngier @ 2018-12-13 12:03 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Will Deacon, Russell King, linux-arm-kernel

On 13/12/2018 10:57, Ard Biesheuvel wrote:
> On Mon, 10 Dec 2018 at 17:13, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>
>> On 10/12/2018 15:48, Russell King - ARM Linux wrote:
>>> On Mon, Dec 10, 2018 at 04:28:55PM +0100, Ard Biesheuvel wrote:
>>>> Since only LPAE capable CPUs may execute under virtualization, and
>>>> considering that LPAE capable CPUs are guaranteed to have cache
>>>> coherent page table walkers (per the architecture), let's only
>>>> perform this cache maintenance on non-LPAE cores.
>>>
>>> That statement doesn't stack up.  What about Cortex A15, which is a
>>> 32-bit core with LPAE support?  TI Keystone2 SoCs fall into this
>>> category.
>>
>> As already documented in dcadda146f4fd25a732382747f306465d337cda6
>> ("arm/kvm: excise redundant cache maintenance"):
>>
>> <quote>
>> Per ARM DDI 0406C.c, section B1.7 ("The Virtualization Extensions"), the
>> virtualization extensions mandate the multiprocessing extensions.
>>
>> Per ARM DDI 0406C.c, section B3.10.1 ("General TLB maintenance
>> requirements"), as described in the sub-section titled "TLB maintenance
>> operations and the memory order model", this maintenance is not required
>> in the presence of the multiprocessing extensions.
>> </quote>
>>
>> Furthermore, as per B1.6 ("The Large Physical Address Extension") from
>> the same document:
>>
>> <quote>
>> An implementation that includes the Large Physical Address Extension
>> must implement the Multiprocessing Extensions and therefore cannot
>> include the FCSE, see Use of the Fast Context Switch Extension on page
>> AppxI-2475.
>> </quote>
>>
>> So on a core like Cortex A15 where we have both MP, VE and LPAE, we
>> should be able to assume a coherent page table walker.
>>
> 
> Thanks Marc
> 
> I'll drop this into the patch system.

And please add my

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

