On Tue, 20 Feb 2018, Julien Grall wrote:
> On 20/02/2018 21:16, Julien Grall wrote:
> > Hi,
> >
> > On 20/02/2018 21:03, Stefano Stabellini wrote:
> >> On Tue, 20 Feb 2018, Julien Grall wrote:
> >>> On 19/02/18 21:58, Stefano Stabellini wrote:
> >>>> +        mrc   CP32(r6, CSSELR_EL1)
> >>>
> >>> The size of the cache is read using CCSIDR_EL1. But it looks like the
> >>> way we get the cache line size in Xen is fragile.
> >>>
> >>> We are retrieving the cache line size of Level 1 and assume this will
> >>> be valid for all the other caches. Indeed, cache maintenance ops may
> >>> propagate to other caches depending on the target (Point of Coherency
> >>> vs Point of Unification).
> >>>
> >>> Looking at the ARM ARM "Cache hierarchy abstraction for address-based
> >>> operations" (D3-2061 DDI 0487C.a), CTR_EL0/CTR will hold the minimum
> >>> line length values for the data caches. The value will be the most
> >>> efficient address stride to use to apply a sequence of VA-based
> >>> maintenance instructions to a range of VAs.
> >>>
> >>> So it would be best and safer for Xen to use CTR/CTR_EL0.DminLine.
> >>
> >> This is insightful, thank you. Given that this patch is a backport
> >> candidate, I would prefer to retain the same behavior we had before in
> >> setup_cache. I can write a separate patch on top of this one to make
> >> the change to use CTR/CTR_EL0.DminLine. That way, we can make a
> >> separate decision on each of them on whether we want to backport them
> >> (and potentially revert them) or not. In other words: this patch as-is
> >> is suboptimal but carries very little risk. Changing the way we
> >> determine the cacheline size improves the patch but significantly
> >> increases the risk associated with it.
> >>
> >> Does that make sense?
> >
> > By this patch you mean big.LITTLE? If so, then I don't consider it a
> > potential backport. big.LITTLE has never been supported on Xen and
> > hence should be considered a new feature. What is backportable is
> > patch #1, which forbids big.LITTLE.

Patch #1 ends up forcing people to use big cores only on many platforms,
which, from what you wrote, can be unsafe. I suggest we backport the whole
series, so that at least users can configure the system to use LITTLE
cores only, or a mix of the two. The big.LITTLE doc in particular is
certainly worth backporting, but it only makes sense together with the
rest of the series.

On support statements: I noticed that big.LITTLE is actually missing from
SUPPORT.md.

> > Regarding the cache line size, I didn't suggest the change because it
> > is more efficient. I suggested it because the current code to find the
> > cache line size is wrong: imagine there is a cache in the hierarchy
> > that has a smaller cache line than your L1 cache. Then you would not
> > clean/invalidate that cache correctly.

I didn't mean to imply that what you are suggesting is not important, or
less important than the purpose of this patch. I just meant to say that
this patch is about removing the cacheline_bytes variable; it is not about
fixing the way we read the cacheline size. I like to keep each patch doing
one thing only.

The fix you are suggesting is important, in fact probably more important
than this series. I am OK with writing a patch for it; it is just that it
is a separate issue and should be fixed separately. I'll have a look at it
and propose it as a separate patch.
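For the arm64 side, I would expect that separate patch to derive the
stride from CTR_EL0.DminLine, roughly along these lines (just a sketch to
show what I have in mind, not the actual patch):

        mrs   x10, CTR_EL0           /* x10 := cache type register */
        ubfx  x10, x10, #16, #4      /* x10 := DminLine (log2 of words) */
        mov   x11, #4                /* bytes per word */
        lsl   x10, x11, x10          /* x10 := smallest D-cache line in bytes */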
> >>>> +        and   r6, r6, #0x7
> >>>> +        add   r6, r6, #4
> >>>> +        mov   r7, #1
> >>>> +        lsl   r6, r7, r6
> >>>>            mov   r7, r3
> >>>>      1:      mcr   CP32(r7, DCCMVAC)
> >>>> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> >>>> index fa0ef70..edea300 100644
> >>>> --- a/xen/arch/arm/arm64/head.S
> >>>> +++ b/xen/arch/arm/arm64/head.S
> >>>> @@ -631,8 +631,14 @@ ENTRY(relocate_xen)
> >>>>            dsb   sy        /* So the CPU issues all writes to the range */
> >>>>
> >>>>            mov   x9, x3
> >>>> -        ldr   x10, =cacheline_bytes /* x10 := step */
> >>>> -        ldr   x10, [x10]
> >>>> +
> >>>> +        mov   x10, #0
> >>>> +        msr   CSSELR_EL1, x10
> >>>> +        mrs   x10, CSSELR_EL1
> >>>> +        and   x10, x10, #0x7
> >>>> +        add   x10, x10, #4
> >>>> +        mov   x11, #1
> >>>> +        lsl   x10, x11, x10
> >>>
> >>> Please use the dcache_line_size macro (see cache.S).
> >>
> >> Similarly, I would prefer to retain the same old behavior here, and
> >> fix/improve it in a separate patch.
> >
> > See above, you got the wrong end of the stick about the cache line size.
>
> You might want to look at the following patch in Linux:
>
> commit f91e2c3bd427239c198351f44814dd39db91afe0
> Author: Catalin Marinas
> Date:   Tue Dec 7 16:52:04 2010 +0100
>
>     ARM: 6527/1: Use CTR instead of CCSIDR for the D-cache line size on ARMv7
>
>     The current implementation of the dcache_line_size macro reads the L1
>     cache size from the CCSIDR register. This, however, is not guaranteed
>     to be the smallest cache line in the cache hierarchy. The patch
>     changes the macro to use the more architecturally correct CTR
>     register.
>
>     Reported-by: Kevin Sapp
>     Signed-off-by: Catalin Marinas
>     Signed-off-by: Russell King

Thank you for the pointer, I'll give it a look.
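At a first glance, the arm32 equivalent would be to read CTR via cp15 and
extract DminLine in the same way; for the head.S hunk above that would
look roughly like this (sketch only, register choice taken from the hunk,
to be double-checked against the Linux macro):

        mrc   p15, 0, r6, c0, c0, 1  /* r6 := CTR */
        lsr   r6, r6, #16
        and   r6, r6, #0xf           /* r6 := DminLine (log2 of words) */
        mov   r7, #4                 /* bytes per word */
        lsl   r6, r7, r6             /* r6 := smallest D-cache line in bytes */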