From mboxrd@z Thu Jan  1 00:00:00 1970
From: slash.tmp@free.fr (Mason)
Date: Fri, 27 Mar 2015 14:45:13 +0100
Subject: Cache line size definition in arch/arm/mm/Kconfig
In-Reply-To: <20150327120601.GB4019@n2100.arm.linux.org.uk>
References: <5512C7A4.3000302@free.fr> <5515423E.4020802@free.fr>
 <20150327120601.GB4019@n2100.arm.linux.org.uk>
Message-ID: <55155EE9.6020600@free.fr>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 27/03/2015 13:06, Russell King - ARM Linux wrote:
> On Fri, Mar 27, 2015 at 12:42:54PM +0100, Mason wrote:
>> On 25/03/2015 15:35, Mason wrote:
>>
>>> AFAICT, L1 cache line size is specified in arch/arm/mm/Kconfig
>>>
>>> config ARM_L1_CACHE_SHIFT_6
>>>      bool
>>>      default y if CPU_V7
>>>      help
>>>        Setting ARM L1 cache line size to 64 Bytes.
>>>
>>> config ARM_L1_CACHE_SHIFT
>>>      int
>>>      default 6 if ARM_L1_CACHE_SHIFT_6
>>>      default 5
>>>
>>>
>>> I'm using a Cortex A9 MPCore. If I'm not mistaken, the cache line size
>>> is 32 bytes, even though this CPU is ARMv7.
>>>
>>> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388g/Caccifbd.html
>>>
>>>> The Cortex-A9 processor has separate instruction and data caches.
>>>> The caches have the following features:
>>>>
>>>>   Each cache can be disabled independently. See System Control Register.
>>>>   Both caches are 4-way set-associative.
>>>>   The cache line length is eight words.
>>>>   On a cache miss, critical word first filling of the cache is performed.
>>>>   You can configure the instruction and data caches independently during implementation to sizes of 16KB, 32KB, or 64KB.
>>>>   To reduce power consumption, the number of full cache reads is reduced by taking advantage of the sequential nature of many cache operations. If a cache read is sequential to the previous cache read, and the read is within the same cache line, only the data RAM set that was previously read is accessed.
>>>
>>> How do I set ARM_L1_CACHE_SHIFT_6 to 'n' in my platform Kconfig?
>>>
>>> Or perhaps I should "override" ARM_L1_CACHE_SHIFT to 5 (again in
>>> my platform Kconfig). I don't know the syntax to do that.
>>>
>>> Could someone point out the correct way?
>>
>> Would someone care to comment? :-)
>
> No :)

I'm glad that you've decided to disagree with yourself! :-)

> What you've found is the _static_ L1 cache line size setting, which is
> used at _compile_ time to align structures while building.  To allow
> maximum flexibility - and because there are 64-byte cache line ARMv7
> implementations around - we have decided that the _compile time_
> cache line size will be 64 bytes.

Right, I had a complete brain malfunction there. The compiler needs to
be told the cache line size in order to align the relevant objects
properly.
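
To make that concrete for myself, here is a minimal userspace sketch
(not kernel code). All the demo_* / DEMO_* names are made up for
illustration, and the aligned attribute is a GCC/Clang extension. It
only shows that an object aligned for a 64-byte line is also aligned
for a 32-byte line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the compile-time cache line setting. */
#define DEMO_L1_CACHE_SHIFT 6
#define DEMO_L1_CACHE_BYTES (1u << DEMO_L1_CACHE_SHIFT)   /* 64 */

/* The alignment is fixed when the code is compiled, not at run time. */
struct demo_hot {
	unsigned long counter;
} __attribute__((aligned(DEMO_L1_CACHE_BYTES)));

struct demo_hot demo_object;

/* Returns nonzero if p is aligned to 'bytes' (must be a power of two). */
int demo_is_aligned(const void *p, size_t bytes)
{
	return ((uintptr_t)p & (bytes - 1)) == 0;
}

/* Nonzero if demo_object is aligned for both 64- and 32-byte lines. */
int demo_check(void)
{
	return demo_is_aligned(&demo_object, 64) &&
	       demo_is_aligned(&demo_object, 32);
}
```

Calling demo_check() from a small test program should return nonzero.
The price of the larger setting is padding: sizeof(struct demo_hot) is
64 here even though the payload is a single unsigned long.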

> As far as cache operations are concerned, they will know the correct
> cache line size for the CPU which they're running on, so the code
> will adapt.
>
> It has the side effect that some allocators also assume that the L1
> cache line size is 64 bytes.
>
> It's better to have a larger than necessary cache line size than a
> smaller one, because a larger one is automatically aligned to the
> smaller sizes.
>
> In other words, this is totally intentional.

Still, I don't understand why I should not override ARM_L1_CACHE_SHIFT
to 5 in my platform-specific Kconfig, given that I know my cache line
size is 32 bytes.


Oh, and while I have your attention ;-) I have alignment-related
questions about clocksource_mmio_init() (commit 442c8176d2) with
respect to Thomas Gleixner's patch 369db4c952. (I think the two
patches do not play nicely together.)

369db4c952 moved some struct clocksource fields around to group
hot fields in a single cache line at the beginning of the struct,
and marked the struct as cache aligned. This works as expected
with static structures.

However, I don't think it works as expected with a dynamically
allocated struct clocksource. It seems to me (and I may very
well be wrong!) that struct clocksource_mmio should have the
clksrc field at the beginning of the struct, and that we should
use an allocation function that returns cache-aligned memory:

struct clocksource_mmio {
	struct clocksource clksrc;
	void __iomem *reg;
};

   cs = kmagic_cache_alloc(sizeof *cs, GFP_KERNEL);

That way clksrc would effectively be cache aligned, right?
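
Here is a userspace sketch of the idea, under stated assumptions: the
demo_* structs are simplified mock-ups (not the real struct
clocksource), and C11 aligned_alloc() stands in for whatever kernel
allocator would play the role of kmagic_cache_alloc() above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_CACHE_BYTES 64	/* assumed compile-time line size */

/* Mock-up of a cache-aligned struct; the fields are illustrative only. */
struct demo_clocksource {
	unsigned long long (*read)(void *cs);
	unsigned long long mask;
	unsigned int mult, shift;
} __attribute__((aligned(DEMO_CACHE_BYTES)));

struct demo_clocksource_mmio {
	struct demo_clocksource clksrc;	/* first member, so offset 0 */
	void *reg;
};

/* Allocate the wrapper on a cache line boundary; may return NULL. */
struct demo_clocksource_mmio *demo_alloc(void)
{
	/* sizeof is a multiple of 64 here, as C11 aligned_alloc expects */
	return aligned_alloc(DEMO_CACHE_BYTES,
			     sizeof(struct demo_clocksource_mmio));
}

/* Nonzero if the embedded clksrc starts on a cache line boundary. */
int demo_clksrc_is_cache_aligned(const struct demo_clocksource_mmio *cs)
{
	return ((uintptr_t)&cs->clksrc % DEMO_CACHE_BYTES) == 0;
}
```

With clksrc at offset 0, the alignment of the returned pointer is the
alignment of clksrc itself, so the check holds whenever the allocator
honours the requested alignment.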


One thing that caught me off-guard: when CONFIG_ARCH_CLOCKSOURCE_DATA
and CONFIG_CLOCKSOURCE_WATCHDOG are undefined, struct clocksource
weighs 80 bytes on a 32-bit system. I would have expected the "reg"
field at the end to "fit in the hole", but in fact gcc seems to
"stretch" struct clocksource before considering other fields. Could
this be a bug in gcc's aligned attribute extension?
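
For reference, the behaviour can be reproduced outside the kernel with
the sketch below. The demo_* names and the 80-byte payload are
assumptions chosen to mimic the numbers above. As far as I can tell,
the result follows from the C rule that sizeof a type is a multiple of
its alignment (so that array elements stay aligned), which would mean
an enclosing struct cannot reuse the tail padding of a member.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHE_BYTES 64	/* assumed compile-time line size */

/* 80 bytes of payload, aligned to (and padded up to) a line multiple. */
struct demo_inner {
	char payload[80];
} __attribute__((aligned(DEMO_CACHE_BYTES)));

struct demo_outer {
	struct demo_inner inner;
	void *reg;		/* placed after the padded inner struct */
};

/* Size of the aligned struct, including its tail padding. */
size_t demo_inner_size(void)
{
	return sizeof(struct demo_inner);
}

/* Offset of reg: it starts after the padding, not inside the hole. */
size_t demo_reg_offset(void)
{
	return offsetof(struct demo_outer, reg);
}
```

With a 64-byte alignment, demo_inner_size() is 128 rather than 80, and
demo_reg_offset() is also 128, so "reg" starts a third cache line and
the outer struct grows to 192 bytes.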

Regards.