All of lore.kernel.org
 help / color / mirror / Atom feed
From: slash.tmp@free.fr (Mason)
To: linux-arm-kernel@lists.infradead.org
Subject: read_cpuid_id() in arch/arm/kernel/setup.c
Date: Fri, 13 Mar 2015 17:39:23 +0100	[thread overview]
Message-ID: <550312BB.8020902@free.fr> (raw)
In-Reply-To: <20150313161953.GS8656@n2100.arm.linux.org.uk>

On 13/03/2015 17:19, Russell King - ARM Linux wrote:
> On Fri, Mar 13, 2015 at 05:03:52PM +0100, Mason wrote:
>> Hello everyone,
>>
>> As far as I can tell, read_cpuid_id() resolves to read_cpuid(CPUID_ID)
>> which resolves to mrc 15, 0, rN, cr0, cr0, {0}
>>
>> Consider this:
>>
>> /*
>>   * The CPU ID never changes at run time, so we might as well tell the
>>   * compiler that it's constant.  Use this function to read the CPU ID
>>   * rather than directly reading processor_id or read_cpuid() directly.
>>   */
>> static inline unsigned int __attribute_const__ read_cpuid_id(void)
>> {
>> 	return read_cpuid(CPUID_ID);
>> }
>>
>> Despite the comment and attribute, my compiler(*) still reloads the
>> value every time.
>>
>> (*) gcc version 4.9.3 20141031 (prerelease) (Linaro GCC 2014.11)
>>
>> e.g.
>>
>> static int __get_cpu_architecture(void)
>> {
>> 	int cpu_arch;
>>
>> 	unsigned int id = read_cpuid_id();
>> 	if ((id & 0x0008f000) == 0) {
>> 		cpu_arch = CPU_ARCH_UNKNOWN;
>> 	} else if ((id & 0x0008f000) == 0x00007000) {
>> 		cpu_arch = (id & (1 << 23)) ? CPU_ARCH_ARMv4T : CPU_ARCH_ARMv3;
>> 	} else if ((id & 0x00080000) == 0x00000000) {
>> 		cpu_arch = (id >> 16) & 7;
>> 		if (cpu_arch)
>> 			cpu_arch += CPU_ARCH_ARMv3;
>> 	} else if ((id & 0x000f0000) == 0x000f0000) {
>>
>> resolves to
>>
>> c01fec74:       ee10cf10        mrc     15, 0, ip, cr0, cr0, {0}
>> c01fec78:       e21cca8f        ands    ip, ip, #585728 ; 0x8f000
>> c01fec7c:       e34c3023        movt    r3, #49187      ; 0xc023
>> c01fec80:       e5837008        str     r7, [r3, #8]
>> c01fec84:       e50b304c        str     r3, [fp, #-76]  ; 0x4c
>> c01fec88:       0a000022        beq     c01fed18 <setup_arch+0xe4>
>> c01fec8c:       ee103f10        mrc     15, 0, r3, cr0, cr0, {0}
>> c01fec90:       e2033a8f        and     r3, r3, #585728 ; 0x8f000
>> c01fec94:       e3530a07        cmp     r3, #28672      ; 0x7000
>> c01fec98:       1a000004        bne     c01fecb0 <setup_arch+0x7c>
>> c01fec9c:       ee103f10        mrc     15, 0, r3, cr0, cr0, {0}
>> c01feca0:       e3130502        tst     r3, #8388608    ; 0x800000
>> c01feca4:       13a0c003        movne   ip, #3
>> c01feca8:       03a0c001        moveq   ip, #1
>> c01fecac:       ea000019        b       c01fed18 <setup_arch+0xe4>
>> c01fecb0:       ee103f10        mrc     15, 0, r3, cr0, cr0, {0}
>> c01fecb4:       e3130702        tst     r3, #524288     ; 0x80000
>>
>>
>> So I thought it would be nice to give the poor compiler a break,
>> and just stuff the result in a local variable:
>
> NAK.
>
> Your compiler behaviour is different to mine (stock gcc 4.7.4):
>
>   4f8:   e1a0c00d        mov     ip, sp
>   4fc:   e92ddff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}
>   500:   e24cb004        sub     fp, ip, #4
>   504:   e24dd00c        sub     sp, sp, #12
>   508:   ee105f10        mrc     15, 0, r5, cr0, cr0, {0} <== load id
>   50c:   e1a07000        mov     r7, r0
>   510:   e1a00005        mov     r0, r5			<== r5 = id
>   514:   ebfffffe        bl      0 <lookup_processor_type>
>   518:   e2504000        subs    r4, r0, #0
>   51c:   1a000003        bne     530 <setup_arch+0x38>
>   520:   e59f063c        ldr     r0, [pc, #1596] ; b64 <setup_arch+0x66c>
>   524:   e1a01005        mov     r1, r5
>   528:   ebfffffe        bl      0 <printk>
>   52c:   eafffffe        b       52c <setup_arch+0x34>
>   530:   e5948020        ldr     r8, [r4, #32]
>   534:   e2153a8f        ands    r3, r5, #585728 ; 0x8f000 <== r5 = id
>   538:   e59f2628        ldr     r2, [pc, #1576] ; b68 <setup_arch+0x670>
>   538:   e59f2628        ldr     r2, [pc, #1576] ; b68 <setup_arch+0x670>
>   53c:   e5828000        str     r8, [r2]
>   540:   0a00001f        beq     5c4 <setup_arch+0xcc>
>   544:   e3530a07        cmp     r3, #28672      ; 0x7000
>   548:   1a000003        bne     55c <setup_arch+0x64>
>   54c:   e3150502        tst     r5, #8388608    ; 0x800000 <== r5 = id
>   550:   03a03001        moveq   r3, #1
>   554:   13a03003        movne   r3, #3
>   558:   ea000019        b       5c4 <setup_arch+0xcc>
>   55c:   e3150702        tst     r5, #524288     ; 0x80000 <== r5 = id
>   560:   1a000003        bne     574 <setup_arch+0x7c>
>   564:   e7e23855        ubfx    r3, r5, #16, #3		<== r5 = id
>   568:   e3530000        cmp     r3, #0
>   56c:   12833001        addne   r3, r3, #1
>   570:   ea000013        b       5c4 <setup_arch+0xcc>
>   574:   e205580f        and     r5, r5, #983040 ; 0xf0000 <== r5 = id
>   578:   e355080f        cmp     r5, #983040     ; 0xf0000
>   57c:   13a03000        movne   r3, #0
>   580:   1a00000f        bne     5c4 <setup_arch+0xcc>
> ...
>
> The point here is that the compiler is free to optimise the code as it
> sees fit - if it decides that the register pressure from having the
> value cached in a register is too high, it can decide to spill the
> cached value, and reload it from CP15 as and when it needs to.  That
> is an advantage.

Good point. I hadn't thought of that.

Do you know the latency of an mrc instruction? (compared to a mov)

> It seems that GCC 4.7.4 optimises better than Linaro's 4.9.3.  In fact,
> it looks like Linaro's 4.9.3 is rather buggy as far as optimisation
> goes.
>
> Later compilers aren't always better.

I did NOT expect that. Compiler optimizations passes are so fragile.

Anyway, here's another proposed nano-improvement ;-)
(This one is code factorization)

--- setup.c	2015-03-03 18:04:59.000000000 +0100
+++ setup.bar.c	2015-03-13 17:29:23.800668429 +0100
@@ -246,12 +246,9 @@
  		if (cpu_arch)
  			cpu_arch += CPU_ARCH_ARMv3;
  	} else if ((read_cpuid_id() & 0x000f0000) == 0x000f0000) {
-		unsigned int mmfr0;
-
  		/* Revised CPUID format. Read the Memory Model Feature
  		 * Register 0 and check for VMSAv7 or PMSAv7 */
-		asm("mrc	p15, 0, %0, c0, c1, 4"
-		    : "=r" (mmfr0));
+		unsigned int mmfr0 = read_cpuid_ext(CPUID_EXT_MMFR0);
  		if ((mmfr0 & 0x0000000f) >= 0x00000003 ||
  		    (mmfr0 & 0x000000f0) >= 0x00000030)
  			cpu_arch = CPU_ARCH_ARMv7;


This one looks good, doesn't it? :-)

Regards.

  reply	other threads:[~2015-03-13 16:39 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-13 16:03 read_cpuid_id() in arch/arm/kernel/setup.c Mason
2015-03-13 16:19 ` Russell King - ARM Linux
2015-03-13 16:39   ` Mason [this message]
2015-03-13 16:45     ` Russell King - ARM Linux
2015-03-13 17:06       ` Mason
2015-03-15 17:40       ` Mason
2015-03-16  8:44         ` Mason
2015-03-16 16:54           ` Paul Walmsley
2015-03-16 22:17             ` Mason
2015-03-16 23:30               ` Mason
2015-03-13 16:56 ` Mark Rutland
2015-03-13 17:02   ` Russell King - ARM Linux
2015-03-13 18:26     ` Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=550312BB.8020902@free.fr \
    --to=slash.tmp@free.fr \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.