All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jianmin Lv <lvjianmin@loongson.cn>
To: WANG Xuerui <kernel@xen0n.name>,
	Huacai Chen <chenhuacai@loongson.cn>,
	Arnd Bergmann <arnd@arndb.de>,
	Huacai Chen <chenhuacai@kernel.org>
Cc: loongarch@lists.linux.dev, linux-arch@vger.kernel.org,
	Xuefeng Li <lixuefeng@loongson.cn>, Guo Ren <guoren@kernel.org>,
	Jiaxun Yang <jiaxun.yang@flygoat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] LoongArch: Make -mstrict-align be configurable
Date: Mon, 6 Feb 2023 18:24:53 +0800	[thread overview]
Message-ID: <5303aeda-5c66-ede6-b3ac-7d8ebd73ec70@loongson.cn> (raw)
In-Reply-To: <5fc85453-1e2c-1f00-7879-1b5fa318c78a@xen0n.name>



On 2023/2/2 下午6:30, WANG Xuerui wrote:
> On 2023/2/2 16:42, Huacai Chen wrote:
>> Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
>> configurable.
>>
>> Not all LoongArch cores support h/w unaligned access, we can use the
>> -mstrict-align build parameter to prevent unaligned accesses.
>>
>> This option is disabled by default to optimise for performance, but you
>> can enabled it manually if you want to run kernel on systems without h/w
>> unaligned access support.
> 
> It's customary to accompany "performance-related" changes like this with 
> some benchmark numbers and concrete use cases where this would be 
> profitable. Especially given that arch/loongarch developer and user base 
> is relatively small, we probably don't want to allow customization of 
> such a low-level characteristic. In general kernel performance does not 
> vary much with compiler flags like this, so I'd really hope to see some 
> numbers here to convince people that this is *really* providing gains.
> 
> Also, defaulting to emitting unaligned accesses would mean those future, 
> likely embedded models (and AFAIK some existing models that haven't 
> reached GA yet) would lose support with the defconfig. Which means 
> downstream packagers that care about those use cases would have one more 
> non-default, non-generic option to carry within their Kconfig. We 
> probably don't want to repeat the history of other architectures (think 
> arch/arm or arch/mips) where there wasn't really generic builds and 
> board-specific tweaks proliferated.
> 

Hi, Xuerui

I think the kernels produced with and without -mstrict-align have mainly 
following differences:
- Diffirent size. I build two kernls (vmlinux), size of kernel with 
-mstrict-align is 26533376 bytes and size of kernel without 
-mstrict-align is 26123280 bytes.
- Diffirent performance. For example, in kernel function jhash(), the 
assemble code slices with and without -mstrict-align are following:

without -mstrict-align:
900000000032736c <jhash>:
900000000032736c:       15bd5b6d        lu12i.w         $t1, 
-136485(0xdeadb)
9000000000327370:       03bbbdad        ori             $t1, $t1, 0xeef
9000000000327374:       001019ad        add.w           $t1, $t1, $a2
9000000000327378:       001015ae        add.w           $t2, $t1, $a1
900000000032737c:       0280300c        addi.w          $t0, $zero, 
12(0xc)
9000000000327380:       00150091        move            $t5, $a0
9000000000327384:       001501d0        move            $t4, $t2 

9000000000327388:       001501c4        move            $a0, $t2 

900000000032738c:       6c009585        bgeu            $t0, $a1, 
148(0x94)     # 9000000000327420 <jhash+0xb4>
9000000000327390:       02803012        addi.w          $t6, $zero, 
12(0xc)
9000000000327394:       24000a2f        ldptr.w         $t3, $t5, 8(0x8) 

9000000000327398:       2400022d        ldptr.w         $t1, $t5, 0
900000000032739c:       2400062c        ldptr.w         $t0, $t5, 4(0x4)
90000000003273a0:       001011e4        add.w           $a0, $t3, $a0 

90000000003273a4:       001111af        sub.w           $t3, $t1, $a0 

90000000003273a8:       001039ef        add.w           $t3, $t3, $t2 

90000000003273ac:       004cf08e        rotri.w         $t2, $a0, 0x1c
90000000003273b0:       0010418c        add.w           $t0, $t0, $t4
...

with -mstrict-align:
90000000003310c0 <jhash>:
90000000003310c0:       15bd5b6f        lu12i.w         $t3, 
-136485(0xdeadb)
90000000003310c4:       03bbbdef        ori             $t3, $t3, 0xeef
90000000003310c8:       001019ef        add.w           $t3, $t3, $a2
90000000003310cc:       001015e6        add.w           $a2, $t3, $a1
90000000003310d0:       0280300d        addi.w          $t1, $zero, 12(0xc)
90000000003310d4:       0015008c        move            $t0, $a0
90000000003310d8:       001500d2        move            $t6, $a2
90000000003310dc:       001500c4        move            $a0, $a2
90000000003310e0:       6c0101a5        bgeu            $t1, $a1, 
256(0x100)    # 90000000003311e0 <jhash+0x120>
90000000003310e4:       02803011        addi.w          $t5, $zero, 12(0xc)
90000000003310e8:       2a002589        ld.bu           $a5, $t0, 9(0x9)
90000000003310ec:       2a00218d        ld.bu           $t1, $t0, 8(0x8)
90000000003310f0:       2a002988        ld.bu           $a4, $t0, 10(0xa)
90000000003310f4:       2a000587        ld.bu           $a3, $t0, 1(0x1)
90000000003310f8:       2a002d8e        ld.bu           $t2, $t0, 11(0xb)
90000000003310fc:       2a00018b        ld.bu           $a7, $t0, 0
9000000000331100:       2a000994        ld.bu           $t8, $t0, 2(0x2)
9000000000331104:       2a001593        ld.bu           $t7, $t0, 5(0x5)
9000000000331108:       2a000d8f        ld.bu           $t3, $t0, 3(0x3)
900000000033110c:       00412129        slli.d          $a5, $a5, 0x8
9000000000331110:       2a00118a        ld.bu           $a6, $t0, 4(0x4)
9000000000331114:       2a001990        ld.bu           $t4, $t0, 6(0x6)
9000000000331118:       00153529        or              $a5, $a5, $t1
...

It seems that it's difficult for me to test the performance difference 
in a real kernel path with unaligned-access code. So, I use a kernel 
module (use simple test code) to show some difference on 3A5000 as 
following:

c code:

         preempt_disable();
         start = ktime_get_ns();
         for (i = 0; i < n; i++)
                 assign(p1[i], q1[i]);
         end = ktime_get_ns();
         preempt_enable();

         printk("mstrict-align-test took: %lld nsec\n", end - start);

assemble code without -mstrict-align:
0:   260000ac        ldptr.d         $t0, $a1, 0
4:   2700008c        stptr.d         $t0, $a0, 0
8:   4c000020        jirl            $zero, $ra, 0

assemble code with -mstrict-align:
0:   2a0000b3        ld.bu           $t7, $a1, 0
4:   2a0004b2        ld.bu           $t6, $a1, 1(0x1)
8:   2a0008b1        ld.bu           $t5, $a1, 2(0x2)
c:   2a000cb0        ld.bu           $t4, $a1, 3(0x3)
10:   2a0010af        ld.bu           $t3, $a1, 4(0x4)
14:   2a0014ae        ld.bu           $t2, $a1, 5(0x5)
18:   2a0018ad        ld.bu           $t1, $a1, 6(0x6)
1c:   2a001cac        ld.bu           $t0, $a1, 7(0x7)
20:   29000093        st.b            $t7, $a0, 0
24:   29000492        st.b            $t6, $a0, 1(0x1)
28:   29000891        st.b            $t5, $a0, 2(0x2)
2c:   29000c90        st.b            $t4, $a0, 3(0x3)
30:   2900108f        st.b            $t3, $a0, 4(0x4)
34:   2900148e        st.b            $t2, $a0, 5(0x5)
38:   2900188d        st.b            $t1, $a0, 6(0x6)
3c:   29001c8c        st.b            $t0, $a0, 7(0x7)
40:   4c000020        jirl            $zero, $ra, 0

and test results (run 3 times) following:

the module without -mstrict-align testing:
[root@openEuler loongson]# insmod align-test.ko
[   39.029931] mstrict-align-test took: 29603510 nsec
[root@openEuler loongson]# rmmod align-test.ko
[root@openEuler loongson]# insmod align-test.ko
[   41.356007] mstrict-align-test took: 28816710 nsec
[root@openEuler loongson]# rmmod align-test.ko
[root@openEuler loongson]# insmod align-test.ko
[   43.506624] mstrict-align-test took: 30030700 nsec
[root@openEuler loongson]# rmmod align-test.ko

the module with -mstrict-align testing:
root@openEuler ~]# insmod align-test.ko
[   92.656477] mstrict-align-test took: 59629000 nsec
[root@openEuler ~]# rmmod align-test.ko
[root@openEuler ~]# insmod align-test.ko
[   99.473011] mstrict-align-test took: 58972250 nsec
[root@openEuler ~]# rmmod align-test.ko
[root@openEuler ~]# insmod align-test.ko
[  104.620103] mstrict-align-test took: 59419260 nsec
[root@openEuler ~]# rmmod align-test.ko

Thanks!
Jianmin

>>
>> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
>> ---
>>   arch/loongarch/Kconfig  | 10 ++++++++++
>>   arch/loongarch/Makefile |  2 ++
>>   2 files changed, 12 insertions(+)
>>
>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
>> index 9cc8b84f7eb0..7470dcfb32f0 100644
>> --- a/arch/loongarch/Kconfig
>> +++ b/arch/loongarch/Kconfig
>> @@ -441,6 +441,16 @@ config ARCH_IOREMAP
>>         protection support. However, you can enable LoongArch DMW-based
>>         ioremap() for better performance.
>> +config ARCH_STRICT_ALIGN
>> +    bool "Enable -mstrict-align to prevent unaligned accesses"
>> +    help
>> +      Not all LoongArch cores support h/w unaligned access, we can use
>> +      -mstrict-align build parameter to prevent unaligned accesses.
>> +
>> +      This is disabled by default to optimise for performance, you can
>> +      enabled it manually if you want to run kernel on systems without
>> +      h/w unaligned access support.
>> +
>>   config KEXEC
>>       bool "Kexec system call"
>>       select KEXEC_CORE
>> diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
>> index 4402387d2755..ccfb52700237 100644
>> --- a/arch/loongarch/Makefile
>> +++ b/arch/loongarch/Makefile
>> @@ -91,10 +91,12 @@ KBUILD_CPPFLAGS += -DVMLINUX_LOAD_ADDRESS=$(load-y)
>>   # instead of .eh_frame so we don't discard them.
>>   KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>> +ifdef CONFIG_ARCH_STRICT_ALIGN
>>   # Don't emit unaligned accesses.
>>   # Not all LoongArch cores support unaligned access, and as kernel we 
>> can't
>>   # rely on others to provide emulation for these accesses.
>>   KBUILD_CFLAGS += $(call cc-option,-mstrict-align)
>> +endif >
>>   KBUILD_CFLAGS += -isystem $(shell $(CC) -print-file-name=include)
> 


  reply	other threads:[~2023-02-06 10:25 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-02  8:42 [PATCH] LoongArch: Make -mstrict-align be configurable Huacai Chen
2023-02-02  9:01 ` David Laight
2023-02-03  2:00   ` Huacai Chen
2023-02-03  8:46     ` David Laight
2023-02-06 10:28       ` Jianmin Lv
2023-02-07  5:24         ` WANG Xuerui
2023-02-07 10:32           ` Arnd Bergmann
2023-02-07 13:28             ` Jianmin Lv
2023-02-07 14:10               ` Arnd Bergmann
2023-02-08 11:17                 ` Huacai Chen
2023-02-02  9:46 ` Arnd Bergmann
2023-02-03  2:08   ` Huacai Chen
2023-02-06 10:33     ` Arnd Bergmann
2023-02-02 10:30 ` WANG Xuerui
2023-02-06 10:24   ` Jianmin Lv [this message]
2023-02-06 11:18     ` Xi Ruoyao
2023-02-06 13:13       ` Jianmin Lv
2023-02-06 13:22         ` Arnd Bergmann
2023-02-07  1:13           ` Jianmin Lv
2023-02-06 13:30         ` Xi Ruoyao
2023-02-07  1:27           ` Jianmin Lv

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5303aeda-5c66-ede6-b3ac-7d8ebd73ec70@loongson.cn \
    --to=lvjianmin@loongson.cn \
    --cc=arnd@arndb.de \
    --cc=chenhuacai@kernel.org \
    --cc=chenhuacai@loongson.cn \
    --cc=guoren@kernel.org \
    --cc=jiaxun.yang@flygoat.com \
    --cc=kernel@xen0n.name \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lixuefeng@loongson.cn \
    --cc=loongarch@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.