From: Ard Biesheuvel
Date: Thu, 17 Sep 2020 09:54:49 +0300
Subject: Re: [PATCH v3 2/2] ARM: support PHYS_OFFSET minimum aligned at 64KiB boundary
To: "Leizhen (ThunderTown)"
Cc: Arnd Bergmann, Nicolas Pitre, Daniel Lezcano, Thomas Gleixner,
    Andrew Morton, Russell King, Catalin Marinas, linux-arm-kernel,
    linux-kernel, Jianguo Chen, Kefeng Wang, Libin
References: <20200916024927.3191-1-thunder.leizhen@huawei.com>
    <20200916024927.3191-3-thunder.leizhen@huawei.com>
    <5cfc6c64-62f0-1fc8-0783-703ed4254512@huawei.com>
In-Reply-To: <5cfc6c64-62f0-1fc8-0783-703ed4254512@huawei.com>

On Thu, 17 Sep 2020 at 05:16, Leizhen (ThunderTown) wrote:
>
> On 2020/9/16 19:15, Ard Biesheuvel wrote:
> > (+ Arnd, Nico)
> >
> > On Wed, 16 Sep 2020 at 05:51, Zhen Lei wrote:
> >>
> >> Currently, we only support
> >> kernels where the base of physical memory is at a 16MiB boundary,
> >> because the add/sub instructions can only encode an 8-bit unrotated
> >> value. But we can use one more "add/sub" instruction to handle
> >> bits 23-16. The performance will be slightly affected.
> >>
> >> Since most boards meet the 16MiB alignment, add a new configuration
> >> option ARM_PATCH_PHYS_VIRT_RADICAL (default n) to control it. Say Y
> >> if you really need it.
> >>
> >> All of r0-r7 (r1 = machine no, r2 = atags or dtb, in the start-up
> >> phase) are used in __fixup_a_pv_table() now, but the callee-saved
> >> r11 is not used anywhere in head.S, so choose it.
> >>
> >> Because the calculation of "y = x + __pv_offset[63:24]" has already
> >> been done, we only need to calculate "y = y + __pv_offset[23:16]".
> >> That is why the "to" and "from" parameters of __pv_stub() and
> >> __pv_add_carry_stub() in the scope of
> >> CONFIG_ARM_PATCH_PHYS_VIRT_RADICAL are all passed "t" (the y above).
> >>
> >> Signed-off-by: Zhen Lei
> >> ---
> >>  arch/arm/Kconfig              | 17 ++++++++++++++++-
> >>  arch/arm/include/asm/memory.h | 16 +++++++++++++---
> >>  arch/arm/kernel/head.S        | 25 +++++++++++++++++++------
> >>  3 files changed, 48 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> >> index e00d94b16658765..073dafa428f3c87 100644
> >> --- a/arch/arm/Kconfig
> >> +++ b/arch/arm/Kconfig
> >> @@ -240,12 +240,27 @@ config ARM_PATCH_PHYS_VIRT
> >>           kernel in system memory.
> >>
> >>           This can only be used with non-XIP MMU kernels where the base
> >> -         of physical memory is at a 16MB boundary.
> >> +         of physical memory is at a 16MiB boundary.
> >>
> >>           Only disable this option if you know that you do not require
> >>           this feature (eg, building a kernel for a single machine) and
> >>           you need to shrink the kernel to the minimal size.
> >>
> >> +config ARM_PATCH_PHYS_VIRT_RADICAL
> >> +       bool "Support PHYS_OFFSET minimum aligned at 64KiB boundary"
> >> +       depends on ARM_PATCH_PHYS_VIRT
> >> +       depends on !THUMB2_KERNEL
> >
> > Why is this not implemented for Thumb2 too?
>
> No Thumb2 boards.
>

Are you saying none of the boards you tested this on have support for
the Thumb2 instruction set? That seems unlikely to me, given that all
ARM cores since Cortex-A8 can run this code, but even if that were
true, you can easily test this on QEMU as well, which can definitely
run the Thumb2 kernel.

> >
> > Also, as Russell points out, this may end up being enabled for
> > all multiarch kernels, so it makes sense to explore whether we can
> > enable this unconditionally.
>
> Yes. In fact, I think we can consider enabling this unconditionally
> after the Thumb2 branch is implemented. Performance and code size
> should not be a problem.
>

Excellent.

> > Do you have any numbers wrt the impact on
> > text size? I would assume it is negligible, but numbers help.
>
> The text size increased by a bit more than 2 KiB (2164 bytes), about
> 0.0146%.
>
> make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- distclean defconfig
>
> Before:
> $ size vmlinux
>    text    data     bss     dec     hex filename
> 14781964 7508366  420080 22710410 15a888a vmlinux
>
> After:
> $ size vmlinux
>    text    data     bss     dec     hex filename
> 14784128 7508366  420080 22712574 15a90fe vmlinux
>

OK, so negligible in terms of code size.
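(A brief aside on why 16MiB is the magic number here: an A32 add/sub
immediate is encoded as an 8-bit value rotated right by an even amount,
so an offset with only bits 31-24 set fits in one instruction, while a
64KiB-aligned offset may also have bits 23-16 set and needs a second
one. The user-space sketch below illustrates that encodability rule;
arm_encodable() is just an illustrative helper, not kernel code:)

#include <stdint.h>
#include <stdio.h>

/*
 * Return 1 if x can be encoded as an A32 add/sub immediate, i.e. as
 * an 8-bit value rotated right by an even amount (0, 2, ..., 30).
 */
static int arm_encodable(uint32_t x)
{
        unsigned int rot;

        for (rot = 0; rot < 32; rot += 2) {
                /* rotate x left by 'rot' and check that it fits in 8 bits */
                uint32_t v = (x << rot) | (x >> ((32 - rot) & 31));

                if (v <= 0xff)
                        return 1;
        }
        return 0;
}

int main(void)
{
        printf("%d\n", arm_encodable(0x81000000)); /* 1: one add/sub     */
        printf("%d\n", arm_encodable(0x81810000)); /* 0: needs two insns */
        printf("%d\n", arm_encodable(0x00810000)); /* 1: the second insn */
        return 0;
}

This prints 1, 0, 1: a 64KiB-aligned __pv_offset as a whole is not
necessarily encodable, but its bits-31-24 and bits-23-16 halves each
are, which is exactly why the patch splits the fixup into two stubs.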
> > Being able to decompress the image to any 2MiB aligned base address
> > is also quite useful for EFI boot, and it may also help to get rid
> > of the TEXT_OFFSET hacks we have for some platforms in the future.
> >
> >> +       help
> >> +         This can only be used with non-XIP MMU kernels where the base
> >> +         of physical memory is at a 64KiB boundary.
> >> +
> >> +         Compared with ARM_PATCH_PHYS_VIRT, one or two more instructions
> >> +         need to be added to implement the conversion of bits 23-16 of
> >> +         the VA/PA in phys-to-virt and virt-to-phys. The performance is
> >> +         slightly affected.
> >> +
> >
> > Does it affect performance in other ways beyond code size/Icache density?
>
> I just want to say it will be slightly slower than
> !ARM_PATCH_PHYS_VIRT_RADICAL, because of the one or two extra
> instructions. It certainly cannot affect overall system performance.
>
> Because of your doubts, I think I should remove the statement: "The
> performance is slightly affected."
>

Yeah, I think you can drop that statement.

> >
> >> +       If unsure say N here.
> >> +
> >>  config NEED_MACH_IO_H
> >>         bool
> >>         help
> >> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
> >> index 99035b5891ef442..f97b37303a00f60 100644
> >> --- a/arch/arm/include/asm/memory.h
> >> +++ b/arch/arm/include/asm/memory.h
> >> @@ -173,6 +173,7 @@
> >>   * so that all we need to do is modify the 8-bit constant field.
> >>   */
> >>  #define __PV_BITS_31_24        0x81000000
> >> +#define __PV_BITS_23_16        0x00810000
> >>  #define __PV_BITS_7_0          0x81
> >>
> >>  extern unsigned long __pv_phys_pfn_offset;
> >> @@ -201,7 +202,7 @@
> >>                 : "=r" (t)                                      \
> >>                 : "I" (__PV_BITS_7_0))
> >>
> >> -#define __pv_add_carry_stub(x, y)                              \
> >> +#define __pv_add_carry_stub(x, y, type)                        \
> >>         __asm__ volatile("@ __pv_add_carry_stub\n"              \
> >>         "1:     adds    %Q0, %1, %2\n"                          \
> >>         "       adc     %R0, %R0, #0\n"                         \
> >> @@ -209,7 +210,7 @@
> >>         "       .long   1b\n"                                   \
> >>         "       .popsection\n"                                  \
> >>                 : "+r" (y)                                      \
> >> -               : "r" (x), "I" (__PV_BITS_31_24)                \
> >> +               : "r" (x), "I" (type)                           \
> >>                 : "cc")
> >>
> >>  static inline phys_addr_t __virt_to_phys_nodebug(unsigned long x)
> >> @@ -218,9 +219,15 @@ static inline phys_addr_t __virt_to_phys_nodebug(unsigned long x)
> >>
> >>         if (sizeof(phys_addr_t) == 4) {
> >>                 __pv_stub(x, t, "add", __PV_BITS_31_24);
> >> +#ifdef CONFIG_ARM_PATCH_PHYS_VIRT_RADICAL
> >> +               __pv_stub(t, t, "add", __PV_BITS_23_16);
> >> +#endif
> >>         } else {
> >>                 __pv_stub_mov_hi(t);
> >> -               __pv_add_carry_stub(x, t);
> >> +               __pv_add_carry_stub(x, t, __PV_BITS_31_24);
> >> +#ifdef CONFIG_ARM_PATCH_PHYS_VIRT_RADICAL
> >> +               __pv_add_carry_stub(t, t, __PV_BITS_23_16);
> >> +#endif
> >>         }
> >>         return t;
> >>  }
> >> @@ -236,6 +243,9 @@ static inline unsigned long __phys_to_virt(phys_addr_t x)
> >>   * in place where 'r' 32 bit operand is expected.
> >> */ > >> __pv_stub((unsigned long) x, t, "sub", __PV_BITS_31_24); > >> +#ifdef CONFIG_ARM_PATCH_PHYS_VIRT_RADICAL > >> + __pv_stub(t, t, "sub", __PV_BITS_23_16); > >> +#endif > >> return t; > >> } > >> > >> diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S > >> index 02d78c9198d0e8d..d9fb226a24d43ae 100644 > >> --- a/arch/arm/kernel/head.S > >> +++ b/arch/arm/kernel/head.S > >> @@ -120,7 +120,7 @@ ENTRY(stext) > >> bl __fixup_smp > >> #endif > >> #ifdef CONFIG_ARM_PATCH_PHYS_VIRT > >> - bl __fixup_pv_table > >> + bl __fixup_pv_table @r11 will be used > >> #endif > >> bl __create_page_tables > >> > >> @@ -614,8 +614,13 @@ __fixup_pv_table: > >> mov r0, r8, lsr #PAGE_SHIFT @ convert to PFN > >> str r0, [r6] @ save computed PHYS_PFN_OFFSET to __pv_phys_pfn_offset > >> strcc ip, [r7, #HIGH_OFFSET] @ save to __pv_offset high bits > >> +#ifdef CONFIG_ARM_PATCH_PHYS_VIRT_RADICAL > >> + mov r6, r3, lsr #16 @ constant for add/sub instructions > >> + teq r3, r6, lsl #16 @ must be 64KiB aligned > >> +#else > >> mov r6, r3, lsr #24 @ constant for add/sub instructions > >> teq r3, r6, lsl #24 @ must be 16MiB aligned > >> +#endif > >> THUMB( it ne @ cross section branch ) > >> bne __error > >> str r3, [r7, #LOW_OFFSET] @ save to __pv_offset low bits > >> @@ -636,7 +641,9 @@ __fixup_a_pv_table: > >> add r6, r6, r3 > >> ldr r0, [r6, #HIGH_OFFSET] @ __pv_offset high word > >> ldr r6, [r6, #LOW_OFFSET] @ __pv_offset low word > >> - mov r6, r6, lsr #24 > >> + mov r11, r6, lsl #8 > >> + mov r11, r11, lsr #24 @ bits 23-16 > >> + mov r6, r6, lsr #24 @ bits 31-24 > >> cmn r0, #1 > >> #ifdef CONFIG_THUMB2_KERNEL > >> moveq r0, #0x200000 @ set bit 21, mov to mvn instruction > >> @@ -682,14 +689,20 @@ ARM_BE8(rev16 ip, ip) > >> #ifdef CONFIG_CPU_ENDIAN_BE8 > >> @ in BE8, we load data in BE, but instructions still in LE > >> bic ip, ip, #0xff000000 > >> - tst ip, #0x000f0000 @ check the rotation field > >> + tst ip, #0x00040000 @ check the rotation field > >> orrne ip, ip, r6, lsl #24 @ mask in offset bits 31-24 > >> + tst ip, #0x00080000 @ check the rotation field > >> + orrne ip, ip, r11, lsl #24 @ mask in offset bits 23-16 > >> + tst ip, #0x000f0000 @ check the rotation field > >> biceq ip, ip, #0x00004000 @ clear bit 22 > >> orreq ip, ip, r0 @ mask in offset bits 7-0 > >> #else > >> bic ip, ip, #0x000000ff > >> - tst ip, #0xf00 @ check the rotation field > >> + tst ip, #0x400 @ check the rotation field > >> orrne ip, ip, r6 @ mask in offset bits 31-24 > >> + tst ip, #0x800 @ check the rotation field > >> + orrne ip, ip, r11 @ mask in offset bits 23-16 > >> + tst ip, #0xf00 @ check the rotation field > >> biceq ip, ip, #0x400000 @ clear bit 22 > >> orreq ip, ip, r0 @ mask in offset bits 7-0 > >> #endif > >> @@ -705,12 +718,12 @@ ENDPROC(__fixup_a_pv_table) > >> 3: .long __pv_offset > >> > >> ENTRY(fixup_pv_table) > >> - stmfd sp!, {r4 - r7, lr} > >> + stmfd sp!, {r4 - r7, r11, lr} > >> mov r3, #0 @ no offset > >> mov r4, r0 @ r0 = table start > >> add r5, r0, r1 @ r1 = table size > >> bl __fixup_a_pv_table > >> - ldmfd sp!, {r4 - r7, pc} > >> + ldmfd sp!, {r4 - r7, r11, pc} > >> ENDPROC(fixup_pv_table) > >> > >> .data > >> -- > >> 1.8.3 > >> > >> > >> > >> _______________________________________________ > >> linux-arm-kernel mailing list > >> linux-arm-kernel@lists.infradead.org > >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > > > . > > >