From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: Jianguo Chen <chenjianguo3@huawei.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Russell King - ARM Linux admin <linux@armlinux.org.uk>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Libin <huawei.libin@huawei.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	patches-armlinux <patches@armlinux.org.uk>
Subject: Re: [PATCH v2 2/2] ARM: support PHYS_OFFSET minimum aligned at 64KiB boundary
Date: Mon, 28 Sep 2020 17:30:20 +0800	[thread overview]
Message-ID: <265e97cc-1a7f-fbe6-f38b-6aba7d2061bb@huawei.com> (raw)
In-Reply-To: <b6a777af-51f4-07b4-3036-b81e3008b0e8@huawei.com>



On 2020/9/28 9:30, Leizhen (ThunderTown) wrote:
> 
> 
> On 2020/9/22 20:30, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2020/9/21 16:53, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2020/9/21 14:47, Ard Biesheuvel wrote:
>>>> On Mon, 21 Sep 2020 at 05:35, Leizhen (ThunderTown)
>>>> <thunder.leizhen@huawei.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2020/9/17 22:00, Ard Biesheuvel wrote:
>>>>>> On Tue, 15 Sep 2020 at 22:06, Russell King - ARM Linux admin
>>>>>> <linux@armlinux.org.uk> wrote:
>>>>>>>
>>>>>>> On Tue, Sep 15, 2020 at 09:16:15PM +0800, Zhen Lei wrote:
>>>>>>>> Currently, we only support kernels where the base of physical memory
>>>>>>>> is at a 16MiB boundary, because the add/sub instructions can only
>>>>>>>> encode an 8-bit unrotated value. But we can use one more add/sub
>>>>>>>> instruction to handle bits 23-16; performance will be slightly affected.
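
[Note: for readers outside arch/arm, the constraint discussed above is that
an ARM add/sub immediate is an 8-bit value rotated right by an even amount,
so one instruction can apply bits 31-24 of the offset, but a field spanning
bits 31-16 cannot be encoded in a single immediate. A standalone sketch of
the encodability rule (an illustration in C, not kernel code):

#include <stdbool.h>
#include <stdint.h>

/* True if val fits ARM's "8-bit value rotated right by an even amount"
 * data-processing immediate format. */
static bool arm_imm_encodable(uint32_t val)
{
	for (unsigned int rot = 0; rot < 32; rot += 2) {
		/* rotate val left by rot; the encoder can represent val
		 * iff some even rotation leaves only 8 significant bits */
		uint32_t v = (val << rot) | (val >> ((32 - rot) & 31));

		if (v <= 0xff)
			return true;
	}
	return false;
}

/* arm_imm_encodable(0x81000000) -> true  (bits 31-24 only)
 * arm_imm_encodable(0x81810000) -> false (significant bits span 16 bits),
 * hence the second add/sub for bits 23-16. */
]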
>>>>>>>>
>>>>>>>> Since most boards meet the 16MiB alignment, add a new configuration
>>>>>>>> option ARM_PATCH_PHYS_VIRT_RADICAL (default n) to control this. Say Y
>>>>>>>> only if you really need it.
>>>>>>>>
>>>>>>>> All of r0-r7 (r1 = machine number, r2 = atags or dtb pointer during
>>>>>>>> start-up) are used in __fixup_a_pv_table() now, but the callee-saved
>>>>>>>> r11 is not used anywhere in head.S, so choose it.
>>>>>>>>
>>>>>>>> Because the calculation "y = x + __pv_offset[63:24]" has already been
>>>>>>>> done, we only need to calculate "y = y + __pv_offset[23:16]". That is
>>>>>>>> why the "to" and "from" parameters of __pv_stub() and
>>>>>>>> __pv_add_carry_stub() in the scope of
>>>>>>>> CONFIG_ARM_PATCH_PHYS_VIRT_RADICAL are all passed "t" (the "y" above).
>>>>>>>>
>>>>>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>>>>>> ---
>>>>>>>>  arch/arm/Kconfig              | 18 +++++++++++++++++-
>>>>>>>>  arch/arm/include/asm/memory.h | 16 +++++++++++++---
>>>>>>>>  arch/arm/kernel/head.S        | 25 +++++++++++++++++++------
>>>>>>>>  3 files changed, 49 insertions(+), 10 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>>>>>>>> index e00d94b16658765..19fc2c746e2ce29 100644
>>>>>>>> --- a/arch/arm/Kconfig
>>>>>>>> +++ b/arch/arm/Kconfig
>>>>>>>> @@ -240,12 +240,28 @@ config ARM_PATCH_PHYS_VIRT
>>>>>>>>         kernel in system memory.
>>>>>>>>
>>>>>>>>         This can only be used with non-XIP MMU kernels where the base
>>>>>>>> -       of physical memory is at a 16MB boundary.
>>>>>>>> +       of physical memory is at a 16MiB boundary.
>>>>>>>>
>>>>>>>>         Only disable this option if you know that you do not require
>>>>>>>>         this feature (eg, building a kernel for a single machine) and
>>>>>>>>         you need to shrink the kernel to the minimal size.
>>>>>>>>
>>>>>>>> +config ARM_PATCH_PHYS_VIRT_RADICAL
>>>>>>>> +     bool "Support PHYS_OFFSET minimum aligned at 64KiB boundary"
>>>>>>>> +     default n
>>>>>>>
>>>>>>> Please drop the "default n" - this is the default anyway.
>>>>>>>
>>>>>>>> @@ -236,6 +243,9 @@ static inline unsigned long __phys_to_virt(phys_addr_t x)
>>>>>>>>        * in place where 'r' 32 bit operand is expected.
>>>>>>>>        */
>>>>>>>>       __pv_stub((unsigned long) x, t, "sub", __PV_BITS_31_24);
>>>>>>>> +#ifdef CONFIG_ARM_PATCH_PHYS_VIRT_RADICAL
>>>>>>>> +     __pv_stub((unsigned long) t, t, "sub", __PV_BITS_23_16);
>>>>>>>
>>>>>>> t is already unsigned long, so this cast is not necessary.
>>>>>>>
>>>>>>> I've been debating whether it would be better to use "movw" for this
>>>>>>> for ARMv7.  In other words:
>>>>>>>
>>>>>>>         movw    tmp, #16-bit
>>>>>>>         adds    %Q0, %1, tmp, lsl #16
>>>>>>>         adc     %R0, %R0, #0
>>>>>>>
>>>>>>> It would certainly be fewer instructions, but at the cost of an
>>>>>>> additional register - and we'd have to change the fixup code to
>>>>>>> know about movw.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>
>>>>>> Since LPAE implies v7, we can use movw unconditionally, which is nice.
>>>>>>
>>>>>> There is no need to use an additional temp register, as we can use the
>>>>>> register holding the high word. (There is no need for the mov_hi macro
>>>>>> to be separate)
>>>>>>
>>>>>> 0:     movw    %R0, #low offset >> 16
>>>>>>        adds    %Q0, %1, %R0, lsl #16
>>>>>> 1:     mov     %R0, #high offset
>>>>>>        adc     %R0, %R0, #0
>>>>>>        .pushsection .pv_table,"a"
>>>>>>        .long 0b, 1b
>>>>>>        .popsection
>>>>>>
>>>>>> The only problem is distinguishing the two mov instructions from each
>>>>>> other.
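
[Note: %Q0 and %R0 above are GCC ARM inline-assembly operand modifiers that
select the low and high 32-bit words of a 64-bit operand. A hedged sketch of
how the proposed stub could be wrapped in C, simplified for illustration and
not the code that was eventually merged:

/* t = x + __pv_offset (64-bit); the 0: and 1: labels are recorded in
 * .pv_table so boot-time fixup code can locate and patch both slots. */
static inline unsigned long long pv_example_add(unsigned long x)
{
	unsigned long long t;

	asm volatile(
	"0:	movw	%R0, #0\n"		/* patched: low offset >> 16 */
	"	adds	%Q0, %1, %R0, lsl #16\n"
	"1:	mov	%R0, #0\n"		/* patched: high offset */
	"	adc	%R0, %R0, #0\n"
	"	.pushsection .pv_table,\"a\"\n"
	"	.long	0b, 1b\n"
	"	.popsection\n"
	: "=&r" (t)
	: "r" (x)
	: "cc");
	return t;
}
]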
>>>>>
>>>>> The #high offset could also use movw; it would just save two bytes in
>>>>> the Thumb-2 case. We can store different imm16 values for high_offset
>>>>> and low_offset, so that we can distinguish them in __fixup_a_pv_table().
>>>>>
>>>>> This will make the final implementation look clearer and more
>>>>> consistent, especially for Thumb-2.
>>>>>
>>>>> Let me try it.
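
[Note: a hypothetical sketch of the tagging idea, assuming the ARM (A1) movw
encoding with imm4 in bits 19:16 and imm12 in bits 11:0; the "placeholder 0 =
low slot" convention is illustrative, not what was eventually merged:

#include <stdint.h>

/* extract the imm16 placeholder currently encoded in a movw */
#define MOVW_IMM16(insn) \
	((((insn) >> 4) & 0xf000u) | ((insn) & 0x0fffu))

/* rewrite one movw: placeholder 0 marks the low-offset slot, any other
 * placeholder marks the high-offset slot */
static uint32_t fixup_movw(uint32_t insn, uint16_t lo16, uint16_t hi16)
{
	uint32_t val = MOVW_IMM16(insn) ? hi16 : lo16;

	insn &= ~0x000f0fffu;		/* clear imm4:imm12 */
	insn |= (val & 0xf000u) << 4;	/* imm4  = val[15:12] */
	insn |= (val & 0x0fffu);	/* imm12 = val[11:0] */
	return insn;
}
]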
>>>>>
>>>>
>>>> Hello Zhen Lei,
>>>>
>>>> I am looking into this as well:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm-p2v-v2
>>>>
>>>> Could you please test this version on your hardware?
>>>
>>> OK, I will test it on my boards.
>> Hi Ard Biesheuvel:
>>   I have tested it on a 16MiB-aligned + LE board, and it works well. I've asked a
>> colleague from another department to run it on a 2MiB-aligned + BE board. He will
>> do it tomorrow.
> 
> Hi, Ard Biesheuvel:
>   I'm sorry to have kept you waiting so long. Your patch series also works well on the
> 2MiB-aligned + BE board. It took me a while because that board loads zImage, so special
> handling is required for the following code:
> 
> arch/arm/boot/compressed/head.S:
> #ifdef CONFIG_AUTO_ZRELADDR
>                 mov     r4, pc
>                 and     r4, r4, #0xf8000000		// currently only supports 128MiB alignment
>                 add     r4, r4, #TEXT_OFFSET
> #else
> 
> This is a special scenario that does not conflict with your code framework, so I'm
> working on a fix for it.
> 
> Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
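
[Note: the quoted AUTO_ZRELADDR code guesses the kernel's load address by
rounding the decompressor's PC down to a 128MiB boundary and adding
TEXT_OFFSET, which goes wrong when RAM is based at only 2MiB alignment. A
tiny worked example with hypothetical addresses:

#include <stdint.h>

static uint32_t guess_zreladdr(uint32_t pc)
{
	/* 128MiB round-down plus TEXT_OFFSET (0x8000 is a typical value) */
	return (pc & 0xf8000000u) + 0x8000u;
}

/* With RAM based at 0x46200000 and the decompressor running at, say,
 * pc = 0x46a00000: guess_zreladdr(pc) == 0x40008000, but the kernel
 * actually belongs at 0x46200000 + 0x8000 == 0x46208000. */
]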

Hi, Ard Biesheuvel:
  I have just sent a fix for the problem above:

  [PATCH 0/2] ARM: decompressor: relax the loading restriction of the decompressed kernel

Thread overview: 30+ messages

2020-09-15 13:16 [PATCH v2 0/2] ARM: support PHYS_OFFSET minimum aligned at 64KiB boundary Zhen Lei
2020-09-15 13:16 ` [PATCH v2 1/2] ARM: fix trivial comments in head.S Zhen Lei
2020-09-15 13:16 ` [PATCH v2 2/2] ARM: support PHYS_OFFSET minimum aligned at 64KiB boundary Zhen Lei
2020-09-15 19:01   ` Russell King - ARM Linux admin
2020-09-16  1:57     ` Leizhen (ThunderTown)
2020-09-16  7:57       ` Russell King - ARM Linux admin
2020-09-17  3:26         ` Leizhen (ThunderTown)
2020-09-17 14:00     ` Ard Biesheuvel
2020-09-21  3:34       ` Leizhen (ThunderTown)
2020-09-21  6:47         ` Ard Biesheuvel
2020-09-21  8:53           ` Leizhen (ThunderTown)
2020-09-22 12:30             ` Leizhen (ThunderTown)
2020-09-28  1:30               ` Leizhen (ThunderTown)
2020-09-28  9:30                 ` Leizhen (ThunderTown) [this message]
2020-09-15 13:31 ` [PATCH v2 0/2] " Russell King - ARM Linux admin
