All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
@ 2020-09-02 11:15 Michal Simek
  2020-09-02 14:43 ` André Przywara
  2020-09-02 16:34 ` Stephen Warren
  0 siblings, 2 replies; 18+ messages in thread
From: Michal Simek @ 2020-09-02 11:15 UTC (permalink / raw)
  To: u-boot

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
compilation error is shown:
/mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
defined in .bss_start section in u-boot.

It is caused by adr instruction which permits the calculation of any byte
address within +- 1MB of the current PC.
Because U-Boot is bigger then 1MB calculation is failing.

The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
immediate left by 12 bits (4k page), adds it to the value of the program
counter with the bottom 12 bits cleared to zero. Then add instruction
provides the lower 12 bits which is offset within 4k page.
These two instructions together compose full 32bit offset which should be
more then enough to cover the whole u-boot size.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
---

 arch/arm/cpu/armv8/start.S | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
index 002698b501c3..52cf1230c205 100644
--- a/arch/arm/cpu/armv8/start.S
+++ b/arch/arm/cpu/armv8/start.S
@@ -67,8 +67,10 @@ pie_fixup:
 	adr	x0, _start		/* x0 <- Runtime value of _start */
 	ldr	x1, _TEXT_BASE		/* x1 <- Linked value of _start */
 	sub	x9, x0, x1		/* x9 <- Run-vs-link offset */
-	adr	x2, __rel_dyn_start	/* x2 <- Runtime &__rel_dyn_start */
-	adr	x3, __rel_dyn_end	/* x3 <- Runtime &__rel_dyn_end */
+	adrp    x2, __rel_dyn_start     /* x2 <- Runtime &__rel_dyn_start */
+	add     x2, x2, #:lo12:__rel_dyn_start
+	adrp    x3, __rel_dyn_end       /* x3 <- Runtime &__rel_dyn_end */
+	add     x3, x3, #:lo12:__rel_dyn_end
 pie_fix_loop:
 	ldp	x0, x1, [x2], #16	/* (x0, x1) <- (Link location, fixup) */
 	ldr	x4, [x2], #8		/* x4 <- addend */
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-02 11:15 [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y Michal Simek
@ 2020-09-02 14:43 ` André Przywara
  2020-09-02 14:51   ` Michal Simek
  2020-09-02 14:53   ` Edgar E. Iglesias
  2020-09-02 16:34 ` Stephen Warren
  1 sibling, 2 replies; 18+ messages in thread
From: André Przywara @ 2020-09-02 14:43 UTC (permalink / raw)
  To: u-boot

On 02/09/2020 12:15, Michal Simek wrote:

Hi,

> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
> 
> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
> compilation error is shown:
> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
> defined in .bss_start section in u-boot.
> 
> It is caused by adr instruction which permits the calculation of any byte
> address within +- 1MB of the current PC.
> Because U-Boot is bigger then 1MB calculation is failing.
> 
> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
> immediate left by 12 bits (4k page), adds it to the value of the program
> counter with the bottom 12 bits cleared to zero. Then add instruction
> provides the lower 12 bits which is offset within 4k page.
> These two instructions together compose full 32bit offset which should be
> more then enough to cover the whole u-boot size.
> 
> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
> Signed-off-by: Michal Simek <michal.simek@xilinx.com>

It's a bit scary that you need more than 1MB, but indeed what you do
below is the canonical pattern to get the full range of PC relative
addressing (this is used heavily in Trusted Firmware, for instance).

The only thing to keep in mind is that this assumes that the load
address of the binary is 4K aligned, so that the low 12 bits of the
symbol stay the same. I wonder if we should enforce this somehow? But
the load address is not controlled by the build process (the whole
purpose of PIE), so that's not doable just in the build system?

Shall we at least document this? I guess typical load address are
actually quite well aligned, so it might not be an issue in practice.

Cheers,
Andre

> ---
> 
>  arch/arm/cpu/armv8/start.S | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
> index 002698b501c3..52cf1230c205 100644
> --- a/arch/arm/cpu/armv8/start.S
> +++ b/arch/arm/cpu/armv8/start.S
> @@ -67,8 +67,10 @@ pie_fixup:
>  	adr	x0, _start		/* x0 <- Runtime value of _start */
>  	ldr	x1, _TEXT_BASE		/* x1 <- Linked value of _start */
>  	sub	x9, x0, x1		/* x9 <- Run-vs-link offset */
> -	adr	x2, __rel_dyn_start	/* x2 <- Runtime &__rel_dyn_start */
> -	adr	x3, __rel_dyn_end	/* x3 <- Runtime &__rel_dyn_end */
> +	adrp    x2, __rel_dyn_start     /* x2 <- Runtime &__rel_dyn_start */
> +	add     x2, x2, #:lo12:__rel_dyn_start
> +	adrp    x3, __rel_dyn_end       /* x3 <- Runtime &__rel_dyn_end */
> +	add     x3, x3, #:lo12:__rel_dyn_end
>  pie_fix_loop:
>  	ldp	x0, x1, [x2], #16	/* (x0, x1) <- (Link location, fixup) */
>  	ldr	x4, [x2], #8		/* x4 <- addend */
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-02 14:43 ` André Przywara
@ 2020-09-02 14:51   ` Michal Simek
  2020-09-02 14:53   ` Edgar E. Iglesias
  1 sibling, 0 replies; 18+ messages in thread
From: Michal Simek @ 2020-09-02 14:51 UTC (permalink / raw)
  To: u-boot

Hi,

On 02. 09. 20 16:43, Andr? Przywara wrote:
> On 02/09/2020 12:15, Michal Simek wrote:
> 
> Hi,
> 
>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>
>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
>> compilation error is shown:
>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
>> defined in .bss_start section in u-boot.
>>
>> It is caused by adr instruction which permits the calculation of any byte
>> address within +- 1MB of the current PC.
>> Because U-Boot is bigger then 1MB calculation is failing.
>>
>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
>> immediate left by 12 bits (4k page), adds it to the value of the program
>> counter with the bottom 12 bits cleared to zero. Then add instruction
>> provides the lower 12 bits which is offset within 4k page.
>> These two instructions together compose full 32bit offset which should be
>> more then enough to cover the whole u-boot size.
>>
>> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
>> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
> 
> It's a bit scary that you need more than 1MB, but indeed what you do
> below is the canonical pattern to get the full range of PC relative
> addressing (this is used heavily in Trusted Firmware, for instance).

We have generic u-boot where we are enabling all features which you can
use and it is up to everybody to disable what they don't need. We are at
1MB limit that's why I spot this issue and trying to solve it.

> 
> The only thing to keep in mind is that this assumes that the load
> address of the binary is 4K aligned, so that the low 12 bits of the
> symbol stay the same. I wonder if we should enforce this somehow? But
> the load address is not controlled by the build process (the whole
> purpose of PIE), so that's not doable just in the build system?

That's more question to you if we can enforce is somehow. :-)


> Shall we at least document this? I guess typical load address are
> actually quite well aligned, so it might not be an issue in practice.

maybe note this in Kconfig help for POSITION_INDEPENDENT?
Any other location?

Thanks,
Michal

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-02 14:43 ` André Przywara
  2020-09-02 14:51   ` Michal Simek
@ 2020-09-02 14:53   ` Edgar E. Iglesias
  2020-09-02 15:18     ` André Przywara
  1 sibling, 1 reply; 18+ messages in thread
From: Edgar E. Iglesias @ 2020-09-02 14:53 UTC (permalink / raw)
  To: u-boot

On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr? Przywara wrote:
> On 02/09/2020 12:15, Michal Simek wrote:
> 
> Hi,

Hi Andre,


> 
> > From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
> > 
> > When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
> > compilation error is shown:
> > /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
> > truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
> > defined in .bss_start section in u-boot.
> > 
> > It is caused by adr instruction which permits the calculation of any byte
> > address within +- 1MB of the current PC.
> > Because U-Boot is bigger then 1MB calculation is failing.
> > 
> > The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
> > immediate left by 12 bits (4k page), adds it to the value of the program
> > counter with the bottom 12 bits cleared to zero. Then add instruction
> > provides the lower 12 bits which is offset within 4k page.
> > These two instructions together compose full 32bit offset which should be
> > more then enough to cover the whole u-boot size.
> > 
> > Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
> > Signed-off-by: Michal Simek <michal.simek@xilinx.com>
> 
> It's a bit scary that you need more than 1MB, but indeed what you do
> below is the canonical pattern to get the full range of PC relative
> addressing (this is used heavily in Trusted Firmware, for instance).
> 
> The only thing to keep in mind is that this assumes that the load
> address of the binary is 4K aligned, so that the low 12 bits of the
> symbol stay the same. I wonder if we should enforce this somehow? But
> the load address is not controlled by the build process (the whole
> purpose of PIE), so that's not doable just in the build system?

There shouldn't be any need for 4K alignment. Could you elaborate on
why you think there is?

Perhaps the commit message is a little confusing. The toolchain will
compute the pc-relative offset from this particular location to the
symbol and apply the relocations accordingly.

Best regards,
Edgar



> 
> Shall we at least document this? I guess typical load address are
> actually quite well aligned, so it might not be an issue in practice.
> 
> Cheers,
> Andre
> 
> > ---
> > 
> >  arch/arm/cpu/armv8/start.S | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
> > index 002698b501c3..52cf1230c205 100644
> > --- a/arch/arm/cpu/armv8/start.S
> > +++ b/arch/arm/cpu/armv8/start.S
> > @@ -67,8 +67,10 @@ pie_fixup:
> >  	adr	x0, _start		/* x0 <- Runtime value of _start */
> >  	ldr	x1, _TEXT_BASE		/* x1 <- Linked value of _start */
> >  	sub	x9, x0, x1		/* x9 <- Run-vs-link offset */
> > -	adr	x2, __rel_dyn_start	/* x2 <- Runtime &__rel_dyn_start */
> > -	adr	x3, __rel_dyn_end	/* x3 <- Runtime &__rel_dyn_end */
> > +	adrp    x2, __rel_dyn_start     /* x2 <- Runtime &__rel_dyn_start */
> > +	add     x2, x2, #:lo12:__rel_dyn_start
> > +	adrp    x3, __rel_dyn_end       /* x3 <- Runtime &__rel_dyn_end */
> > +	add     x3, x3, #:lo12:__rel_dyn_end
> >  pie_fix_loop:
> >  	ldp	x0, x1, [x2], #16	/* (x0, x1) <- (Link location, fixup) */
> >  	ldr	x4, [x2], #8		/* x4 <- addend */
> > 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-02 14:53   ` Edgar E. Iglesias
@ 2020-09-02 15:18     ` André Przywara
  2020-09-02 15:25       ` Edgar E. Iglesias
  0 siblings, 1 reply; 18+ messages in thread
From: André Przywara @ 2020-09-02 15:18 UTC (permalink / raw)
  To: u-boot

On 02/09/2020 15:53, Edgar E. Iglesias wrote:
> On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr??? Przywara wrote:
>> On 02/09/2020 12:15, Michal Simek wrote:

Hi,

>>
>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>>
>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
>>> compilation error is shown:
>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
>>> defined in .bss_start section in u-boot.
>>>
>>> It is caused by adr instruction which permits the calculation of any byte
>>> address within +- 1MB of the current PC.
>>> Because U-Boot is bigger then 1MB calculation is failing.
>>>
>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
>>> immediate left by 12 bits (4k page), adds it to the value of the program
>>> counter with the bottom 12 bits cleared to zero. Then add instruction
>>> provides the lower 12 bits which is offset within 4k page.
>>> These two instructions together compose full 32bit offset which should be
>>> more then enough to cover the whole u-boot size.
>>>
>>> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
>>> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
>>
>> It's a bit scary that you need more than 1MB, but indeed what you do
>> below is the canonical pattern to get the full range of PC relative
>> addressing (this is used heavily in Trusted Firmware, for instance).
>>
>> The only thing to keep in mind is that this assumes that the load
>> address of the binary is 4K aligned, so that the low 12 bits of the
>> symbol stay the same. I wonder if we should enforce this somehow? But
>> the load address is not controlled by the build process (the whole
>> purpose of PIE), so that's not doable just in the build system?
> 
> There shouldn't be any need for 4K alignment. Could you elaborate on
> why you think there is?

That seems to be slightly tricky, and I tried to get some confirmation,
but here goes my reasoning. Maybe you can confirm this:

- adrp takes the relative offset, but only of the upper 20 bits (because
that's all we can encode). It clears the lower 12 bits of the register.
- the "add" is not PC relative anymore, so it just takes the lower 12
bits of the "absolute" linker symbol.
So this assumes that the lower 12 bits of the actual address in memory
and the lower 12 bits of the linker's view match.
An example:
00024: adrp x0, SYMBOL
00028: add  x0, x0, :lo12:SYMBOL

SYMBOL:
42058: ...

The toolchain will generate:
	adrp x0, #0x42; add x0, x0, #0x058

Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at
0x80042858.
The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000.
The add will just add 0x58, so you end up with x0 being 0x80042058,
which is not the right address.

Does this make sense?

> Perhaps the commit message is a little confusing. The toolchain will
> compute the pc-relative offset from this particular location to the
> symbol and apply the relocations accordingly.

Yes, but the PC relative offset applies only to the upper 20 bits,
because it's only adrp that has PC relative semantics.

Cheers,
Andre

>>
>> Shall we at least document this? I guess typical load address are
>> actually quite well aligned, so it might not be an issue in practice.
>>
>> Cheers,
>> Andre
>>
>>> ---
>>>
>>>  arch/arm/cpu/armv8/start.S | 6 ++++--
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
>>> index 002698b501c3..52cf1230c205 100644
>>> --- a/arch/arm/cpu/armv8/start.S
>>> +++ b/arch/arm/cpu/armv8/start.S
>>> @@ -67,8 +67,10 @@ pie_fixup:
>>>  	adr	x0, _start		/* x0 <- Runtime value of _start */
>>>  	ldr	x1, _TEXT_BASE		/* x1 <- Linked value of _start */
>>>  	sub	x9, x0, x1		/* x9 <- Run-vs-link offset */
>>> -	adr	x2, __rel_dyn_start	/* x2 <- Runtime &__rel_dyn_start */
>>> -	adr	x3, __rel_dyn_end	/* x3 <- Runtime &__rel_dyn_end */
>>> +	adrp    x2, __rel_dyn_start     /* x2 <- Runtime &__rel_dyn_start */
>>> +	add     x2, x2, #:lo12:__rel_dyn_start
>>> +	adrp    x3, __rel_dyn_end       /* x3 <- Runtime &__rel_dyn_end */
>>> +	add     x3, x3, #:lo12:__rel_dyn_end
>>>  pie_fix_loop:
>>>  	ldp	x0, x1, [x2], #16	/* (x0, x1) <- (Link location, fixup) */
>>>  	ldr	x4, [x2], #8		/* x4 <- addend */
>>>
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-02 15:18     ` André Przywara
@ 2020-09-02 15:25       ` Edgar E. Iglesias
  2020-09-02 18:59         ` André Przywara
  0 siblings, 1 reply; 18+ messages in thread
From: Edgar E. Iglesias @ 2020-09-02 15:25 UTC (permalink / raw)
  To: u-boot

On Wed, Sep 02, 2020 at 04:18:48PM +0100, Andr? Przywara wrote:
> On 02/09/2020 15:53, Edgar E. Iglesias wrote:
> > On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr??? Przywara wrote:
> >> On 02/09/2020 12:15, Michal Simek wrote:
> 
> Hi,
> 
> >>
> >>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
> >>>
> >>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
> >>> compilation error is shown:
> >>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
> >>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
> >>> defined in .bss_start section in u-boot.
> >>>
> >>> It is caused by adr instruction which permits the calculation of any byte
> >>> address within +- 1MB of the current PC.
> >>> Because U-Boot is bigger then 1MB calculation is failing.
> >>>
> >>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
> >>> immediate left by 12 bits (4k page), adds it to the value of the program
> >>> counter with the bottom 12 bits cleared to zero. Then add instruction
> >>> provides the lower 12 bits which is offset within 4k page.
> >>> These two instructions together compose full 32bit offset which should be
> >>> more then enough to cover the whole u-boot size.
> >>>
> >>> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
> >>> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
> >>
> >> It's a bit scary that you need more than 1MB, but indeed what you do
> >> below is the canonical pattern to get the full range of PC relative
> >> addressing (this is used heavily in Trusted Firmware, for instance).
> >>
> >> The only thing to keep in mind is that this assumes that the load
> >> address of the binary is 4K aligned, so that the low 12 bits of the
> >> symbol stay the same. I wonder if we should enforce this somehow? But
> >> the load address is not controlled by the build process (the whole
> >> purpose of PIE), so that's not doable just in the build system?
> > 
> > There shouldn't be any need for 4K alignment. Could you elaborate on
> > why you think there is?
> 
> That seems to be slightly tricky, and I tried to get some confirmation,
> but here goes my reasoning. Maybe you can confirm this:
> 
> - adrp takes the relative offset, but only of the upper 20 bits (because
> that's all we can encode). It clears the lower 12 bits of the register.
> - the "add" is not PC relative anymore, so it just takes the lower 12
> bits of the "absolute" linker symbol.

I was under the impression that this would use a PC-relative lower 12bit
relocation but you are correct. I dissasembled the result:

  40:   91000042        add     x2, x2, #0x0
                        40: R_AARCH64_ADD_ABS_LO12_NC   __rel_dyn_start





> So this assumes that the lower 12 bits of the actual address in memory
> and the lower 12 bits of the linker's view match.
> An example:
> 00024: adrp x0, SYMBOL
> 00028: add  x0, x0, :lo12:SYMBOL
> 
> SYMBOL:
> 42058: ...
> 
> The toolchain will generate:
> 	adrp x0, #0x42; add x0, x0, #0x058
> 
> Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at
> 0x80042858.
> The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000.
> The add will just add 0x58, so you end up with x0 being 0x80042058,
> which is not the right address.
> 
> Does this make sense?


Yes, it makes sense.

> 
> > Perhaps the commit message is a little confusing. The toolchain will
> > compute the pc-relative offset from this particular location to the
> > symbol and apply the relocations accordingly.
> 
> Yes, but the PC relative offset applies only to the upper 20 bits,
> because it's only adrp that has PC relative semantics.
> 
> 
> >>
> >> Shall we at least document this? I guess typical load address are
> >> actually quite well aligned, so it might not be an issue in practice.
> >>

Yes, probably worth documenting and perhaps an early bail-out if it's not
the case...

Thanks,
Edgar

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-02 11:15 [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y Michal Simek
  2020-09-02 14:43 ` André Przywara
@ 2020-09-02 16:34 ` Stephen Warren
  2020-09-03 13:35   ` Michal Simek
  1 sibling, 1 reply; 18+ messages in thread
From: Stephen Warren @ 2020-09-02 16:34 UTC (permalink / raw)
  To: u-boot

On 9/2/20 5:15 AM, Michal Simek wrote:
> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
> 
> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
> compilation error is shown:
> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
> defined in .bss_start section in u-boot.
> 
> It is caused by adr instruction which permits the calculation of any byte
> address within +- 1MB of the current PC.
> Because U-Boot is bigger then 1MB calculation is failing.
> 
> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
> immediate left by 12 bits (4k page), adds it to the value of the program
> counter with the bottom 12 bits cleared to zero. Then add instruction
> provides the lower 12 bits which is offset within 4k page.
> These two instructions together compose full 32bit offset which should be
> more then enough to cover the whole u-boot size.

> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S

> @@ -67,8 +67,10 @@ pie_fixup:
>  	adr	x0, _start		/* x0 <- Runtime value of _start */
>  	ldr	x1, _TEXT_BASE		/* x1 <- Linked value of _start */
>  	sub	x9, x0, x1		/* x9 <- Run-vs-link offset */
> -	adr	x2, __rel_dyn_start	/* x2 <- Runtime &__rel_dyn_start */
> -	adr	x3, __rel_dyn_end	/* x3 <- Runtime &__rel_dyn_end */
> +	adrp    x2, __rel_dyn_start     /* x2 <- Runtime &__rel_dyn_start */
> +	add     x2, x2, #:lo12:__rel_dyn_start
> +	adrp    x3, __rel_dyn_end       /* x3 <- Runtime &__rel_dyn_end */
> +	add     x3, x3, #:lo12:__rel_dyn_end
>  pie_fix_loop:
>  	ldp	x0, x1, [x2], #16	/* (x0, x1) <- (Link location, fixup) */
>  	ldr	x4, [x2], #8		/* x4 <- addend */

There are likely a bunch of other places in the code that need updating
too; take a look at commit 49e93875a62f "arm64: support running@addr
other than linked to" (which introduced the code above) to find other
places with similar instruction sequences that almost certainly need
updating.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-02 15:25       ` Edgar E. Iglesias
@ 2020-09-02 18:59         ` André Przywara
  2020-09-03 13:41           ` Michal Simek
  0 siblings, 1 reply; 18+ messages in thread
From: André Przywara @ 2020-09-02 18:59 UTC (permalink / raw)
  To: u-boot

On 02/09/2020 16:25, Edgar E. Iglesias wrote:
> On Wed, Sep 02, 2020 at 04:18:48PM +0100, Andr??? Przywara wrote:
>> On 02/09/2020 15:53, Edgar E. Iglesias wrote:
>>> On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr??? Przywara wrote:
>>>> On 02/09/2020 12:15, Michal Simek wrote:
>>
>> Hi,
>>
>>>>
>>>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>>>>
>>>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
>>>>> compilation error is shown:
>>>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
>>>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
>>>>> defined in .bss_start section in u-boot.
>>>>>
>>>>> It is caused by adr instruction which permits the calculation of any byte
>>>>> address within +- 1MB of the current PC.
>>>>> Because U-Boot is bigger then 1MB calculation is failing.
>>>>>
>>>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
>>>>> immediate left by 12 bits (4k page), adds it to the value of the program
>>>>> counter with the bottom 12 bits cleared to zero. Then add instruction
>>>>> provides the lower 12 bits which is offset within 4k page.
>>>>> These two instructions together compose full 32bit offset which should be
>>>>> more then enough to cover the whole u-boot size.
>>>>>
>>>>> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
>>>>> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
>>>>
>>>> It's a bit scary that you need more than 1MB, but indeed what you do
>>>> below is the canonical pattern to get the full range of PC relative
>>>> addressing (this is used heavily in Trusted Firmware, for instance).
>>>>
>>>> The only thing to keep in mind is that this assumes that the load
>>>> address of the binary is 4K aligned, so that the low 12 bits of the
>>>> symbol stay the same. I wonder if we should enforce this somehow? But
>>>> the load address is not controlled by the build process (the whole
>>>> purpose of PIE), so that's not doable just in the build system?
>>>
>>> There shouldn't be any need for 4K alignment. Could you elaborate on
>>> why you think there is?
>>
>> That seems to be slightly tricky, and I tried to get some confirmation,
>> but here goes my reasoning. Maybe you can confirm this:
>>
>> - adrp takes the relative offset, but only of the upper 20 bits (because
>> that's all we can encode). It clears the lower 12 bits of the register.
>> - the "add" is not PC relative anymore, so it just takes the lower 12
>> bits of the "absolute" linker symbol.
> 
> I was under the impression that this would use a PC-relative lower 12bit
> relocation but you are correct. I dissasembled the result:
> 
>   40:   91000042        add     x2, x2, #0x0
>                         40: R_AARCH64_ADD_ABS_LO12_NC   __rel_dyn_start
> 
> 
> 
> 
> 
>> So this assumes that the lower 12 bits of the actual address in memory
>> and the lower 12 bits of the linker's view match.
>> An example:
>> 00024: adrp x0, SYMBOL
>> 00028: add  x0, x0, :lo12:SYMBOL
>>
>> SYMBOL:
>> 42058: ...
>>
>> The toolchain will generate:
>> 	adrp x0, #0x42; add x0, x0, #0x058
>>
>> Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at
>> 0x80042858.
>> The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000.
>> The add will just add 0x58, so you end up with x0 being 0x80042058,
>> which is not the right address.
>>
>> Does this make sense?
> 
> 
> Yes, it makes sense.
> 
>>
>>> Perhaps the commit message is a little confusing. The toolchain will
>>> compute the pc-relative offset from this particular location to the
>>> symbol and apply the relocations accordingly.
>>
>> Yes, but the PC relative offset applies only to the upper 20 bits,
>> because it's only adrp that has PC relative semantics.
>>
>>
>>>>
>>>> Shall we at least document this? I guess typical load address are
>>>> actually quite well aligned, so it might not be an issue in practice.
>>>>
> 
> Yes, probably worth documenting and perhaps an early bail-out if it's not
> the case...

Documenting sounds good, Kconfig might be a good place, as Michal suggested.

Bail out: I thought about that, it's very easy to detect at runtime, but
what then? This is really early, so you could just enter a WFI loop, and
hope for someone to connect the dots?
Or can you think of any other way of communicating with the user?

Cheers,
Andre

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-02 16:34 ` Stephen Warren
@ 2020-09-03 13:35   ` Michal Simek
  2020-09-03 13:40     ` André Przywara
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Simek @ 2020-09-03 13:35 UTC (permalink / raw)
  To: u-boot



On 02. 09. 20 18:34, Stephen Warren wrote:
> On 9/2/20 5:15 AM, Michal Simek wrote:
>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>
>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
>> compilation error is shown:
>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
>> defined in .bss_start section in u-boot.
>>
>> It is caused by adr instruction which permits the calculation of any byte
>> address within +- 1MB of the current PC.
>> Because U-Boot is bigger then 1MB calculation is failing.
>>
>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
>> immediate left by 12 bits (4k page), adds it to the value of the program
>> counter with the bottom 12 bits cleared to zero. Then add instruction
>> provides the lower 12 bits which is offset within 4k page.
>> These two instructions together compose full 32bit offset which should be
>> more then enough to cover the whole u-boot size.
> 
>> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
> 
>> @@ -67,8 +67,10 @@ pie_fixup:
>>  	adr	x0, _start		/* x0 <- Runtime value of _start */
>>  	ldr	x1, _TEXT_BASE		/* x1 <- Linked value of _start */
>>  	sub	x9, x0, x1		/* x9 <- Run-vs-link offset */
>> -	adr	x2, __rel_dyn_start	/* x2 <- Runtime &__rel_dyn_start */
>> -	adr	x3, __rel_dyn_end	/* x3 <- Runtime &__rel_dyn_end */
>> +	adrp    x2, __rel_dyn_start     /* x2 <- Runtime &__rel_dyn_start */
>> +	add     x2, x2, #:lo12:__rel_dyn_start
>> +	adrp    x3, __rel_dyn_end       /* x3 <- Runtime &__rel_dyn_end */
>> +	add     x3, x3, #:lo12:__rel_dyn_end
>>  pie_fix_loop:
>>  	ldp	x0, x1, [x2], #16	/* (x0, x1) <- (Link location, fixup) */
>>  	ldr	x4, [x2], #8		/* x4 <- addend */
> 
> There are likely a bunch of other places in the code that need updating
> too; take a look at commit 49e93875a62f "arm64: support running at addr
> other than linked to" (which introduced the code above) to find other
> places with similar instruction sequences that almost certainly need
> updating.
> 

I didn't hit any issue to have a need to update them. Definitely worth
to check that locations too.

Thanks,
Michal


-- 
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Xilinx Microblaze
Maintainer of Linux kernel - Xilinx Zynq ARM and ZynqMP ARM64 SoCs
U-Boot custodian - Xilinx Microblaze/Zynq/ZynqMP/Versal SoCs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-03 13:35   ` Michal Simek
@ 2020-09-03 13:40     ` André Przywara
  2020-09-03 15:45       ` Stephen Warren
  0 siblings, 1 reply; 18+ messages in thread
From: André Przywara @ 2020-09-03 13:40 UTC (permalink / raw)
  To: u-boot

On 03/09/2020 14:35, Michal Simek wrote:
> 
> 
> On 02. 09. 20 18:34, Stephen Warren wrote:
>> On 9/2/20 5:15 AM, Michal Simek wrote:
>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>>
>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
>>> compilation error is shown:
>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
>>> defined in .bss_start section in u-boot.
>>>
>>> It is caused by adr instruction which permits the calculation of any byte
>>> address within +- 1MB of the current PC.
>>> Because U-Boot is bigger then 1MB calculation is failing.
>>>
>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
>>> immediate left by 12 bits (4k page), adds it to the value of the program
>>> counter with the bottom 12 bits cleared to zero. Then add instruction
>>> provides the lower 12 bits which is offset within 4k page.
>>> These two instructions together compose full 32bit offset which should be
>>> more then enough to cover the whole u-boot size.
>>
>>> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
>>
>>> @@ -67,8 +67,10 @@ pie_fixup:
>>>  	adr	x0, _start		/* x0 <- Runtime value of _start */
>>>  	ldr	x1, _TEXT_BASE		/* x1 <- Linked value of _start */
>>>  	sub	x9, x0, x1		/* x9 <- Run-vs-link offset */
>>> -	adr	x2, __rel_dyn_start	/* x2 <- Runtime &__rel_dyn_start */
>>> -	adr	x3, __rel_dyn_end	/* x3 <- Runtime &__rel_dyn_end */
>>> +	adrp    x2, __rel_dyn_start     /* x2 <- Runtime &__rel_dyn_start */
>>> +	add     x2, x2, #:lo12:__rel_dyn_start
>>> +	adrp    x3, __rel_dyn_end       /* x3 <- Runtime &__rel_dyn_end */
>>> +	add     x3, x3, #:lo12:__rel_dyn_end
>>>  pie_fix_loop:
>>>  	ldp	x0, x1, [x2], #16	/* (x0, x1) <- (Link location, fixup) */
>>>  	ldr	x4, [x2], #8		/* x4 <- addend */
>>
>> There are likely a bunch of other places in the code that need updating
>> too; take a look at commit 49e93875a62f "arm64: support running at addr
>> other than linked to" (which introduced the code above) to find other
>> places with similar instruction sequences that almost certainly need
>> updating.
>>
> 
> I didn't hit any issue to have a need to update them. Definitely worth
> to check that locations too.

So I thought the same, so I checked at least this file. And while there
are other "adr" instructions, they only go to nearby labels, so don't
need to be pumped up.
But I will try to grep for more cases as well.

Cheers,
Andre

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-02 18:59         ` André Przywara
@ 2020-09-03 13:41           ` Michal Simek
  2020-09-03 13:52             ` André Przywara
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Simek @ 2020-09-03 13:41 UTC (permalink / raw)
  To: u-boot



On 02. 09. 20 20:59, Andr? Przywara wrote:
> On 02/09/2020 16:25, Edgar E. Iglesias wrote:
>> On Wed, Sep 02, 2020 at 04:18:48PM +0100, Andr??? Przywara wrote:
>>> On 02/09/2020 15:53, Edgar E. Iglesias wrote:
>>>> On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr??? Przywara wrote:
>>>>> On 02/09/2020 12:15, Michal Simek wrote:
>>>
>>> Hi,
>>>
>>>>>
>>>>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>>>>>
>>>>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
>>>>>> compilation error is shown:
>>>>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
>>>>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
>>>>>> defined in .bss_start section in u-boot.
>>>>>>
>>>>>> It is caused by adr instruction which permits the calculation of any byte
>>>>>> address within +- 1MB of the current PC.
>>>>>> Because U-Boot is bigger then 1MB calculation is failing.
>>>>>>
>>>>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
>>>>>> immediate left by 12 bits (4k page), adds it to the value of the program
>>>>>> counter with the bottom 12 bits cleared to zero. Then add instruction
>>>>>> provides the lower 12 bits which is offset within 4k page.
>>>>>> These two instructions together compose full 32bit offset which should be
>>>>>> more then enough to cover the whole u-boot size.
>>>>>>
>>>>>> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
>>>>>> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
>>>>>
>>>>> It's a bit scary that you need more than 1MB, but indeed what you do
>>>>> below is the canonical pattern to get the full range of PC relative
>>>>> addressing (this is used heavily in Trusted Firmware, for instance).
>>>>>
>>>>> The only thing to keep in mind is that this assumes that the load
>>>>> address of the binary is 4K aligned, so that the low 12 bits of the
>>>>> symbol stay the same. I wonder if we should enforce this somehow? But
>>>>> the load address is not controlled by the build process (the whole
>>>>> purpose of PIE), so that's not doable just in the build system?
>>>>
>>>> There shouldn't be any need for 4K alignment. Could you elaborate on
>>>> why you think there is?
>>>
>>> That seems to be slightly tricky, and I tried to get some confirmation,
>>> but here goes my reasoning. Maybe you can confirm this:
>>>
>>> - adrp takes the relative offset, but only of the upper 20 bits (because
>>> that's all we can encode). It clears the lower 12 bits of the register.
>>> - the "add" is not PC relative anymore, so it just takes the lower 12
>>> bits of the "absolute" linker symbol.
>>
>> I was under the impression that this would use a PC-relative lower 12bit
>> relocation but you are correct. I dissasembled the result:
>>
>>   40:   91000042        add     x2, x2, #0x0
>>                         40: R_AARCH64_ADD_ABS_LO12_NC   __rel_dyn_start
>>
>>
>>
>>
>>
>>> So this assumes that the lower 12 bits of the actual address in memory
>>> and the lower 12 bits of the linker's view match.
>>> An example:
>>> 00024: adrp x0, SYMBOL
>>> 00028: add  x0, x0, :lo12:SYMBOL
>>>
>>> SYMBOL:
>>> 42058: ...
>>>
>>> The toolchain will generate:
>>> 	adrp x0, #0x42; add x0, x0, #0x058
>>>
>>> Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at
>>> 0x80042858.
>>> The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000.
>>> The add will just add 0x58, so you end up with x0 being 0x80042058,
>>> which is not the right address.
>>>
>>> Does this make sense?
>>
>>
>> Yes, it makes sense.
>>
>>>
>>>> Perhaps the commit message is a little confusing. The toolchain will
>>>> compute the pc-relative offset from this particular location to the
>>>> symbol and apply the relocations accordingly.
>>>
>>> Yes, but the PC relative offset applies only to the upper 20 bits,
>>> because it's only adrp that has PC relative semantics.
>>>
>>>
>>>>>
>>>>> Shall we at least document this? I guess typical load address are
>>>>> actually quite well aligned, so it might not be an issue in practice.
>>>>>
>>
>> Yes, probably worth documenting and perhaps an early bail-out if it's not
>> the case...
> 
> Documenting sounds good, Kconfig might be a good place, as Michal suggested.
> 
> Bail out: I thought about that, it's very easy to detect at runtime, but
> what then? This is really early, so you could just enter a WFI loop, and
> hope for someone to connect the dots?
> Or can you think of any other way of communicating with the user?

yes it is very early. It is the first real task what run after reset.
I am fine with detecting it to make sure that we won't have
unpredictable behavior later.
What detection code do you have in mind?
Don't we even have this 4k alignment in place already?

Thanks,
Michal


-- 
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Xilinx Microblaze
Maintainer of Linux kernel - Xilinx Zynq ARM and ZynqMP ARM64 SoCs
U-Boot custodian - Xilinx Microblaze/Zynq/ZynqMP/Versal SoCs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-03 13:41           ` Michal Simek
@ 2020-09-03 13:52             ` André Przywara
  2020-09-03 13:59               ` Edgar E. Iglesias
  0 siblings, 1 reply; 18+ messages in thread
From: André Przywara @ 2020-09-03 13:52 UTC (permalink / raw)
  To: u-boot

On 03/09/2020 14:41, Michal Simek wrote:
> 
> 
> On 02. 09. 20 20:59, Andr? Przywara wrote:
>> On 02/09/2020 16:25, Edgar E. Iglesias wrote:
>>> On Wed, Sep 02, 2020 at 04:18:48PM +0100, Andr??? Przywara wrote:
>>>> On 02/09/2020 15:53, Edgar E. Iglesias wrote:
>>>>> On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr??? Przywara wrote:
>>>>>> On 02/09/2020 12:15, Michal Simek wrote:
>>>>
>>>> Hi,
>>>>
>>>>>>
>>>>>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>>>>>>
>>>>>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
>>>>>>> compilation error is shown:
>>>>>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
>>>>>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
>>>>>>> defined in .bss_start section in u-boot.
>>>>>>>
>>>>>>> It is caused by adr instruction which permits the calculation of any byte
>>>>>>> address within +- 1MB of the current PC.
>>>>>>> Because U-Boot is bigger then 1MB calculation is failing.
>>>>>>>
>>>>>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
>>>>>>> immediate left by 12 bits (4k page), adds it to the value of the program
>>>>>>> counter with the bottom 12 bits cleared to zero. Then add instruction
>>>>>>> provides the lower 12 bits which is offset within 4k page.
>>>>>>> These two instructions together compose full 32bit offset which should be
>>>>>>> more then enough to cover the whole u-boot size.
>>>>>>>
>>>>>>> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
>>>>>>> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
>>>>>>
>>>>>> It's a bit scary that you need more than 1MB, but indeed what you do
>>>>>> below is the canonical pattern to get the full range of PC relative
>>>>>> addressing (this is used heavily in Trusted Firmware, for instance).
>>>>>>
>>>>>> The only thing to keep in mind is that this assumes that the load
>>>>>> address of the binary is 4K aligned, so that the low 12 bits of the
>>>>>> symbol stay the same. I wonder if we should enforce this somehow? But
>>>>>> the load address is not controlled by the build process (the whole
>>>>>> purpose of PIE), so that's not doable just in the build system?
>>>>>
>>>>> There shouldn't be any need for 4K alignment. Could you elaborate on
>>>>> why you think there is?
>>>>
>>>> That seems to be slightly tricky, and I tried to get some confirmation,
>>>> but here goes my reasoning. Maybe you can confirm this:
>>>>
>>>> - adrp takes the relative offset, but only of the upper 20 bits (because
>>>> that's all we can encode). It clears the lower 12 bits of the register.
>>>> - the "add" is not PC relative anymore, so it just takes the lower 12
>>>> bits of the "absolute" linker symbol.
>>>
>>> I was under the impression that this would use a PC-relative lower 12bit
>>> relocation but you are correct. I dissasembled the result:
>>>
>>>   40:   91000042        add     x2, x2, #0x0
>>>                         40: R_AARCH64_ADD_ABS_LO12_NC   __rel_dyn_start
>>>
>>>
>>>
>>>
>>>
>>>> So this assumes that the lower 12 bits of the actual address in memory
>>>> and the lower 12 bits of the linker's view match.
>>>> An example:
>>>> 00024: adrp x0, SYMBOL
>>>> 00028: add  x0, x0, :lo12:SYMBOL
>>>>
>>>> SYMBOL:
>>>> 42058: ...
>>>>
>>>> The toolchain will generate:
>>>> 	adrp x0, #0x42; add x0, x0, #0x058
>>>>
>>>> Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at
>>>> 0x80042858.
>>>> The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000.
>>>> The add will just add 0x58, so you end up with x0 being 0x80042058,
>>>> which is not the right address.
>>>>
>>>> Does this make sense?
>>>
>>>
>>> Yes, it makes sense.
>>>
>>>>
>>>>> Perhaps the commit message is a little confusing. The toolchain will
>>>>> compute the pc-relative offset from this particular location to the
>>>>> symbol and apply the relocations accordingly.
>>>>
>>>> Yes, but the PC relative offset applies only to the upper 20 bits,
>>>> because it's only adrp that has PC relative semantics.
>>>>
>>>>
>>>>>>
>>>>>> Shall we at least document this? I guess typical load address are
>>>>>> actually quite well aligned, so it might not be an issue in practice.
>>>>>>
>>>
>>> Yes, probably worth documenting and perhaps an early bail-out if it's not
>>> the case...
>>
>> Documenting sounds good, Kconfig might be a good place, as Michal suggested.
>>
>> Bail out: I thought about that, it's very easy to detect at runtime, but
>> what then? This is really early, so you could just enter a WFI loop, and
>> hope for someone to connect the dots?
>> Or can you think of any other way of communicating with the user?
> 
> yes it is very early. It is the first real task what run after reset.
> I am fine with detecting it to make sure that we won't have
> unpredictable behavior later.
> What detection code do you have in mind?

Just "adr"ing the beginning of the image (linker address 0), and
checking for all 12 LSBs to be 0. The best I thought of would be a WFI
loop if not. That sounds like 4 instructions or so in total to me.

> Don't we even have this 4k alignment in place already?

Do you mean in linker scripts? I think what counts here is that the
actual *load* address is 4K aligned, which I believe is out of control
of U-Boot. I would guess that's up to the user (flash address) or
previous boot-stages (BootROM or pre-SPL firmware) to set the actual
load address. In the best case it's very platform dependent.
But it is definitely variable, otherwise we wouldn't need PIE in the
first place.

Cheers,
Andre

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-03 13:52             ` André Przywara
@ 2020-09-03 13:59               ` Edgar E. Iglesias
  2020-09-03 14:03                 ` Michal Simek
  0 siblings, 1 reply; 18+ messages in thread
From: Edgar E. Iglesias @ 2020-09-03 13:59 UTC (permalink / raw)
  To: u-boot

On Thu, Sep 03, 2020 at 02:52:39PM +0100, Andr? Przywara wrote:
> On 03/09/2020 14:41, Michal Simek wrote:
> > 
> > 
> > On 02. 09. 20 20:59, Andr? Przywara wrote:
> >> On 02/09/2020 16:25, Edgar E. Iglesias wrote:
> >>> On Wed, Sep 02, 2020 at 04:18:48PM +0100, Andr??? Przywara wrote:
> >>>> On 02/09/2020 15:53, Edgar E. Iglesias wrote:
> >>>>> On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr??? Przywara wrote:
> >>>>>> On 02/09/2020 12:15, Michal Simek wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>>>>
> >>>>>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
> >>>>>>>
> >>>>>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
> >>>>>>> compilation error is shown:
> >>>>>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
> >>>>>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
> >>>>>>> defined in .bss_start section in u-boot.
> >>>>>>>
> >>>>>>> It is caused by adr instruction which permits the calculation of any byte
> >>>>>>> address within +- 1MB of the current PC.
> >>>>>>> Because U-Boot is bigger then 1MB calculation is failing.
> >>>>>>>
> >>>>>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
> >>>>>>> immediate left by 12 bits (4k page), adds it to the value of the program
> >>>>>>> counter with the bottom 12 bits cleared to zero. Then add instruction
> >>>>>>> provides the lower 12 bits which is offset within 4k page.
> >>>>>>> These two instructions together compose full 32bit offset which should be
> >>>>>>> more then enough to cover the whole u-boot size.
> >>>>>>>
> >>>>>>> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
> >>>>>>> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
> >>>>>>
> >>>>>> It's a bit scary that you need more than 1MB, but indeed what you do
> >>>>>> below is the canonical pattern to get the full range of PC relative
> >>>>>> addressing (this is used heavily in Trusted Firmware, for instance).
> >>>>>>
> >>>>>> The only thing to keep in mind is that this assumes that the load
> >>>>>> address of the binary is 4K aligned, so that the low 12 bits of the
> >>>>>> symbol stay the same. I wonder if we should enforce this somehow? But
> >>>>>> the load address is not controlled by the build process (the whole
> >>>>>> purpose of PIE), so that's not doable just in the build system?
> >>>>>
> >>>>> There shouldn't be any need for 4K alignment. Could you elaborate on
> >>>>> why you think there is?
> >>>>
> >>>> That seems to be slightly tricky, and I tried to get some confirmation,
> >>>> but here goes my reasoning. Maybe you can confirm this:
> >>>>
> >>>> - adrp takes the relative offset, but only of the upper 20 bits (because
> >>>> that's all we can encode). It clears the lower 12 bits of the register.
> >>>> - the "add" is not PC relative anymore, so it just takes the lower 12
> >>>> bits of the "absolute" linker symbol.
> >>>
> >>> I was under the impression that this would use a PC-relative lower 12bit
> >>> relocation but you are correct. I dissasembled the result:
> >>>
> >>>   40:   91000042        add     x2, x2, #0x0
> >>>                         40: R_AARCH64_ADD_ABS_LO12_NC   __rel_dyn_start
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> So this assumes that the lower 12 bits of the actual address in memory
> >>>> and the lower 12 bits of the linker's view match.
> >>>> An example:
> >>>> 00024: adrp x0, SYMBOL
> >>>> 00028: add  x0, x0, :lo12:SYMBOL
> >>>>
> >>>> SYMBOL:
> >>>> 42058: ...
> >>>>
> >>>> The toolchain will generate:
> >>>> 	adrp x0, #0x42; add x0, x0, #0x058
> >>>>
> >>>> Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at
> >>>> 0x80042858.
> >>>> The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000.
> >>>> The add will just add 0x58, so you end up with x0 being 0x80042058,
> >>>> which is not the right address.
> >>>>
> >>>> Does this make sense?
> >>>
> >>>
> >>> Yes, it makes sense.
> >>>
> >>>>
> >>>>> Perhaps the commit message is a little confusing. The toolchain will
> >>>>> compute the pc-relative offset from this particular location to the
> >>>>> symbol and apply the relocations accordingly.
> >>>>
> >>>> Yes, but the PC relative offset applies only to the upper 20 bits,
> >>>> because it's only adrp that has PC relative semantics.
> >>>>
> >>>>
> >>>>>>
> >>>>>> Shall we at least document this? I guess typical load address are
> >>>>>> actually quite well aligned, so it might not be an issue in practice.
> >>>>>>
> >>>
> >>> Yes, probably worth documenting and perhaps an early bail-out if it's not
> >>> the case...
> >>
> >> Documenting sounds good, Kconfig might be a good place, as Michal suggested.
> >>
> >> Bail out: I thought about that, it's very easy to detect at runtime, but
> >> what then? This is really early, so you could just enter a WFI loop, and
> >> hope for someone to connect the dots?
> >> Or can you think of any other way of communicating with the user?
> > 
> > yes it is very early. It is the first real task what run after reset.
> > I am fine with detecting it to make sure that we won't have
> > unpredictable behavior later.
> > What detection code do you have in mind?
> 
> Just "adr"ing the beginning of the image (linker address 0), and
> checking for all 12 LSBs to be 0. The best I thought of would be a WFI
> loop if not. That sounds like 4 instructions or so in total to me.

Yeah, that sounds good me too. With a good comment in the source-code, people would be able to connect the dots.

> 
> > Don't we even have this 4k alignment in place already?
> 
> Do you mean in linker scripts? I think what counts here is that the
> actual *load* address is 4K aligned, which I believe is out of control
> of U-Boot. I would guess that's up to the user (flash address) or
> previous boot-stages (BootROM or pre-SPL firmware) to set the actual
> load address. In the best case it's very platform dependent.
> But it is definitely variable, otherwise we wouldn't need PIE in the
> first place.

Right, it's the run-time load address that matters.
I guess we already have limitations that fail silently (i.e a user can't load U-boot at
address 1) but the 4K one may be more subtle and possible to catch.

Cheers,
Edgar

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-03 13:59               ` Edgar E. Iglesias
@ 2020-09-03 14:03                 ` Michal Simek
  0 siblings, 0 replies; 18+ messages in thread
From: Michal Simek @ 2020-09-03 14:03 UTC (permalink / raw)
  To: u-boot



On 03. 09. 20 15:59, Edgar E. Iglesias wrote:
> On Thu, Sep 03, 2020 at 02:52:39PM +0100, Andr? Przywara wrote:
>> On 03/09/2020 14:41, Michal Simek wrote:
>>>
>>>
>>> On 02. 09. 20 20:59, Andr? Przywara wrote:
>>>> On 02/09/2020 16:25, Edgar E. Iglesias wrote:
>>>>> On Wed, Sep 02, 2020 at 04:18:48PM +0100, Andr??? Przywara wrote:
>>>>>> On 02/09/2020 15:53, Edgar E. Iglesias wrote:
>>>>>>> On Wed, Sep 02, 2020 at 03:43:08PM +0100, Andr??? Przywara wrote:
>>>>>>>> On 02/09/2020 12:15, Michal Simek wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>>>
>>>>>>>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>>>>>>>>
>>>>>>>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
>>>>>>>>> compilation error is shown:
>>>>>>>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
>>>>>>>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
>>>>>>>>> defined in .bss_start section in u-boot.
>>>>>>>>>
>>>>>>>>> It is caused by adr instruction which permits the calculation of any byte
>>>>>>>>> address within +- 1MB of the current PC.
>>>>>>>>> Because U-Boot is bigger then 1MB calculation is failing.
>>>>>>>>>
>>>>>>>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
>>>>>>>>> immediate left by 12 bits (4k page), adds it to the value of the program
>>>>>>>>> counter with the bottom 12 bits cleared to zero. Then add instruction
>>>>>>>>> provides the lower 12 bits which is offset within 4k page.
>>>>>>>>> These two instructions together compose full 32bit offset which should be
>>>>>>>>> more then enough to cover the whole u-boot size.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
>>>>>>>>> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
>>>>>>>>
>>>>>>>> It's a bit scary that you need more than 1MB, but indeed what you do
>>>>>>>> below is the canonical pattern to get the full range of PC relative
>>>>>>>> addressing (this is used heavily in Trusted Firmware, for instance).
>>>>>>>>
>>>>>>>> The only thing to keep in mind is that this assumes that the load
>>>>>>>> address of the binary is 4K aligned, so that the low 12 bits of the
>>>>>>>> symbol stay the same. I wonder if we should enforce this somehow? But
>>>>>>>> the load address is not controlled by the build process (the whole
>>>>>>>> purpose of PIE), so that's not doable just in the build system?
>>>>>>>
>>>>>>> There shouldn't be any need for 4K alignment. Could you elaborate on
>>>>>>> why you think there is?
>>>>>>
>>>>>> That seems to be slightly tricky, and I tried to get some confirmation,
>>>>>> but here goes my reasoning. Maybe you can confirm this:
>>>>>>
>>>>>> - adrp takes the relative offset, but only of the upper 20 bits (because
>>>>>> that's all we can encode). It clears the lower 12 bits of the register.
>>>>>> - the "add" is not PC relative anymore, so it just takes the lower 12
>>>>>> bits of the "absolute" linker symbol.
>>>>>
>>>>> I was under the impression that this would use a PC-relative lower 12bit
>>>>> relocation but you are correct. I dissasembled the result:
>>>>>
>>>>>   40:   91000042        add     x2, x2, #0x0
>>>>>                         40: R_AARCH64_ADD_ABS_LO12_NC   __rel_dyn_start
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> So this assumes that the lower 12 bits of the actual address in memory
>>>>>> and the lower 12 bits of the linker's view match.
>>>>>> An example:
>>>>>> 00024: adrp x0, SYMBOL
>>>>>> 00028: add  x0, x0, :lo12:SYMBOL
>>>>>>
>>>>>> SYMBOL:
>>>>>> 42058: ...
>>>>>>
>>>>>> The toolchain will generate:
>>>>>> 	adrp x0, #0x42; add x0, x0, #0x058
>>>>>>
>>>>>> Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at
>>>>>> 0x80042858.
>>>>>> The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000.
>>>>>> The add will just add 0x58, so you end up with x0 being 0x80042058,
>>>>>> which is not the right address.
>>>>>>
>>>>>> Does this make sense?
>>>>>
>>>>>
>>>>> Yes, it makes sense.
>>>>>
>>>>>>
>>>>>>> Perhaps the commit message is a little confusing. The toolchain will
>>>>>>> compute the pc-relative offset from this particular location to the
>>>>>>> symbol and apply the relocations accordingly.
>>>>>>
>>>>>> Yes, but the PC relative offset applies only to the upper 20 bits,
>>>>>> because it's only adrp that has PC relative semantics.
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Shall we at least document this? I guess typical load address are
>>>>>>>> actually quite well aligned, so it might not be an issue in practice.
>>>>>>>>
>>>>>
>>>>> Yes, probably worth documenting and perhaps an early bail-out if it's not
>>>>> the case...
>>>>
>>>> Documenting sounds good, Kconfig might be a good place, as Michal suggested.
>>>>
>>>> Bail out: I thought about that, it's very easy to detect at runtime, but
>>>> what then? This is really early, so you could just enter a WFI loop, and
>>>> hope for someone to connect the dots?
>>>> Or can you think of any other way of communicating with the user?
>>>
>>> yes it is very early. It is the first real task what run after reset.
>>> I am fine with detecting it to make sure that we won't have
>>> unpredictable behavior later.
>>> What detection code do you have in mind?
>>
>> Just "adr"ing the beginning of the image (linker address 0), and
>> checking for all 12 LSBs to be 0. The best I thought of would be a WFI
>> loop if not. That sounds like 4 instructions or so in total to me.
> 
> Yeah, that sounds good me too. With a good comment in the source-code, people would be able to connect the dots.
> 
>>
>>> Don't we even have this 4k alignment in place already?
>>
>> Do you mean in linker scripts? I think what counts here is that the
>> actual *load* address is 4K aligned, which I believe is out of control
>> of U-Boot. I would guess that's up to the user (flash address) or
>> previous boot-stages (BootROM or pre-SPL firmware) to set the actual
>> load address. In the best case it's very platform dependent.
>> But it is definitely variable, otherwise we wouldn't need PIE in the
>> first place.
> 
> Right, it's the run-time load address that matters.
> I guess we already have limitations that fail silently (i.e a user can't load U-boot at
> address 1) but the 4K one may be more subtle and possible to catch.

Edgar: Do you want to send v2 with it? Or do you want me to send v2 with
Kconfig update and checking alignment?
Also as you know you have written origin patch and I have added
description based on our discussion. If you want me to send v2 it is
good time to make that commit message more accurate. :-)

Thanks,
Michal

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-03 13:40     ` André Przywara
@ 2020-09-03 15:45       ` Stephen Warren
  2020-09-03 19:07         ` Edgar E. Iglesias
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Warren @ 2020-09-03 15:45 UTC (permalink / raw)
  To: u-boot

On 9/3/20 7:40 AM, Andr? Przywara wrote:
> On 03/09/2020 14:35, Michal Simek wrote:
>>
>>
>> On 02. 09. 20 18:34, Stephen Warren wrote:
>>> On 9/2/20 5:15 AM, Michal Simek wrote:
>>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>>>
>>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
>>>> compilation error is shown:
>>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
>>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
>>>> defined in .bss_start section in u-boot.
>>>>
>>>> It is caused by adr instruction which permits the calculation of any byte
>>>> address within +- 1MB of the current PC.
>>>> Because U-Boot is bigger then 1MB calculation is failing.
>>>>
>>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
>>>> immediate left by 12 bits (4k page), adds it to the value of the program
>>>> counter with the bottom 12 bits cleared to zero. Then add instruction
>>>> provides the lower 12 bits which is offset within 4k page.
>>>> These two instructions together compose full 32bit offset which should be
>>>> more then enough to cover the whole u-boot size.
>>>
>>>> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
>>>
>>>> @@ -67,8 +67,10 @@ pie_fixup:
>>>>  	adr	x0, _start		/* x0 <- Runtime value of _start */
>>>>  	ldr	x1, _TEXT_BASE		/* x1 <- Linked value of _start */
>>>>  	sub	x9, x0, x1		/* x9 <- Run-vs-link offset */
>>>> -	adr	x2, __rel_dyn_start	/* x2 <- Runtime &__rel_dyn_start */
>>>> -	adr	x3, __rel_dyn_end	/* x3 <- Runtime &__rel_dyn_end */
>>>> +	adrp    x2, __rel_dyn_start     /* x2 <- Runtime &__rel_dyn_start */
>>>> +	add     x2, x2, #:lo12:__rel_dyn_start
>>>> +	adrp    x3, __rel_dyn_end       /* x3 <- Runtime &__rel_dyn_end */
>>>> +	add     x3, x3, #:lo12:__rel_dyn_end
>>>>  pie_fix_loop:
>>>>  	ldp	x0, x1, [x2], #16	/* (x0, x1) <- (Link location, fixup) */
>>>>  	ldr	x4, [x2], #8		/* x4 <- addend */
>>>
>>> There are likely a bunch of other places in the code that need updating
>>> too; take a look at commit 49e93875a62f "arm64: support running at addr
>>> other than linked to" (which introduced the code above) to find other
>>> places with similar instruction sequences that almost certainly need
>>> updating.
>>>
>>
>> I didn't hit any issue to have a need to update them. Definitely worth
>> to check that locations too.
> 
> So I thought the same, so I checked at least this file. And while there
> are other "adr" instructions, they only go to nearby labels, so don't
> need to be pumped up.
> But I will try to grep for more cases as well.


So in the patch I linked to, what about the added ard instructions in
arch/arm/lib/crt0_64.S and arch/arm/lib/relocate_64.S?

Perhaps that code gets linked more towards the middle of U-Boot than the
code you're fixing in start.S, so the max 1M offset just happens to
reach all the relevant symbols and relocations that are in your current
binary, but if your binary gets a little larger (e.g. goes from 1.05M to
2M say) that code will fail in the same way?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-03 15:45       ` Stephen Warren
@ 2020-09-03 19:07         ` Edgar E. Iglesias
  2020-09-03 19:10           ` Stephen Warren
  0 siblings, 1 reply; 18+ messages in thread
From: Edgar E. Iglesias @ 2020-09-03 19:07 UTC (permalink / raw)
  To: u-boot

On Thu, Sep 03, 2020 at 09:45:39AM -0600, Stephen Warren wrote:
> On 9/3/20 7:40 AM, Andr? Przywara wrote:
> > On 03/09/2020 14:35, Michal Simek wrote:
> >>
> >>
> >> On 02. 09. 20 18:34, Stephen Warren wrote:
> >>> On 9/2/20 5:15 AM, Michal Simek wrote:
> >>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
> >>>>
> >>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
> >>>> compilation error is shown:
> >>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
> >>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
> >>>> defined in .bss_start section in u-boot.
> >>>>
> >>>> It is caused by adr instruction which permits the calculation of any byte
> >>>> address within +- 1MB of the current PC.
> >>>> Because U-Boot is bigger then 1MB calculation is failing.
> >>>>
> >>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
> >>>> immediate left by 12 bits (4k page), adds it to the value of the program
> >>>> counter with the bottom 12 bits cleared to zero. Then add instruction
> >>>> provides the lower 12 bits which is offset within 4k page.
> >>>> These two instructions together compose full 32bit offset which should be
> >>>> more then enough to cover the whole u-boot size.
> >>>
> >>>> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
> >>>
> >>>> @@ -67,8 +67,10 @@ pie_fixup:
> >>>>  	adr	x0, _start		/* x0 <- Runtime value of _start */
> >>>>  	ldr	x1, _TEXT_BASE		/* x1 <- Linked value of _start */
> >>>>  	sub	x9, x0, x1		/* x9 <- Run-vs-link offset */
> >>>> -	adr	x2, __rel_dyn_start	/* x2 <- Runtime &__rel_dyn_start */
> >>>> -	adr	x3, __rel_dyn_end	/* x3 <- Runtime &__rel_dyn_end */
> >>>> +	adrp    x2, __rel_dyn_start     /* x2 <- Runtime &__rel_dyn_start */
> >>>> +	add     x2, x2, #:lo12:__rel_dyn_start
> >>>> +	adrp    x3, __rel_dyn_end       /* x3 <- Runtime &__rel_dyn_end */
> >>>> +	add     x3, x3, #:lo12:__rel_dyn_end
> >>>>  pie_fix_loop:
> >>>>  	ldp	x0, x1, [x2], #16	/* (x0, x1) <- (Link location, fixup) */
> >>>>  	ldr	x4, [x2], #8		/* x4 <- addend */
> >>>
> >>> There are likely a bunch of other places in the code that need updating
> >>> too; take a look at commit 49e93875a62f "arm64: support running at addr
> >>> other than linked to" (which introduced the code above) to find other
> >>> places with similar instruction sequences that almost certainly need
> >>> updating.
> >>>
> >>
> >> I didn't hit any issue to have a need to update them. Definitely worth
> >> to check that locations too.
> > 
> > So I thought the same, so I checked at least this file. And while there
> > are other "adr" instructions, they only go to nearby labels, so don't
> > need to be pumped up.
> > But I will try to grep for more cases as well.
> 
> 
> So in the patch I linked to, what about the added ard instructions in
> arch/arm/lib/crt0_64.S and arch/arm/lib/relocate_64.S?
> 
> Perhaps that code gets linked more towards the middle of U-Boot than the
> code you're fixing in start.S, so the max 1M offset just happens to
> reach all the relevant symbols and relocations that are in your current
> binary, but if your binary gets a little larger (e.g. goes from 1.05M to
> 2M say) that code will fail in the same way?

Yes, those were apparently already corrected by Ibai:
https://github.com/u-boot/u-boot/commit/98ffbb78e12646a1d06236ad6a1893217f255aae#diff-4f864f65dc6b6f2535a5d252b7c9fcc7

Cheers,
Edgar

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-03 19:07         ` Edgar E. Iglesias
@ 2020-09-03 19:10           ` Stephen Warren
  2020-09-03 19:12             ` Edgar E. Iglesias
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Warren @ 2020-09-03 19:10 UTC (permalink / raw)
  To: u-boot

On 9/3/20 1:07 PM, Edgar E. Iglesias wrote:
> On Thu, Sep 03, 2020 at 09:45:39AM -0600, Stephen Warren wrote:
>> On 9/3/20 7:40 AM, Andr? Przywara wrote:
>>> On 03/09/2020 14:35, Michal Simek wrote:
>>>>
>>>>
>>>> On 02. 09. 20 18:34, Stephen Warren wrote:
>>>>> On 9/2/20 5:15 AM, Michal Simek wrote:
>>>>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>>>>>
>>>>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
>>>>>> compilation error is shown:
>>>>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
>>>>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
>>>>>> defined in .bss_start section in u-boot.
>>>>>>
>>>>>> It is caused by adr instruction which permits the calculation of any byte
>>>>>> address within +- 1MB of the current PC.
>>>>>> Because U-Boot is bigger then 1MB calculation is failing.
>>>>>>
>>>>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
>>>>>> immediate left by 12 bits (4k page), adds it to the value of the program
>>>>>> counter with the bottom 12 bits cleared to zero. Then add instruction
>>>>>> provides the lower 12 bits which is offset within 4k page.
>>>>>> These two instructions together compose full 32bit offset which should be
>>>>>> more then enough to cover the whole u-boot size.
>>>>>
>>>>>> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
>>>>>
>>>>>> @@ -67,8 +67,10 @@ pie_fixup:
>>>>>>  	adr	x0, _start		/* x0 <- Runtime value of _start */
>>>>>>  	ldr	x1, _TEXT_BASE		/* x1 <- Linked value of _start */
>>>>>>  	sub	x9, x0, x1		/* x9 <- Run-vs-link offset */
>>>>>> -	adr	x2, __rel_dyn_start	/* x2 <- Runtime &__rel_dyn_start */
>>>>>> -	adr	x3, __rel_dyn_end	/* x3 <- Runtime &__rel_dyn_end */
>>>>>> +	adrp    x2, __rel_dyn_start     /* x2 <- Runtime &__rel_dyn_start */
>>>>>> +	add     x2, x2, #:lo12:__rel_dyn_start
>>>>>> +	adrp    x3, __rel_dyn_end       /* x3 <- Runtime &__rel_dyn_end */
>>>>>> +	add     x3, x3, #:lo12:__rel_dyn_end
>>>>>>  pie_fix_loop:
>>>>>>  	ldp	x0, x1, [x2], #16	/* (x0, x1) <- (Link location, fixup) */
>>>>>>  	ldr	x4, [x2], #8		/* x4 <- addend */
>>>>>
>>>>> There are likely a bunch of other places in the code that need updating
>>>>> too; take a look at commit 49e93875a62f "arm64: support running at addr
>>>>> other than linked to" (which introduced the code above) to find other
>>>>> places with similar instruction sequences that almost certainly need
>>>>> updating.
>>>>>
>>>>
>>>> I didn't hit any issue to have a need to update them. Definitely worth
>>>> to check that locations too.
>>>
>>> So I thought the same, so I checked at least this file. And while there
>>> are other "adr" instructions, they only go to nearby labels, so don't
>>> need to be pumped up.
>>> But I will try to grep for more cases as well.
>>
>>
>> So in the patch I linked to, what about the added ard instructions in
>> arch/arm/lib/crt0_64.S and arch/arm/lib/relocate_64.S?
>>
>> Perhaps that code gets linked more towards the middle of U-Boot than the
>> code you're fixing in start.S, so the max 1M offset just happens to
>> reach all the relevant symbols and relocations that are in your current
>> binary, but if your binary gets a little larger (e.g. goes from 1.05M to
>> 2M say) that code will fail in the same way?
> 
> Yes, those were apparently already corrected by Ibai:
> https://github.com/u-boot/u-boot/commit/98ffbb78e12646a1d06236ad6a1893217f255aae#diff-4f864f65dc6b6f2535a5d252b7c9fcc7

Ah OK.

So I guess this means that in practice, U-Boot already has the
limitation that the load (and relocation target?) address must be
4K-aligned, since it uses the same instruction sequence we're
discusssing here.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
  2020-09-03 19:10           ` Stephen Warren
@ 2020-09-03 19:12             ` Edgar E. Iglesias
  0 siblings, 0 replies; 18+ messages in thread
From: Edgar E. Iglesias @ 2020-09-03 19:12 UTC (permalink / raw)
  To: u-boot

On Thu, Sep 03, 2020 at 01:10:48PM -0600, Stephen Warren wrote:
> On 9/3/20 1:07 PM, Edgar E. Iglesias wrote:
> > On Thu, Sep 03, 2020 at 09:45:39AM -0600, Stephen Warren wrote:
> >> On 9/3/20 7:40 AM, Andr? Przywara wrote:
> >>> On 03/09/2020 14:35, Michal Simek wrote:
> >>>>
> >>>>
> >>>> On 02. 09. 20 18:34, Stephen Warren wrote:
> >>>>> On 9/2/20 5:15 AM, Michal Simek wrote:
> >>>>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
> >>>>>>
> >>>>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
> >>>>>> compilation error is shown:
> >>>>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
> >>>>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
> >>>>>> defined in .bss_start section in u-boot.
> >>>>>>
> >>>>>> It is caused by adr instruction which permits the calculation of any byte
> >>>>>> address within +- 1MB of the current PC.
> >>>>>> Because U-Boot is bigger then 1MB calculation is failing.
> >>>>>>
> >>>>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
> >>>>>> immediate left by 12 bits (4k page), adds it to the value of the program
> >>>>>> counter with the bottom 12 bits cleared to zero. Then add instruction
> >>>>>> provides the lower 12 bits which is offset within 4k page.
> >>>>>> These two instructions together compose full 32bit offset which should be
> >>>>>> more then enough to cover the whole u-boot size.
> >>>>>
> >>>>>> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
> >>>>>
> >>>>>> @@ -67,8 +67,10 @@ pie_fixup:
> >>>>>>  	adr	x0, _start		/* x0 <- Runtime value of _start */
> >>>>>>  	ldr	x1, _TEXT_BASE		/* x1 <- Linked value of _start */
> >>>>>>  	sub	x9, x0, x1		/* x9 <- Run-vs-link offset */
> >>>>>> -	adr	x2, __rel_dyn_start	/* x2 <- Runtime &__rel_dyn_start */
> >>>>>> -	adr	x3, __rel_dyn_end	/* x3 <- Runtime &__rel_dyn_end */
> >>>>>> +	adrp    x2, __rel_dyn_start     /* x2 <- Runtime &__rel_dyn_start */
> >>>>>> +	add     x2, x2, #:lo12:__rel_dyn_start
> >>>>>> +	adrp    x3, __rel_dyn_end       /* x3 <- Runtime &__rel_dyn_end */
> >>>>>> +	add     x3, x3, #:lo12:__rel_dyn_end
> >>>>>>  pie_fix_loop:
> >>>>>>  	ldp	x0, x1, [x2], #16	/* (x0, x1) <- (Link location, fixup) */
> >>>>>>  	ldr	x4, [x2], #8		/* x4 <- addend */
> >>>>>
> >>>>> There are likely a bunch of other places in the code that need updating
> >>>>> too; take a look at commit 49e93875a62f "arm64: support running at addr
> >>>>> other than linked to" (which introduced the code above) to find other
> >>>>> places with similar instruction sequences that almost certainly need
> >>>>> updating.
> >>>>>
> >>>>
> >>>> I didn't hit any issue to have a need to update them. Definitely worth
> >>>> to check that locations too.
> >>>
> >>> So I thought the same, so I checked at least this file. And while there
> >>> are other "adr" instructions, they only go to nearby labels, so don't
> >>> need to be pumped up.
> >>> But I will try to grep for more cases as well.
> >>
> >>
> >> So in the patch I linked to, what about the added ard instructions in
> >> arch/arm/lib/crt0_64.S and arch/arm/lib/relocate_64.S?
> >>
> >> Perhaps that code gets linked more towards the middle of U-Boot than the
> >> code you're fixing in start.S, so the max 1M offset just happens to
> >> reach all the relevant symbols and relocations that are in your current
> >> binary, but if your binary gets a little larger (e.g. goes from 1.05M to
> >> 2M say) that code will fail in the same way?
> > 
> > Yes, those were apparently already corrected by Ibai:
> > https://github.com/u-boot/u-boot/commit/98ffbb78e12646a1d06236ad6a1893217f255aae#diff-4f864f65dc6b6f2535a5d252b7c9fcc7
> 
> Ah OK.
> 
> So I guess this means that in practice, U-Boot already has the
> limitation that the load (and relocation target?) address must be
> 4K-aligned, since it uses the same instruction sequence we're
> discusssing here.

Apparently, yes.

Anyway, we'll submit a patch tomorrow with the early trapping if not
aligned.

Cheers,
Edgar

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-09-03 19:12 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-02 11:15 [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y Michal Simek
2020-09-02 14:43 ` André Przywara
2020-09-02 14:51   ` Michal Simek
2020-09-02 14:53   ` Edgar E. Iglesias
2020-09-02 15:18     ` André Przywara
2020-09-02 15:25       ` Edgar E. Iglesias
2020-09-02 18:59         ` André Przywara
2020-09-03 13:41           ` Michal Simek
2020-09-03 13:52             ` André Przywara
2020-09-03 13:59               ` Edgar E. Iglesias
2020-09-03 14:03                 ` Michal Simek
2020-09-02 16:34 ` Stephen Warren
2020-09-03 13:35   ` Michal Simek
2020-09-03 13:40     ` André Przywara
2020-09-03 15:45       ` Stephen Warren
2020-09-03 19:07         ` Edgar E. Iglesias
2020-09-03 19:10           ` Stephen Warren
2020-09-03 19:12             ` Edgar E. Iglesias

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.