From mboxrd@z Thu Jan 1 00:00:00 1970
From: Edgar E. Iglesias
Date: Thu, 3 Sep 2020 15:59:04 +0200
Subject: [PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y
In-Reply-To: <5d7ed3b6-2b76-280f-65fb-36d8ebe3d0d1@arm.com>
References: <8438394ae435af2b900b965622969dce96701b88.1599045314.git.michal.simek@xilinx.com> <67147ba5-4bcc-2f2d-d979-17d4798198e0@arm.com> <20200902145319.GX14249@toto> <0c4c6194-9a87-99e9-4fed-92b5a705ca4a@arm.com> <20200902152515.GY14249@toto> <58042180-713f-e8c3-fdbc-fe5228c4ba92@arm.com> <7d77a66f-455e-8151-f85c-1265c81d4015@monstr.eu> <5d7ed3b6-2b76-280f-65fb-36d8ebe3d0d1@arm.com>
Message-ID: <20200903135904.GA14249@toto>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

On Thu, Sep 03, 2020 at 02:52:39PM +0100, André Przywara wrote:
> On 03/09/2020 14:41, Michal Simek wrote:
> >
> >
> > On 02. 09. 20 20:59, André Przywara wrote:
> >> On 02/09/2020 16:25, Edgar E. Iglesias wrote:
> >>> On Wed, Sep 02, 2020 at 04:18:48PM +0100, André Przywara wrote:
> >>>> On 02/09/2020 15:53, Edgar E. Iglesias wrote:
> >>>>> On Wed, Sep 02, 2020 at 03:43:08PM +0100, André Przywara wrote:
> >>>>>> On 02/09/2020 12:15, Michal Simek wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>>>>
> >>>>>>> From: "Edgar E. Iglesias"
> >>>>>>>
> >>>>>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
> >>>>>>> compilation error is shown:
> >>>>>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
> >>>>>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
> >>>>>>> defined in .bss_start section in u-boot.
> >>>>>>>
> >>>>>>> It is caused by adr instruction which permits the calculation of any byte
> >>>>>>> address within +- 1MB of the current PC.
> >>>>>>> Because U-Boot is bigger than 1MB calculation is failing.
> >>>>>>>
> >>>>>>> The patch is using adrp/add instructions where adrp shifts a signed, 21-bit
> >>>>>>> immediate left by 12 bits (4k page), adds it to the value of the program
> >>>>>>> counter with the bottom 12 bits cleared to zero. Then add instruction
> >>>>>>> provides the lower 12 bits which is offset within 4k page.
> >>>>>>> These two instructions together compose full 32bit offset which should be
> >>>>>>> more than enough to cover the whole u-boot size.
> >>>>>>>
> >>>>>>> Signed-off-by: Edgar E. Iglesias
> >>>>>>> Signed-off-by: Michal Simek
> >>>>>>
> >>>>>> It's a bit scary that you need more than 1MB, but indeed what you do
> >>>>>> below is the canonical pattern to get the full range of PC relative
> >>>>>> addressing (this is used heavily in Trusted Firmware, for instance).
> >>>>>>
> >>>>>> The only thing to keep in mind is that this assumes that the load
> >>>>>> address of the binary is 4K aligned, so that the low 12 bits of the
> >>>>>> symbol stay the same. I wonder if we should enforce this somehow? But
> >>>>>> the load address is not controlled by the build process (the whole
> >>>>>> purpose of PIE), so that's not doable just in the build system?
> >>>>>
> >>>>> There shouldn't be any need for 4K alignment. Could you elaborate on
> >>>>> why you think there is?
> >>>>
> >>>> That seems to be slightly tricky, and I tried to get some confirmation,
> >>>> but here goes my reasoning. Maybe you can confirm this:
> >>>>
> >>>> - adrp takes the relative offset, but only of the upper 20 bits (because
> >>>> that's all we can encode). It clears the lower 12 bits of the register.
> >>>> - the "add" is not PC relative anymore, so it just takes the lower 12
> >>>> bits of the "absolute" linker symbol.
> >>>
> >>> I was under the impression that this would use a PC-relative lower 12bit
> >>> relocation but you are correct.
> >>> I disassembled the result:
> >>>
> >>> 40: 91000042 add x2, x2, #0x0
> >>> 40: R_AARCH64_ADD_ABS_LO12_NC __rel_dyn_start
> >>>
> >>>> So this assumes that the lower 12 bits of the actual address in memory
> >>>> and the lower 12 bits of the linker's view match.
> >>>> An example:
> >>>> 00024: adrp x0, SYMBOL
> >>>> 00028: add x0, x0, :lo12:SYMBOL
> >>>>
> >>>> SYMBOL:
> >>>> 42058: ...
> >>>>
> >>>> The toolchain will generate:
> >>>> adrp x0, #0x42; add x0, x0, #0x058
> >>>>
> >>>> Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at
> >>>> 0x80042858.
> >>>> The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000.
> >>>> The add will just add 0x58, so you end up with x0 being 0x80042058,
> >>>> which is not the right address.
> >>>>
> >>>> Does this make sense?
> >>>
> >>>
> >>> Yes, it makes sense.
> >>>
> >>>>
> >>>>> Perhaps the commit message is a little confusing. The toolchain will
> >>>>> compute the pc-relative offset from this particular location to the
> >>>>> symbol and apply the relocations accordingly.
> >>>>
> >>>> Yes, but the PC relative offset applies only to the upper 20 bits,
> >>>> because it's only adrp that has PC relative semantics.
> >>>>
> >>>>
> >>>>>>
> >>>>>> Shall we at least document this? I guess typical load addresses are
> >>>>>> actually quite well aligned, so it might not be an issue in practice.
> >>>>>>
> >>>
> >>> Yes, probably worth documenting and perhaps an early bail-out if it's not
> >>> the case...
> >>
> >> Documenting sounds good, Kconfig might be a good place, as Michal suggested.
> >>
> >> Bail out: I thought about that, it's very easy to detect at runtime, but
> >> what then? This is really early, so you could just enter a WFI loop, and
> >> hope for someone to connect the dots?
> >> Or can you think of any other way of communicating with the user?
> >
> > Yes, it is very early. It is the first real task that runs after reset.
> > I am fine with detecting it to make sure that we won't have
> > unpredictable behavior later.
> > What detection code do you have in mind?
>
> Just "adr"ing the beginning of the image (linker address 0), and
> checking for all 12 LSBs to be 0. The best I thought of would be a WFI
> loop if not. That sounds like 4 instructions or so in total to me.

Yeah, that sounds good to me too. With a good comment in the source code,
people would be able to connect the dots.

>
> > Don't we even have this 4k alignment in place already?
>
> Do you mean in linker scripts? I think what counts here is that the
> actual *load* address is 4K aligned, which I believe is out of control
> of U-Boot. I would guess that's up to the user (flash address) or
> previous boot-stages (BootROM or pre-SPL firmware) to set the actual
> load address. In the best case it's very platform dependent.
> But it is definitely variable, otherwise we wouldn't need PIE in the
> first place.

Right, it's the run-time load address that matters. I guess we already
have limitations that fail silently (i.e. a user can't load U-Boot at
address 1) but the 4K one may be more subtle and possible to catch.

Cheers,
Edgar
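
For reference, the arithmetic from the quoted example (code at link
address 0x24, SYMBOL at 0x42058, image loaded at 0x80000800) can be
modelled roughly as below. This is only a sketch in Python, not U-Boot
code: adrp_add and load_base_is_4k_aligned are hypothetical helper names
made up for illustration, and the model assumes the image is linked at
address 0, as in the thread. The second helper mirrors the proposed
check that all 12 LSBs of the image start address are 0.

```python
PAGE_MASK = 0xFFF  # low 12 bits, i.e. the offset within a 4K page


def adrp_add(link_pc, link_sym, load_base):
    """Model the run-time result of 'adrp xN, sym; add xN, xN, :lo12:sym'.

    link_pc and link_sym are link-time addresses; load_base is where the
    image actually ends up in memory (hypothetical helper, illustration only).
    """
    # Both values below are fixed by the linker and do not change at run time:
    # the page delta encoded in adrp, and the symbol's absolute low 12 bits
    # (the R_AARCH64_ADD_ABS_LO12_NC relocation seen in the disassembly).
    page_delta = (link_sym & ~PAGE_MASK) - (link_pc & ~PAGE_MASK)
    lo12 = link_sym & PAGE_MASK

    # At run time only the PC differs from the linker's view.
    run_pc = load_base + link_pc
    return (run_pc & ~PAGE_MASK) + page_delta + lo12


def load_base_is_4k_aligned(load_base):
    """Model of the proposed early check: all 12 LSBs of the base are 0."""
    return (load_base & PAGE_MASK) == 0


for base in (0x80000000, 0x80000800):
    computed = adrp_add(0x24, 0x42058, base)
    actual = base + 0x42058
    print(f"base={base:#x} aligned={load_base_is_4k_aligned(base)} "
          f"computed={computed:#x} actual={actual:#x}")
```

With the 4K-aligned base the computed and actual addresses agree; with
0x80000800 the sketch yields 0x80042058 while SYMBOL really sits at
0x80042858, which matches the numbers in the quoted example.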