* [U-Boot] U-Boot proper(not SPL) relocate option
@ 2017-11-21  9:33 Kever Yang
  2017-11-21 10:29 ` Lukasz Majewski
  2017-11-26 14:04 ` Andreas Färber
  0 siblings, 2 replies; 26+ messages in thread
From: Kever Yang @ 2017-11-21  9:33 UTC (permalink / raw)
  To: u-boot

Hi Guys,

     I am trying to understand why U-Boot needs to relocate itself.
From the README and crt0.S, I think the relocation feature exists because
some SoCs have limited SRAM that is large enough to load the whole of
U-Boot, but not large enough to run all the drivers.

     I don't know how many SoCs/architectures still need this feature,
but I'm sure that Rockchip SoCs do not need it in either SPL or U-Boot
proper: on Rockchip, SPL always runs in SRAM to initialise the DDR SDRAM,
and once DRAM is available U-Boot proper always runs in DRAM.

There is a CONFIG_SPL_SKIP_RELOCATE option that lets SPL skip relocation;
can we have a corresponding CONFIG_SKIP_RELOCATE for U-Boot proper?


Here is the document from README:

board_init_f():
         - purpose: set up the machine ready for running board_init_r():
                 i.e. SDRAM and serial UART
         - global_data is available
         - stack is in SRAM
         - BSS is not available, so you cannot use global/static variables,
                 only stack variables and global_data

         Non-SPL-specific notes:
         - dram_init() is called to set up DRAM. If already done in SPL this
                 can do nothing

         SPL-specific notes:
         - you can override the entire board_init_f() function with your own
                 version as needed.
         - preloader_console_init() can be called here in extremis
         - should set up SDRAM, and anything needed to make the UART work
         - there is no need to clear BSS, it will be done by crt0.S
         - must return normally from this function (don't call board_init_r()
                 directly)

board_init_r():
         - purpose: main execution, common code
         - global_data is available
         - SDRAM is available
         - BSS is available, all static/global variables can be used
         - execution eventually continues to main_loop()

         Non-SPL-specific notes:
         - U-Boot is relocated to the top of memory and is now running from
                 there.

         SPL-specific notes:
         - stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
                 CONFIG_SPL_STACK_R_ADDR points into SDRAM
         - preloader_console_init() can be called here - typically this is
                 done by selecting CONFIG_SPL_BOARD_INIT and then supplying a
                 spl_board_init() function containing this call
         - loads U-Boot or (in falcon mode) Linux


Thanks,
- Kever


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-21  9:33 [U-Boot] U-Boot proper(not SPL) relocate option Kever Yang
@ 2017-11-21 10:29 ` Lukasz Majewski
  2017-11-22  1:59   ` Kever Yang
  2017-11-26 14:04 ` Andreas Färber
  1 sibling, 1 reply; 26+ messages in thread
From: Lukasz Majewski @ 2017-11-21 10:29 UTC (permalink / raw)
  To: u-boot

Hi Kever,

> Hi Guys,
> 
>      I try to understand why we need to do the relocate in U-Boot.
>  From the document README/crt0.S, I think the relocation feature comes
> from some SoC have limited SRAM whose size is enough to load the whole
> U-Boot, but not enough to run all the drivers.
> 
>      I don't know how many SoCs/Archs still must use this feature,
> but I'm sure all
> Rockchip SoCs do not need this feature in both SPL and proper U-Boot,
> because rockchip using SPL always running in SRAM to init DDR SDRAM,
> and after DRAM available always running U-Boot in DRAM.

I always thought that U-Boot needs relocation to place itself in a
"known" area of SDRAM (one that ends at the very top of SDRAM).

This way SPL can load U-Boot proper to any SDRAM location, and
(after relocation) U-Boot moves itself to the "known" location.

(Please check the bdinfo command for details.)

Having U-Boot at a known location helps with:

- Using non-fragmented SDRAM to download updates

- Booting U-Boot on many different devices (with different amounts of
  RAM) -> you always load U-Boot near the beginning of SDRAM and it
  then relocates itself appropriately.


However, I'm not sure whether we need relocation in SPL (which
runs in SRAM). It seems to me that the SPL binary is so board-specific
that we shouldn't need such a generic feature there.




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-21 10:29 ` Lukasz Majewski
@ 2017-11-22  1:59   ` Kever Yang
  2017-11-22  7:29     ` Chris Packham
                       ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Kever Yang @ 2017-11-22  1:59 UTC (permalink / raw)
  To: u-boot

Hi Lukasz,


     Thanks for your quick comments on this topic.
On 11/21/2017 06:29 PM, Lukasz Majewski wrote:
> Hi Kever,
>
>> Hi Guys,
>>
>>       I try to understand why we need to do the relocate in U-Boot.
>>   From the document README/crt0.S, I think the relocation feature comes
>> from some SoC have limited SRAM whose size is enough to load the whole
>> U-Boot, but not enough to run all the drivers.
>>
>>       I don't know how many SoCs/Archs still must use this feature,
>> but I'm sure all
>> Rockchip SoCs do not need this feature in both SPL and proper U-Boot,
>> because rockchip using SPL always running in SRAM to init DDR SDRAM,
>> and after DRAM available always running U-Boot in DRAM.
> I always thought that u-boot needs relocation to place itself in the
> "known" area of SDRAM (which ends in its very end).

I can understand this feature: we always do dram_init_banks() first and
then relocate to the 'known' area, so there is no risk in accessing memory.
I believe there must be a historical reason for it on some kinds of device,
and relocation is a wonderful idea for them.

On the other hand, we could also have the option not to relocate, because:
- We can still have something like 'bdinfo' without relocating: we can
     initialise the DRAM info first, then set up SP, the malloc area and
     so on, and then initialise the other drivers.
- All Rockchip-based solutions have at least 512 MByte of DRAM,
     which should be enough for U-Boot and could be considered a 'known'
     area; many other SoCs should be similar.
- Without relocation we can save many steps; some of our customers really
     care about boot time.
     * no need to relocate everything
     * no need to copy all the code
     * no need to initialise the drivers more than once

Thanks,
- Kever

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-22  1:59   ` Kever Yang
@ 2017-11-22  7:29     ` Chris Packham
  2017-11-22  8:47       ` Lukasz Majewski
  2017-11-22  8:45     ` Lokesh Vutla
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Chris Packham @ 2017-11-22  7:29 UTC (permalink / raw)
  To: u-boot

On Wed, Nov 22, 2017 at 2:59 PM, Kever Yang <kever.yang@rock-chips.com> wrote:
> Hi Lukasz,
>
>
>     Thanks for your quick comments on this topic.
> On 11/21/2017 06:29 PM, Lukasz Majewski wrote:
>>
>> Hi Kever,
>>
>>> Hi Guys,
>>>
>>>       I try to understand why we need to do the relocate in U-Boot.
>>>   From the document README/crt0.S, I think the relocation feature comes
>>> from some SoC have limited SRAM whose size is enough to load the whole
>>> U-Boot, but not enough to run all the drivers.
>>>
>>>       I don't know how many SoCs/Archs still must use this feature,
>>> but I'm sure all
>>> Rockchip SoCs do not need this feature in both SPL and proper U-Boot,
>>> because rockchip using SPL always running in SRAM to init DDR SDRAM,
>>> and after DRAM available always running U-Boot in DRAM.
>>
>> I always thought that u-boot needs relocation to place itself in the
>> "known" area of SDRAM (which ends in its very end).
>
>
> I can understand this feature, we always do dram_init_banks() first,
> then we relocate to 'known' area, then will be no risk to access memory.
> I believe there must be some historical reason for some kind of device,
> the relocate feature is a wonderful idea for it.

(I can't really speak for u-boot but in general I think this applies).

In the old days there was no SPL. It was just the same bootloader
image. This image was written (or "burned") to a memory mapped
ROM/flash which could be executed directly in place. Then after the
RAM was initialised the image could be relocated and execution could
continue from the new address.

These days with SoCs that can boot from non-memory-mapped devices the
same tricks can't work which is where the SPL comes in.

The other thing with relocation is that U-Boot likes to be at the very
top of RAM. This means we have all this nice contiguous space at the
bottom for the kernel/initrd/whatever.

We can't know at compile time where the top is, as some boards may have
DIMMs and others may just have board variants with more or less memory
fitted. That is why we need to set CONFIG_TEXTBASE to something that
is suitable for the lowest common denominator and relocate once we
know how much RAM we have.
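
The address arithmetic behind this can be sketched roughly as follows.
This is only an illustration under simplifying assumptions: the helper
name reloc_addr(), the alignment parameter and the example sizes are made
up for this note, and the real U-Boot code also reserves space below the
image for things such as the malloc area, global data and the stack.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a U-Boot function): pick a relocation address
 * so that an image of image_size bytes ends at the top of RAM, with the
 * start address rounded down to a multiple of align.
 */
static uint64_t reloc_addr(uint64_t ram_base, uint64_t ram_size,
                           uint64_t image_size, uint64_t align)
{
    uint64_t ram_top = ram_base + ram_size;

    /* align must be a power of two for the mask below to work */
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(image_size <= ram_size);

    return (ram_top - image_size) & ~(align - 1);
}
```

With RAM starting at 0x80000000 and a 1 MiB image, this gives 0x9ff00000
on a 512 MiB board and 0xbff00000 on a 1 GiB board, so the same binary
leaves everything below it free in both cases.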


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-22  1:59   ` Kever Yang
  2017-11-22  7:29     ` Chris Packham
@ 2017-11-22  8:45     ` Lokesh Vutla
  2017-11-22  8:51     ` Lukasz Majewski
  2017-11-22 10:27     ` Wolfgang Denk
  3 siblings, 0 replies; 26+ messages in thread
From: Lokesh Vutla @ 2017-11-22  8:45 UTC (permalink / raw)
  To: u-boot

+ Simon,

On Wednesday 22 November 2017 07:29 AM, Kever Yang wrote:
> Hi Lukasz,
> 
> 
>     Thanks for your quick comments on this topic.
> On 11/21/2017 06:29 PM, Lukasz Majewski wrote:
>> Hi Kever,
>>
>>> Hi Guys,
>>>
>>>       I try to understand why we need to do the relocate in U-Boot.
>>>   From the document README/crt0.S, I think the relocation feature comes
>>> from some SoC have limited SRAM whose size is enough to load the whole
>>> U-Boot, but not enough to run all the drivers.
>>>
>>>       I don't know how many SoCs/Archs still must use this feature,
>>> but I'm sure all
>>> Rockchip SoCs do not need this feature in both SPL and proper U-Boot,
>>> because rockchip using SPL always running in SRAM to init DDR SDRAM,
>>> and after DRAM available always running U-Boot in DRAM.
>> I always thought that u-boot needs relocation to place itself in the
>> "known" area of SDRAM (which ends in its very end).
> 
> I can understand this feature, we always do dram_init_banks() first,
> then we relocate to 'known' area, then will be no risk to access memory.
> I believe there must be some historical reason for some kind of device,
> the relocate feature is a wonderful idea for it.
> 
> In another case, we can also have a choice for not relocate because:
> - we still can have similar 'bdinfo' but without relocate, we can init
> dram info
>     first, and then init SP, malloc area and so on, and then other
> driver init.
> - All solution for Rockchip SoCs at least have 512MByte DRAM,
>     which should be enough for U-Boot and could consider to be 'known'
> area,
>     many other SoCs should be similar.
> - Without relocate we can save many step, some of our customer really
>     care much about the boot time duration.
>     * no need to relocate everything
>     * no need to copy all the code
>     * no need init the driver more than once

I agree that there should be an option for avoiding relocation. There is
a flag, GD_FLG_SKIP_RELOC, which when set in gd->flags tries to skip
relocation for U-Boot proper. I believe this works on x86, but could it
be ported to other architectures as well?
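
As a rough sketch of how such a flag could gate the relocation step: the
flag name GD_FLG_SKIP_RELOC is the one mentioned above, but the numeric
value, the cut-down struct and the helper final_text_addr() below are
assumptions made for illustration and are not copied from the U-Boot
sources.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative value only; the real definition lives in U-Boot's headers. */
#define GD_FLG_SKIP_RELOC 0x00800u

/* Cut-down stand-in for U-Boot's global data, for this example only. */
struct gd_sketch {
    uint32_t flags;
    uint64_t load_addr;  /* where the loader placed the image */
    uint64_t relocaddr;  /* where relocation would move it */
};

/* Return the address the image runs from after the relocation step. */
static uint64_t final_text_addr(const struct gd_sketch *gd)
{
    assert(gd);

    if (gd->flags & GD_FLG_SKIP_RELOC)
        return gd->load_addr;   /* stay where the loader put us */

    return gd->relocaddr;       /* normal case: move towards top of RAM */
}
```

The point of the sketch is only that the decision can be made at run time
from gd->flags, rather than through a compile-time option.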

Thanks and regards,
Lokesh


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-22  7:29     ` Chris Packham
@ 2017-11-22  8:47       ` Lukasz Majewski
  0 siblings, 0 replies; 26+ messages in thread
From: Lukasz Majewski @ 2017-11-22  8:47 UTC (permalink / raw)
  To: u-boot

Hi Chris,

> On Wed, Nov 22, 2017 at 2:59 PM, Kever Yang
> <kever.yang@rock-chips.com> wrote:
> > Hi Lukasz,
> >
> >
> >     Thanks for your quick comments on this topic.
> > On 11/21/2017 06:29 PM, Lukasz Majewski wrote:  
> >>
> >> Hi Kever,
> >>  
> >>> Hi Guys,
> >>>
> >>>       I try to understand why we need to do the relocate in
> >>> U-Boot. From the document README/crt0.S, I think the relocation
> >>> feature comes from some SoC have limited SRAM whose size is
> >>> enough to load the whole U-Boot, but not enough to run all the
> >>> drivers.
> >>>
> >>>       I don't know how many SoCs/Archs still must use this
> >>> feature, but I'm sure all
> >>> Rockchip SoCs do not need this feature in both SPL and proper
> >>> U-Boot, because rockchip using SPL always running in SRAM to init
> >>> DDR SDRAM, and after DRAM available always running U-Boot in
> >>> DRAM.  
> >>
> >> I always thought that u-boot needs relocation to place itself in
> >> the "known" area of SDRAM (which ends in its very end).  
> >
> >
> > I can understand this feature, we always do dram_init_banks() first,
> > then we relocate to 'known' area, then will be no risk to access
> > memory. I believe there must be some historical reason for some
> > kind of device, the relocate feature is a wonderful idea for it.  
> 
> (I can't really speak for u-boot but in general I think this applies).
> 
> In the old days there was no SPL. 

As far as I remember, there was CONFIG_PRELOAD-something before SPL
(U-Boot delivered two binaries).

> It was just the same bootloader
> image. This image was written (or "burned") to a memory mapped
> ROM/flash which could be executed directly in place. Then after the
> RAM was initialised the image could be relocated and execution could
> continue from the new address.
> 
> These days with SoCs that can boot from non-memory-mapped devices the
> same tricks can't work which is where the SPL comes in.
> 
> The other thing with relocation is that u-boot likes to be at the very
> top of RAM. This means we have all this nice contiguous space at the
> bottom for the kernel/initrd/whatever .
> 
> We can't know at compile time where the top is as some boards may have
> DIMMs an others may just have board variants with more or less memory
> fitted. Which is why we need to set CONFIG_TEXTBASE to something that
> is suitable for the lowest common denominator and relocate once we
> know how much RAM we have.

As I mentioned before, the contiguous space from the start of RAM up to
(end of RAM - U-Boot size) is crucial for updating, i.e. when the rootfs
needs to be flashed.

But, I do agree with above arguments.
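
To make the fragmentation point concrete, here is a small sketch that
compares the largest contiguous free region when the image sits at the
top of RAM with the case where it stays at its load address. The helper
name largest_free_span() and all numbers are assumptions chosen for
illustration, and other reservations (stack, malloc area and so on) are
ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest contiguous free span in [ram_base, ram_base + ram_size) when a
 * single image of image_size bytes occupies the region at image_addr.
 */
static uint64_t largest_free_span(uint64_t ram_base, uint64_t ram_size,
                                  uint64_t image_addr, uint64_t image_size)
{
    uint64_t ram_top = ram_base + ram_size;
    uint64_t below, above;

    /* the image must lie entirely inside RAM */
    assert(image_addr >= ram_base && image_addr <= ram_top);
    assert(image_size <= ram_top - image_addr);

    below = image_addr - ram_base;               /* free space under the image */
    above = ram_top - (image_addr + image_size); /* free space over the image */

    return below > above ? below : above;
}
```

For 512 MiB of RAM at 0x80000000 and a 1 MiB image, placing the image at
the top (0x9ff00000) leaves a single 511 MiB span, while leaving it
128 MiB into RAM (0x88000000) leaves at most 383 MiB in one piece.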




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-22  1:59   ` Kever Yang
  2017-11-22  7:29     ` Chris Packham
  2017-11-22  8:45     ` Lokesh Vutla
@ 2017-11-22  8:51     ` Lukasz Majewski
  2017-11-22 10:27     ` Wolfgang Denk
  3 siblings, 0 replies; 26+ messages in thread
From: Lukasz Majewski @ 2017-11-22  8:51 UTC (permalink / raw)
  To: u-boot

Hi Kever,

> Hi Lukasz,
> 
> 
>      Thanks for your quick comments on this topic.
> On 11/21/2017 06:29 PM, Lukasz Majewski wrote:
> > Hi Kever,
> >  
> >> Hi Guys,
> >>
> >>       I try to understand why we need to do the relocate in U-Boot.
> >>   From the document README/crt0.S, I think the relocation feature
> >> comes from some SoC have limited SRAM whose size is enough to load
> >> the whole U-Boot, but not enough to run all the drivers.
> >>
> >>       I don't know how many SoCs/Archs still must use this feature,
> >> but I'm sure all
> >> Rockchip SoCs do not need this feature in both SPL and proper
> >> U-Boot, because rockchip using SPL always running in SRAM to init
> >> DDR SDRAM, and after DRAM available always running U-Boot in
> >> DRAM.  
> > I always thought that u-boot needs relocation to place itself in the
> > "known" area of SDRAM (which ends in its very end).  
> 
> I can understand this feature, we always do dram_init_banks() first,
> then we relocate to 'known' area, then will be no risk to access
> memory. I believe there must be some historical reason for some kind
> of device, the relocate feature is a wonderful idea for it.
> 
> In another case, we can also have a choice for not relocate because:
> - we still can have similar 'bdinfo' but without relocate, we can
> init dram info
>      first, and then init SP, malloc area and so on, and then other 
> driver init.
> - All solution for Rockchip SoCs at least have 512MByte DRAM,

As I wrote in the other mail, in some scenarios we don't want to
have fragmented memory (e.g. rootfs flashing).

Would this "fragmented" 512 MiB be enough to flash all your binaries?

>      which should be enough for U-Boot and could consider to be
> 'known' area,
>      many other SoCs should be similar.
> - Without relocate we can save many step, some of our customer really
>      care much about the boot time duration.
>      * no need to relocate everything
>      * no need to copy all the code
>      * no need init the driver more than once

I do find your arguments perfectly valid (in the end, the customer
decides what features are in U-Boot).

Please prepare patches and send them for review.
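
To make the fragmentation concern above concrete, here is a minimal
sketch (not U-Boot code; the function name and the single-bank model
are assumptions made for illustration). It computes the largest
contiguous free span when one image sits somewhere inside a RAM bank:

```c
#include <stdint.h>

/*
 * Hypothetical helper: largest contiguous free span when one image of
 * `size` bytes sits at offset `off` inside a bank of `ram_size` bytes.
 * Assumes off + size <= ram_size.
 */
uint64_t largest_free_span(uint64_t ram_size, uint64_t off, uint64_t size)
{
	uint64_t below = off;                     /* free bytes under the image */
	uint64_t above = ram_size - (off + size); /* free bytes over the image */

	return below > above ? below : above;
}
```

With a 1 MiB image in the middle of a 512 MiB bank the largest usable
span is 256 MiB; with the same image at the very top it is 511 MiB.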

> 
> Thanks,
> - Kever



Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-22  1:59   ` Kever Yang
                       ` (2 preceding siblings ...)
  2017-11-22  8:51     ` Lukasz Majewski
@ 2017-11-22 10:27     ` Wolfgang Denk
  2017-11-25 22:34       ` Simon Glass
  3 siblings, 1 reply; 26+ messages in thread
From: Wolfgang Denk @ 2017-11-22 10:27 UTC (permalink / raw)
  To: u-boot

Dear Kever Yang,

In message <fd0bb500-80c4-f317-cc18-f7aaf1344fd8@rock-chips.com> you wrote:
> 
> I can understand this feature, we always do dram_init_banks() first,
> then we relocate to 'known' area, then will be no risk to access memory.
> I believe there must be some historical reason for some kind of device,
> the relocate feature is a wonderful idea for it.

This is actually not so much a feature needed to support some
specific device (in this case much simpler approaches would be
possible), but to support a whole set of features.  Unfortunately
these appear to get forgotten / ignored over time.

>      many other SoCs should be similar.
> - Without relocate we can save many step, some of our customer really
>      care much about the boot time duration.
>      * no need to relocate everything
>      * no need to copy all the code
>      * no need init the driver more than once

Please have a look at the README, section "Memory Management".
The relocation is not done to any _fixed_ address, but the address
is actually computed at runtime, depending on a number of features
enabled (at least this is how it used to be - apparently little of
this is tested on a regular basis, so I would not be surprised if
things are broken today).

The basic idea was to reserve areas of memory at the top of RAM,
that would not be initialized / modified by U-Boot and Linux, not
even across a reset / warm boot.

This was used for example for:

- pRAM (Protected RAM) which could be used to store all kinds of data
  (for example, using a pramfs [Protected and Persistent RAM
  Filesystem]) that could be kept across reboots of the OS.

- shared frame buffer / video memory. U-Boot and Linux would be able
  to initialize the video memory just once (in U-Boot) and then
  share it, maybe even across reboots.  In particular, this would allow
  for a very early splash screen that gets passed (flicker free) to
  Linux until some Linux GUI takes over (much more difficult today).

- shared log buffer: U-Boot and Linux used to use the same syslog
  buffer mechanism, so you could share it between U-Boot and Linux.
  This makes it possible, for example, to
  * read the Linux kernel panic messages after reset in U-Boot; this
    is very useful when you bring up a new system and Linux crashes
    before it can display the log buffer on the console
  * pass U-Boot POST results on to Linux, so the application code
    can read and process these
  * process the system log of the previous run (especially after a
    panic) in Linux after it has rebooted.

etc.

There are a number of such features which require reserving room at
the top of RAM, the size of which is calculated at runtime, often
depending on user-settable environment data.

All this cannot be done without relocation to a (dynamically
computed) target address.


Yes, the code could be simpler and faster without that - but then,
you cut off a number of features.
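
As a rough illustration of why the target address can only be known at
runtime, the following sketch carves reserved areas off the top of RAM
and places the image below them. This is not the actual U-Boot code;
the struct, the function names, the order of the reservations and the
4 KiB alignment are all assumptions made for the example:

```c
#include <stdint.h>

/* Hypothetical description of the areas reserved at the top of RAM. */
struct reserve_cfg {
	uint64_t pram_size;   /* protected RAM kept across reboots */
	uint64_t logbuf_size; /* log buffer shared with Linux */
	uint64_t fb_size;     /* shared frame buffer */
	uint64_t image_size;  /* U-Boot code + data + BSS */
};

/* Round addr down to a power-of-two alignment. */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	return addr & ~(align - 1);
}

/* Walk down from the top of RAM, reserving each area in turn. */
uint64_t compute_reloc_addr(uint64_t ram_top, const struct reserve_cfg *cfg)
{
	uint64_t addr = ram_top;

	addr -= cfg->pram_size;
	addr -= cfg->logbuf_size;
	addr = align_down(addr - cfg->fb_size, 4096);
	addr = align_down(addr - cfg->image_size, 4096);

	return addr; /* the image would be copied here */
}
```

Changing any one size (for example a pRAM size taken from the
environment) moves the resulting address, which is the point made
above: the result depends on settings only known once the system runs.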

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
The flow chart is a most thoroughly oversold piece of  program  docu-
mentation.              -- Frederick Brooks, "The Mythical Man Month"

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-22 10:27     ` Wolfgang Denk
@ 2017-11-25 22:34       ` Simon Glass
  2017-11-25 23:31         ` Dr. Philipp Tomsich
  0 siblings, 1 reply; 26+ messages in thread
From: Simon Glass @ 2017-11-25 22:34 UTC (permalink / raw)
  To: u-boot

+Tom, Masahiro, Philipp

Hi,

On 22 November 2017 at 03:27, Wolfgang Denk <wd@denx.de> wrote:
> Dear Kever Yang,
>
> In message <fd0bb500-80c4-f317-cc18-f7aaf1344fd8@rock-chips.com> you wrote:
>>
>> I can understand this feature, we always do dram_init_banks() first,
>> then we relocate to 'known' area, then will be no risk to access memory.
>> I believe there must be some historical reason for some kind of device,
>> the relocate feature is a wonderful idea for it.
>
> This is actually not so much a feature needed to support some
> specific device (in this case much simpler approaches would be
> possible), but to support a whole set of features.  Unfortunately
> these appear to get forgotten / ignored over time.
>
>>      many other SoCs should be similar.
>> - Without relocate we can save many step, some of our customer really
>>      care much about the boot time duration.
>>      * no need to relocate everything
>>      * no need to copy all the code
>>      * no need init the driver more than once
>
> Please have a look at the README, section "Memory Management".
> The relocation is not done to any _fixed_ address, but the address
> is actually computed at runtime, depending on a number of features
> enabled (at least this is how it used to be - apparently little of
> this is tested on a regular basis, so I would not be surprised if
> things are broken today).
>
> The basic idea was to reserve areas of memory at the top of RAM,
> that would not be initialized / modified by U-Boot and Linux, not
> even across a reset / warm boot.
>
> This was used for example for:
>
> - pRAM (Protected RAM) which could be used to store all kinds of data
>   (for example, using a pramfs [Protected and Persistent RAM
>   Filesystem]) that could be kept across reboots of the OS.
>
> - shared frame buffer / video memory. U-Boot and Linux would be able
>   to initialize the video memory just once (in U-Boot) and then
>   share it, maybe even across reboots.  especially, this would allow
>   for a very early splash screen that gets passed (flicker free) to
>   Linux until some Linux GUI takes over (much more difficult today).
>
> - shared log buffer: U-Boot and Linux used to use the same syslog
>   buffer mechanism, so you could share it between U-Boot and Linux.
>   this allows for example to
>   * read the Linux kernel panic messages after reset in U-Boot; this
>     is very useful when you bring up a new system and Linux crashes
>     before it can display the log buffer on the console
>   * pass U-Boot POST results on to Linux, so the application code
>     can read and process these
>   * process the system log of the previous run (especially after a
>     panic) in Linux after it has rebooted.
>
> etc.
>
> There are a number of such features which require reserving room at
> the top of RAM, the size of which is calculated at runtime, often
> depending on user-settable environment data.
>
> All this cannot be done without relocation to a (dynamically
> computed) target address.
>
>
> Yes, the code could be simpler and faster without that - but then,
> you cut off a number of features.

I would be interested in seeing benchmarks showing the cost of
relocation in terms of boot time. Last time I did this was on Exynos 5
and it was some years ago. The time was pretty small provided the
cache was on for the memory copies associated with relocation itself.
Something like 10-20ms but I don't have the numbers handy.

I think it is useful to be able to allocate memory in board_init_f()
for use by U-Boot for things like the display and the malloc() region.

Options we might consider:

1. Don't relocate the code and data. Thus we could avoid the copy and
relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
used when U-Boot runs as an EFI app

2. Rather than throwing away the old malloc() region, keep it around
so existing allocated blocks work. Then the new malloc() region would be
used for future allocations. We could perhaps ignore free() calls in
that region

2a. This would allow us to avoid re-init of driver model in most cases
I think. E.g. we could init serial and timer before relocation and
leave them inited after relocation. We could just init the
'additional' devices not done before relocation.

2b. I suppose we could even extend this to SPL if we wanted to. I
suspect it would just be a pain though, since SPL might use memory
that U-Boot wants.

3. We could turn on the cache earlier. This removes most of the
boot-time penalty. Ideally this should be turned on in SPL and perhaps
redone in U-Boot which has more memory available. If SPL is not used,
we could turn on the cache before relocation.

4. Rather than reserving memory in board_init_f() we could have it
call malloc() from the expanded region. We could then perhaps
move this reserve/allocate code into particular drivers or
subsystems, and drop a good chunk of the init sequence. We would need
to have a larger malloc() region than is currently the case.

There are still some arch-specific bits in board_init_f() which make
these sorts of changes a bit tricky to support generically. IMO it
would be best to move to 'generic relocation' written in C, where all
archs work basically the same way, before attempting any of the above.

Still, I can see some benefits and even some simplifications.

Regards,
Simon

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-25 22:34       ` Simon Glass
@ 2017-11-25 23:31         ` Dr. Philipp Tomsich
  2017-11-26 11:38           ` Simon Glass
  0 siblings, 1 reply; 26+ messages in thread
From: Dr. Philipp Tomsich @ 2017-11-25 23:31 UTC (permalink / raw)
  To: u-boot

Hi,

> On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
> 
> +Tom, Masahiro, Philipp
> 
> Hi,
> 
> I would be interested in seeing benchmarks showing the cost of
> relocation in terms of boot time. Last time I did this was on Exynos 5
> and it was some years ago. The time was pretty small provided the
> cache was on for the memory copies associated with relocation itself.
> Something like 10-20ms but I don't have the numbers handy.
> 
> I think it is useful to be able to allocate memory in board_init_f()
> for use by U-Boot for things like the display and the malloc() region.
> 
> Options we might consider:
> 
> 1. Don't relocate the code and data. Thus we could avoid the copy and
> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
> used when U-Boot runs as an EFI app
> 
> 2. Rather than throwing away the old malloc() region, keep it around
> so existing allocated blocks work. Then the new malloc() region would be
> used for future allocations. We could perhaps ignore free() calls in
> that region
> 
> 2a. This would allow us to avoid re-init of driver model in most cases
> I think. E.g. we could init serial and timer before relocation and
> leave them inited after relocation. We could just init the
> 'additional' devices not done before relocation.
> 
> 2b. I suppose we could even extend this to SPL if we wanted to. I
> suspect it would just be a pain though, since SPL might use memory
> that U-Boot wants.
> 
> 3. We could turn on the cache earlier. This removes most of the
> boot-time penalty. Ideally this should be turned on in SPL and perhaps
> redone in U-Boot which has more memory available. If SPL is not used,
> we could turn on the cache before relocation.

Both turning on the cache and initialising the clocking could be of benefit
to boot-time.

However, the biggest possible gain will come from utilising Falcon mode
to skip the full U-Boot stage and directly boot into the OS from SPL.  This
assumes that the drivers involved are fully optimised, so loading up the
OS image does not take longer than necessary.

> 4. Rather than reserving memory in board_init_f() we could have it
> call malloc() from the expanded region. We could then perhaps
> move this reserve/allocate code into particular drivers or
> subsystems, and drop a good chunk of the init sequence. We would need
> to have a larger malloc() region than is currently the case.
> 
> There are still some arch-specific bits in board_init_f() which make
> these sorts of changes a bit tricky to support generically. IMO it
> would be best to move to 'generic relocation' written in C, where all
> archs work basically the same way, before attempting any of the above.
> 
> Still, I can see some benefits and even some simplifications.
> 
> Regards,
> Simon

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-25 23:31         ` Dr. Philipp Tomsich
@ 2017-11-26 11:38           ` Simon Glass
  2017-11-26 13:44             ` Dr. Philipp Tomsich
  2017-11-26 14:16             ` Masahiro Yamada
  0 siblings, 2 replies; 26+ messages in thread
From: Simon Glass @ 2017-11-26 11:38 UTC (permalink / raw)
  To: u-boot

Hi Philipp,

On 25 November 2017 at 16:31, Dr. Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:
> Hi,
>
>> On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
>>
>> +Tom, Masahiro, Philipp
>>
>> Hi,
>>
>> I would be interested in seeing benchmarks showing the cost of
>> relocation in terms of boot time. Last time I did this was on Exynos 5
>> and it was some years ago. The time was pretty small provided the
>> cache was on for the memory copies associated with relocation itself.
>> Something like 10-20ms but I don't have the numbers handy.
>>
>> I think it is useful to be able to allocate memory in board_init_f()
>> for use by U-Boot for things like the display and the malloc() region.
>>
>> Options we might consider:
>>
>> 1. Don't relocate the code and data. Thus we could avoid the copy and
>> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>> used when U-Boot runs as an EFI app
>>
>> 2. Rather than throwing away the old malloc() region, keep it around
>> so existing allocated blocks work. Then the new malloc() region would be
>> used for future allocations. We could perhaps ignore free() calls in
>> that region
>>
>> 2a. This would allow us to avoid re-init of driver model in most cases
>> I think. E.g. we could init serial and timer before relocation and
>> leave them inited after relocation. We could just init the
>> 'additional' devices not done before relocation.
>>
>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>> suspect it would just be a pain though, since SPL might use memory
>> that U-Boot wants.
>>
>> 3. We could turn on the cache earlier. This removes most of the
>> boot-time penalty. Ideally this should be turned on in SPL and perhaps
>> redone in U-Boot which has more memory available. If SPL is not used,
>> we could turn on the cache before relocation.
>
> Both turning on the cache and initialising the clocking could be of benefit
> to boot-time.
>
> However, the biggest possible gain will come from utilising Falcon mode
> to skip the full U-Boot stage and directly boot into the OS from SPL.  This
> assumes that the drivers involved are fully optimised, so loading up the
> OS image does not take longer than necessary.

I'd like to see numbers on that. From my experience, loading and
running U-Boot does not take very long...

>
>> 4. Rather than reserving memory in board_init_f() we could have it
>> call malloc() from the expanded region. We could then perhaps
>> move this reserve/allocate code into particular drivers or
>> subsystems, and drop a good chunk of the init sequence. We would need
>> to have a larger malloc() region than is currently the case.
>>
>> There are still some arch-specific bits in board_init_f() which make
>> these sorts of changes a bit tricky to support generically. IMO it
>> would be best to move to 'generic relocation' written in C, where all
>> archs work basically the same way, before attempting any of the above.
>>
>> Still, I can see some benefits and even some simplifications.
>>
>> Regards,
>> Simon
>

Regards,
Simon

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-26 11:38           ` Simon Glass
@ 2017-11-26 13:44             ` Dr. Philipp Tomsich
  2017-11-26 13:49               ` Dr. Philipp Tomsich
  2017-11-26 14:16             ` Masahiro Yamada
  1 sibling, 1 reply; 26+ messages in thread
From: Dr. Philipp Tomsich @ 2017-11-26 13:44 UTC (permalink / raw)
  To: u-boot


> On 26 Nov 2017, at 12:38, Simon Glass <sjg@chromium.org> wrote:
> 
> Hi Philipp,
> 
> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
> <philipp.tomsich@theobroma-systems.com> wrote:
>> Hi,
>> 
>>> On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
>>> 
>>> +Tom, Masahiro, Philipp
>>> 
>>> Hi,
>>> 
>>> I would be interested in seeing benchmarks showing the cost of
>>> relocation in terms of boot time. Last time I did this was on Exynos 5
>>> and it was some years ago. The time was pretty small provided the
>>> cache was on for the memory copies associated with relocation itself.
>>> Something like 10-20ms but I don't have the numbers handy.
>>> 
>>> I think it is useful to be able to allocate memory in board_init_f()
>>> for use by U-Boot for things like the display and the malloc() region.
>>> 
>>> Options we might consider:
>>> 
>>> 1. Don't relocate the code and data. Thus we could avoid the copy and
>>> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>>> used when U-Boot runs as an EFI app
>>> 
>>> 2. Rather than throwing away the old malloc() region, keep it around
>>> so existing allocated blocks work. Then the new malloc() region would be
>>> used for future allocations. We could perhaps ignore free() calls in
>>> that region
>>> 
>>> 2a. This would allow us to avoid re-init of driver model in most cases
>>> I think. E.g. we could init serial and timer before relocation and
>>> leave them inited after relocation. We could just init the
>>> 'additional' devices not done before relocation.
>>> 
>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>>> suspect it would just be a pain though, since SPL might use memory
>>> that U-Boot wants.
>>> 
>>> 3. We could turn on the cache earlier. This removes most of the
>>> boot-time penalty. Ideally this should be turned on in SPL and perhaps
>>> redone in U-Boot which has more memory available. If SPL is not used,
>>> we could turn on the cache before relocation.
>> 
>> Both turning on the cache and initialising the clocking could be of benefit
>> to boot-time.
>> 
>> However, the biggest possible gain will come from utilising Falcon mode
>> to skip the full U-Boot stage and directly boot into the OS from SPL.  This
>> assumes that the drivers involved are fully optimised, so loading up the
>> OS image does not take longer than necessary.
> 
> I'd like to see numbers on that. From my experience, loading and
> running U-Boot does not take very long…

I was referring to the OS images, not to U-Boot itself.
While U-Boot will be less than 512KB, a typical kernel image will be a handful
of MB… plus there may be a few MB of ramdisk to accompany it.

>> 
>>> 4. Rather than reserving memory in board_init_f() we could have it
>>> call malloc() from the expanded region. We could then perhaps
>>> move this reserve/allocate code in to particular drivers or
>>> subsystems, and drop a good chunk of the init sequence. We would need
>>> to have a larger malloc() region than is currently the case.
>>> 
>>> There are still some arch-specific bits in board_init_f() which make
>>> these sorts of changes a bit tricky to support generically. IMO it
>>> would be best to move to 'generic relocation' written in C, where all
>>> archs work basically the same way, before attempting any of the above.
>>> 
>>> Still, I can see some benefits and even some simplifications.
>>> 
>>> Regards,
>>> Simon
>> 
> 
> Regards,
> Simon


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-26 13:44             ` Dr. Philipp Tomsich
@ 2017-11-26 13:49               ` Dr. Philipp Tomsich
  0 siblings, 0 replies; 26+ messages in thread
From: Dr. Philipp Tomsich @ 2017-11-26 13:49 UTC (permalink / raw)
  To: u-boot


> On 26 Nov 2017, at 14:44, Dr. Philipp Tomsich <philipp.tomsich@theobroma-systems.com> wrote:
> 
> 
>> On 26 Nov 2017, at 12:38, Simon Glass <sjg@chromium.org> wrote:
>> 
>> Hi Philipp,
>> 
>> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
>> <philipp.tomsich@theobroma-systems.com> wrote:
>>> Hi,
>>> 
>>>> On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
>>>> 
>>>> +Tom, Masahiro, Philipp
>>>> 
>>>> Hi,
>>>> 
>>>> On 22 November 2017 at 03:27, Wolfgang Denk <wd@denx.de> wrote:
>>>>> Dear Kever Yang,
>>>>> 
>>>>> In message <fd0bb500-80c4-f317-cc18-f7aaf1344fd8@rock-chips.com> you wrote:
>>>>>> 
>>>>>> I can understand this feature: we always do dram_init_banks() first,
>>>>>> then we relocate to a 'known' area, so there is no risk in accessing
>>>>>> memory. I believe there must be some historical reason, some kind of
>>>>>> device for which the relocate feature is a wonderful idea.
>>>>> 
>>>>> This is actually not so much a feature needed to support some
>>>>> specific device (in this case much simpler approaches would be
>>>>> possible), but to support a whole set of features.  Unfortunately
>>>>> these appear to get forgotten / ignored over time.
>>>>> 
>>>>>>   many other SoCs should be similar.
>>>>>> - Without relocate we can save many steps; some of our customers really
>>>>>>   care about the boot time.
>>>>>>   * no need to relocate everything
>>>>>>   * no need to copy all the code
>>>>>>   * no need to init the driver more than once
>>>>> 
>>>>> Please have a look at the README, section "Memory Management".
>>>>> The relocation is not done to any _fixed_ address, but the address
>>>>> is actually computed at runtime, depending on a number of features
>>>>> enabled (at least this is how it used to be - apparently little of
>>>>> this is tested on a regular basis, so I would not be surprised if
>>>>> things are broken today).
>>>>> 
>>>>> The basic idea was to reserve areas of memory at the top of RAM,
>>>>> that would not be initialized / modified by U-Boot and Linux, not
>>>>> even across a reset / warm boot.
>>>>> 
>>>>> This was used for example for:
>>>>> 
>>>>> - pRAM (Protected RAM) which could be used to store all kinds of data
>>>>>  (for example, using a pramfs [Protected and Persistent RAM
>>>>>  Filesystem]) that could be kept across reboots of the OS.
>>>>> 
>>>>> - shared frame buffer / video memory. U-Boot and Linux would be able
>>>>>  to initialize the video memory just once (in U-Boot) and then
>>>>>  share it, maybe even across reboots.  Especially, this would allow
>>>>>  for a very early splash screen that gets passed (flicker free) to
>>>>>  Linux until some Linux GUI takes over (much more difficult today).
>>>>> 
>>>>> - shared log buffer: U-Boot and Linux used to use the same syslog
>>>>>  buffer mechanism, so you could share it between U-Boot and Linux.
>>>>>  This allows for example to
>>>>>  * read the Linux kernel panic messages after reset in U-Boot; this
>>>>>    is very useful when you bring up a new system and Linux crashes
>>>>>    before it can display the log buffer on the console
>>>>>  * pass U-Boot POST results on to Linux, so the application code
>>>>>    can read and process these
>>>>>  * process the system log of the previous run (especially after a
>>>>>    panic) in Linux after it rebooted.
>>>>> 
>>>>> etc.
>>>>> 
>>>>> There are a number of such features which require reserving room at
>>>>> the top of RAM, the size of which is calculated at runtime, often
>>>>> depending on user-settable environment data.
>>>>> 
>>>>> All this cannot be done without relocation to a (dynamically
>>>>> computed) target address.
>>>>> 
>>>>> 
>>>>> Yes, the code could be simpler and faster without that - but then,
>>>>> you cut off a number of features.
>>>> 
>>>> I would be interested in seeing benchmarks showing the cost of
>>>> relocation in terms of boot time. Last time I did this was on Exynos 5
>>>> and it was some years ago. The time was pretty small provided the
>>>> cache was on for the memory copies associated with relocation itself.
>>>> Something like 10-20ms but I don't have the numbers handy.
>>>> 
>>>> I think it is useful to be able to allocate memory in board_init_f()
>>>> for use by U-Boot for things like the display and the malloc() region.
>>>> 
>>>> Options we might consider:
>>>> 
>>>> 1. Don't relocate the code and data. Thus we could avoid the copy and
>>>> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>>>> used when U-Boot runs as an EFI app
>>>> 
>>>> 2. Rather than throwing away the old malloc() region, keep it around
>>>> so existing allocated blocks work. Then new malloc() region would be
>>>> used for future allocations. We could perhaps ignore free() calls in
>>>> that region
>>>> 
>>>> 2a. This would allow us to avoid re-init of driver model in most cases
>>>> I think. E.g. we could init serial and timer before relocation and
>>>> leave them inited after relocation. We could just init the
>>>> 'additional' devices not done before relocation.
>>>> 
>>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>>>> suspect it would just be a pain though, since SPL might use memory
>>>> that U-Boot wants.
>>>> 
>>>> 3. We could turn on the cache earlier. This removes most of the
>>>> boot-time penalty. Ideally this should be turned on in SPL and perhaps
>>>> redone in U-Boot which has more memory available. If SPL is not used,
>>>> we could turn on the cache before relocation.
>>> 
>>> Both turning on the cache and initialising the clocking could be of benefit
>>> to boot-time.
>>> 
>>> However, the biggest possible gain will come from utilising Falcon mode
>>> to skip the full U-Boot stage and directly boot into the OS from SPL.  This
>>> assumes that the drivers involved are fully optimised, so loading up the
>>> OS image does not take longer than necessary.
>> 
>> I'd like to see numbers on that. From my experience, loading and
>> running U-Boot does not take very long…
> 
> I was referring to the OS images, not to U-Boot itself.
> While U-Boot will be less than 512KB, a typical kernel image will be a handful
> of MB… plus there may be a few MB of ramdisk to accompany it.

And here are some numbers from an OS image boot (using distroboot) from MMC.
As indicated, loading the OS is much more expensive than anything else; and
this is mainly due to the drivers in U-Boot not being able to operate the MMC
at the full extent of its capabilities (I know that changes for this have been
submitted, but I didn’t have a chance to test this out yet).

So here are the numbers for some perspective:

> Hit any key to stop autoboot:  0 
> switch to partitions #0, OK
> mmc0(part 0) is current device
> Scanning mmc 0:1...
> Found U-Boot script /boot/boot.scr
> 2098 bytes read in 95 ms (21.5 KiB/s)
> ## Executing script at 00500000
> Boot script running from mmc 0
> 86 bytes read in 105 ms (0 Bytes/s)
> Import default environment from /boot/puma_rk3399/defaultEnv.txt
> 0 bytes read in 108 ms (0 Bytes/s)
> Import default environment from /boot/puma_rk3399/userEnv.txt
> 62742 bytes read in 127 ms (482.4 KiB/s)
> Load devicetree from /boot/puma_rk3399/rk3399-puma.dtb
> 16427016 bytes read in 1694 ms (9.2 MiB/s)
> ** File not found /boot/puma_rk3399/uInitrd **
> Start Kernel without initrd
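
To put the log above in perspective, the per-read rates and the share of
the total load time can be recomputed from the byte counts and elapsed
times. The following C sketch is only an illustration: the struct and
function names are made up for this example and are not U-Boot code, and
the sample values are simply copied from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One "N bytes read in T ms" line from the log (hypothetical type). */
struct load_sample {
	uint64_t bytes;
	uint64_t ms;
};

/* The five reads reported in the log, in order of appearance. */
static const struct load_sample samples[] = {
	{ 2098, 95 },
	{ 86, 105 },
	{ 0, 108 },
	{ 62742, 127 },
	{ 16427016, 1694 },
};

/* Effective transfer rate in bytes per second (integer, truncated). */
static uint64_t rate_bytes_per_sec(uint64_t bytes, uint64_t ms)
{
	assert(ms > 0);
	return bytes * 1000 / ms;
}

/* Sum of the elapsed times of all samples, in milliseconds. */
static uint64_t total_ms(const struct load_sample *s, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += s[i].ms;
	return sum;
}

/* Share (in percent, truncated) of the total time spent in the slowest read. */
static unsigned int slowest_share_percent(const struct load_sample *s, size_t n)
{
	uint64_t worst = 0;
	uint64_t total = total_ms(s, n);
	size_t i;

	for (i = 0; i < n; i++)
		if (s[i].ms > worst)
			worst = s[i].ms;
	return total ? (unsigned int)(worst * 100 / total) : 0;
}
```

With these values, the 16427016-byte read accounts for 1694 of 2129 ms
(roughly 80% of the time spent reading files) at about 9.2 MiB/s, while
each of the small reads costs around 100 ms regardless of its size.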


> 
>>> 
>>>> 4. Rather than reserving memory in board_init_f() we could have it
>>>> call malloc() from the expanded region. We could then perhaps
>>>> move this reserve/allocate code in to particular drivers or
>>>> subsystems, and drop a good chunk of the init sequence. We would need
>>>> to have a larger malloc() region than is currently the case.
>>>> 
>>>> There are still some arch-specific bits in board_init_f() which make
>>>> these sorts of changes a bit tricky to support generically. IMO it
>>>> would be best to move to 'generic relocation' written in C, where all
>>>> archs work basically the same way, before attempting any of the above.
>>>> 
>>>> Still, I can see some benefits and even some simplifications.
>>>> 
>>>> Regards,
>>>> Simon
>>> 
>> 
>> Regards,
>> Simon
> 


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-21  9:33 [U-Boot] U-Boot proper(not SPL) relocate option Kever Yang
  2017-11-21 10:29 ` Lukasz Majewski
@ 2017-11-26 14:04 ` Andreas Färber
  1 sibling, 0 replies; 26+ messages in thread
From: Andreas Färber @ 2017-11-26 14:04 UTC (permalink / raw)
  To: u-boot

Hi Kever,

Am 21.11.2017 um 10:33 schrieb Kever Yang:
>     I am trying to understand why we need to do the relocation in U-Boot.
> From the document README/crt0.S, I think the relocation feature exists
> because some SoCs have limited SRAM whose size is enough to load the whole
> U-Boot, but not enough to run all the drivers.
> 
>     I don't know how many SoCs/archs still must use this feature, but
> I'm sure all Rockchip SoCs do not need this feature in both SPL and
> proper U-Boot, because Rockchip always runs SPL in SRAM to init DDR
> SDRAM, and once DRAM is available always runs U-Boot in DRAM.

In addition to what others have commented, chain-loading a development
(proper) U-Boot from (proper) U-Boot is a nice feature on aarch64 SoCs.
It relies on U-Boot being able to start executing from low memory, where
it does not conflict with the U-Boot in high memory calling it.

Blocking this for all Rockchip SoCs just for the sake of saving some
memory in SDRAM/storage does not sound appealing.

That does not affect SPL of course.

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-26 11:38           ` Simon Glass
  2017-11-26 13:44             ` Dr. Philipp Tomsich
@ 2017-11-26 14:16             ` Masahiro Yamada
  2017-11-27 13:21               ` Wolfgang Denk
                                 ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Masahiro Yamada @ 2017-11-26 14:16 UTC (permalink / raw)
  To: u-boot

2017-11-26 20:38 GMT+09:00 Simon Glass <sjg@chromium.org>:
> Hi Philipp,
>
> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
> <philipp.tomsich@theobroma-systems.com> wrote:
>> Hi,
>>
>>> On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
>>>
>>> +Tom, Masahiro, Philipp
>>>
>>> Hi,
>>>
>>> On 22 November 2017 at 03:27, Wolfgang Denk <wd@denx.de> wrote:
>>>> Dear Kever Yang,
>>>>
>>>> In message <fd0bb500-80c4-f317-cc18-f7aaf1344fd8@rock-chips.com> you wrote:
>>>>>
>>>>> I can understand this feature: we always do dram_init_banks() first,
>>>>> then we relocate to a 'known' area, so there is no risk in accessing
>>>>> memory. I believe there must be some historical reason, some kind of
>>>>> device for which the relocate feature is a wonderful idea.
>>>>
>>>> This is actually not so much a feature needed to support some
>>>> specific device (in this case much simpler approaches would be
>>>> possible), but to support a whole set of features.  Unfortunately
>>>> these appear to get forgotten / ignored over time.
>>>>
>>>>>     many other SoCs should be similar.
>>>>> - Without relocate we can save many steps; some of our customers really
>>>>>     care about the boot time.
>>>>>     * no need to relocate everything
>>>>>     * no need to copy all the code
>>>>>     * no need to init the driver more than once
>>>>
>>>> Please have a look at the README, section "Memory Management".
>>>> The relocation is not done to any _fixed_ address, but the address
>>>> is actually computed at runtime, depending on a number of features
>>>> enabled (at least this is how it used to be - apparently little of
>>>> this is tested on a regular basis, so I would not be surprised if
>>>> things are broken today).
>>>>
>>>> The basic idea was to reserve areas of memory at the top of RAM,
>>>> that would not be initialized / modified by U-Boot and Linux, not
>>>> even across a reset / warm boot.
>>>>
>>>> This was used for example for:
>>>>
>>>> - pRAM (Protected RAM) which could be used to store all kinds of data
>>>>  (for example, using a pramfs [Protected and Persistent RAM
>>>>  Filesystem]) that could be kept across reboots of the OS.
>>>>
>>>> - shared frame buffer / video memory. U-Boot and Linux would be able
>>>>  to initialize the video memory just once (in U-Boot) and then
>>>>  share it, maybe even across reboots.  Especially, this would allow
>>>>  for a very early splash screen that gets passed (flicker free) to
>>>>  Linux until some Linux GUI takes over (much more difficult today).
>>>>
>>>> - shared log buffer: U-Boot and Linux used to use the same syslog
>>>>  buffer mechanism, so you could share it between U-Boot and Linux.
>>>>  This allows for example to
>>>>  * read the Linux kernel panic messages after reset in U-Boot; this
>>>>    is very useful when you bring up a new system and Linux crashes
>>>>    before it can display the log buffer on the console
>>>>  * pass U-Boot POST results on to Linux, so the application code
>>>>    can read and process these
>>>>  * process the system log of the previous run (especially after a
>>>>    panic) in Linux after it rebooted.
>>>>
>>>> etc.
>>>>
>>>> There are a number of such features which require reserving room at
>>>> the top of RAM, the size of which is calculated at runtime, often
>>>> depending on user-settable environment data.
>>>>
>>>> All this cannot be done without relocation to a (dynamically
>>>> computed) target address.
>>>>
>>>>
>>>> Yes, the code could be simpler and faster without that - but then,
>>>> you cut off a number of features.
>>>
>>> I would be interested in seeing benchmarks showing the cost of
>>> relocation in terms of boot time. Last time I did this was on Exynos 5
>>> and it was some years ago. The time was pretty small provided the
>>> cache was on for the memory copies associated with relocation itself.
>>> Something like 10-20ms but I don't have the numbers handy.
>>>
>>> I think it is useful to be able to allocate memory in board_init_f()
>>> for use by U-Boot for things like the display and the malloc() region.
>>>
>>> Options we might consider:
>>>
>>> 1. Don't relocate the code and data. Thus we could avoid the copy and
>>> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>>> used when U-Boot runs as an EFI app
>>>
>>> 2. Rather than throwing away the old malloc() region, keep it around
>>> so existing allocated blocks work. Then new malloc() region would be
>>> used for future allocations. We could perhaps ignore free() calls in
>>> that region
>>>
>>> 2a. This would allow us to avoid re-init of driver model in most cases
>>> I think. E.g. we could init serial and timer before relocation and
>>> leave them inited after relocation. We could just init the
>>> 'additional' devices not done before relocation.
>>>
>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>>> suspect it would just be a pain though, since SPL might use memory
>>> that U-Boot wants.
>>>
>>> 3. We could turn on the cache earlier. This removes most of the
>>> boot-time penalty. Ideally this should be turned on in SPL and perhaps
>>> redone in U-Boot which has more memory available. If SPL is not used,
>>> we could turn on the cache before relocation.
>>
>> Both turning on the cache and initialising the clocking could be of benefit
>> to boot-time.
>>
>> However, the biggest possible gain will come from utilising Falcon mode
>> to skip the full U-Boot stage and directly boot into the OS from SPL.  This
>> assumes that the drivers involved are fully optimised, so loading up the
>> OS image does not take longer than necessary.
>
> I'd like to see numbers on that. From my experience, loading and
> running U-Boot does not take very long...
>
>>
>>> 4. Rather than reserving memory in board_init_f() we could have it
>>> call malloc() from the expanded region. We could then perhaps
>>> move this reserve/allocate code in to particular drivers or
>>> subsystems, and drop a good chunk of the init sequence. We would need
>>> to have a larger malloc() region than is currently the case.
>>>
>>> There are still some arch-specific bits in board_init_f() which make
>>> these sorts of changes a bit tricky to support generically. IMO it
>>> would be best to move to 'generic relocation' written in C, where all
>>> archs work basically the same way, before attempting any of the above.
>>>
>>> Still, I can see some benefits and even some simplifications.
>>>
>>> Regards,
>>> Simon
>>



This discussion should have happened.
U-Boot boot sequence is crazily inefficient.



When we talk about "relocation", two things are happening.

 [1] U-Boot proper copies itself to the very end of DRAM
 [2] Fix-up the global symbols

In my opinion, only [2] is useful.


SPL initializes the DRAM, so it knows the base and size of DRAM.
SPL should be able to load the U-Boot proper to the final destination.
So, [1] is unnecessary.


[2] is necessary because SPL may load the U-Boot proper
to a different place than CONFIG_SYS_TEXT_BASE.
This feature is useful for platforms
whose DRAM base/size is only known at run-time.
(Of course, it should be user-configurable by CONFIG_RELOCATE
or something.)
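
As a rough illustration of step [2] on its own, the C sketch below patches
an image that was linked for one base address but loaded at another, in
the style of ELF "relative" relocations. It is a host-side simplification
under stated assumptions: struct rel_entry and fixup_relative() are
hypothetical names invented for this example, and the real fix-up code in
U-Boot is architecture-specific and reads the relocation records emitted
by the linker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified relocation record: the 64-bit word at
 * link-time address 'offset' must hold 'addend' plus the load bias.
 */
struct rel_entry {
	uint64_t offset;	/* link-time address of the word to patch */
	uint64_t addend;	/* link-time address the word refers to */
};

/*
 * Patch an image that was linked for 'link_base' but actually sits at
 * 'load_base'. 'image' points to the first byte of the loaded image.
 * Returns the number of entries applied, or -1 if an entry lies outside
 * the image.
 */
static int fixup_relative(uint8_t *image, size_t image_size,
			  uint64_t link_base, uint64_t load_base,
			  const struct rel_entry *rel, size_t count)
{
	/* Unsigned arithmetic wraps, so a lower load address also works. */
	uint64_t bias = load_base - link_base;
	size_t i;

	assert(image != NULL);

	for (i = 0; i < count; i++) {
		uint64_t pos = rel[i].offset - link_base;
		uint64_t value = rel[i].addend + bias;

		if (rel[i].offset < link_base ||
		    image_size < sizeof(value) ||
		    pos > image_size - sizeof(value))
			return -1;

		/* memcpy avoids unaligned-access problems on the host. */
		memcpy(image + pos, &value, sizeof(value));
	}

	return (int)count;
}
```

Note that nothing in this loop copies the image; it only rewrites the
words that hold absolute addresses, which is why the fix-up can be
considered separately from the copy in [1].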

Moreover, board_init_f() is unneeded -
everything in board_init_f() is already done by SPL.
Multiple-time DM initialization is really inefficient and ugly.


The following is how the ideal boot loader would work.


Requirement for U-Boot proper:
U-Boot never changes the location by itself.
So, SPL or a vendor loader must load U-Boot proper
to the final destination directly.
(You can load it to the very end of DRAM if you like,
but the actual place does not matter here.)


Boot sequence of U-Boot proper:
If CONFIG_RELOCATE (or something) is enabled,
it fixes the global symbols at the very beginning
of the boot.
(In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)

That's it.  Proceed to the rest of init code.
(= board_init_r)
board_init_f() is unnecessary.

This should work for recent platforms.



We should think about old platforms that boot from a NOR flash or something.
There are two solutions:
 - execute-in-place: run the code in the flash directly
 - use SPL (common/spl/spl-nor.c) if you want to run
   it from RAM




-- 
Best Regards
Masahiro Yamada


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-26 14:16             ` Masahiro Yamada
@ 2017-11-27 13:21               ` Wolfgang Denk
  2017-11-29 10:10                 ` Masahiro Yamada
  2017-11-27 17:13               ` Simon Glass
  2017-11-27 18:52               ` Tom Rini
  2 siblings, 1 reply; 26+ messages in thread
From: Wolfgang Denk @ 2017-11-27 13:21 UTC (permalink / raw)
  To: u-boot

Dear Masahiro,

In message <CAK7LNARc-MUPRmHSiSjtea9X6hpRAe9t9wgdbdfg2DYSEpC4RA@mail.gmail.com> you wrote:
>
> When we talk about "relocation", two things are happening.
> 
>  [1] U-Boot proper copies itself to the very end of DRAM

...to the very end of DRAM minus space reserved for any memory
regions we want to reserve for / share with the Linux kernel
(video memory, log buffer, protected RAM, ...)

>  [2] Fix-up the global symbols
> 
> In my opinion, only [2] is useful.

This is your opinion, accepted.
I do not agree with this, i.e. I have a different opinion.

> SPL initializes the DRAM, so it knows the base and size of DRAM.

But it does not know the relocation address yet.  As this is
dynamically computed, depending on environment variable settings,
moving this calculation into the SPL means the SPL must be capable of
reading the environment.  This pulls in a ton of code, and any
advantages you may have for falcon mode are damaged.
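
The dependency described here can be sketched in C as a top-down walk from
the end of RAM, where every reserved region pushes the relocation address
further down. This is only an illustration under stated assumptions:
struct reserve_cfg, compute_reloc_addr() and the 4 KiB alignment are
hypothetical and chosen for the example, not taken from the U-Boot
sources; the sizes would come from the configuration and, for something
like pRAM, from the environment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region sizes, reserved from the top of RAM downward. */
struct reserve_cfg {
	uint64_t pram_size;	/* protected RAM, e.g. set via the environment */
	uint64_t logbuf_size;	/* shared log buffer */
	uint64_t video_size;	/* shared frame buffer */
	uint64_t image_size;	/* U-Boot code, data and BSS */
};

/* Round 'addr' down to a multiple of 'align' (a power of two). */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Carve each region out below the previous one and store the resulting
 * relocation address in '*reloc_addr'. Returns 0 on success, or -1 if
 * the regions do not fit into RAM.
 */
static int compute_reloc_addr(uint64_t ram_base, uint64_t ram_size,
			      const struct reserve_cfg *cfg,
			      uint64_t *reloc_addr)
{
	const uint64_t sizes[] = {
		cfg->pram_size, cfg->logbuf_size,
		cfg->video_size, cfg->image_size,
	};
	uint64_t addr = ram_base + ram_size;	/* start at the top of RAM */
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		if (sizes[i] > addr - ram_base)
			return -1;
		addr = align_down(addr - sizes[i], 4096);
		if (addr < ram_base)
			return -1;
	}

	*reloc_addr = addr;
	return 0;
}
```

Changing only pram_size changes the result, so whoever places U-Boot at
its final address needs the same inputs, including the environment.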

> The following is how the ideal boot loader would work.
> 
> 
> Requirement for U-Boot proper:
> U-Boot never changes the location by itself.

This means you kill all the features that depend on this?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
f u cn rd ths, itn tyg h myxbl cd.


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-26 14:16             ` Masahiro Yamada
  2017-11-27 13:21               ` Wolfgang Denk
@ 2017-11-27 17:13               ` Simon Glass
  2017-11-27 18:53                 ` Tom Rini
                                   ` (2 more replies)
  2017-11-27 18:52               ` Tom Rini
  2 siblings, 3 replies; 26+ messages in thread
From: Simon Glass @ 2017-11-27 17:13 UTC (permalink / raw)
  To: u-boot

(Tom - any thoughts about a more expansive cc list on this?)

Hi Masahiro,

On 26 November 2017 at 07:16, Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
> 2017-11-26 20:38 GMT+09:00 Simon Glass <sjg@chromium.org>:
>> Hi Philipp,
>>
>> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
>> <philipp.tomsich@theobroma-systems.com> wrote:
>>> Hi,
>>>
>>>> On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
>>>>
>>>> +Tom, Masahiro, Philipp
>>>>
>>>> Hi,
>>>>
>>>> On 22 November 2017 at 03:27, Wolfgang Denk <wd@denx.de> wrote:
>>>>> Dear Kever Yang,
>>>>>
>>>>> In message <fd0bb500-80c4-f317-cc18-f7aaf1344fd8@rock-chips.com> you wrote:
>>>>>>
>>>>>> I can understand this feature: we always do dram_init_banks() first,
>>>>>> then we relocate to a 'known' area, so there is no risk in accessing
>>>>>> memory. I believe there must be some historical reason, some kind of
>>>>>> device for which the relocate feature is a wonderful idea.
>>>>>
>>>>> This is actually not so much a feature needed to support some
>>>>> specific device (in this case much simpler approaches would be
>>>>> possible), but to support a whole set of features.  Unfortunately
>>>>> these appear to get forgotten / ignored over time.
>>>>>
>>>>>>     many other SoCs should be similar.
>>>>>> - Without relocate we can save many steps; some of our customers really
>>>>>>     care about the boot time.
>>>>>>     * no need to relocate everything
>>>>>>     * no need to copy all the code
>>>>>>     * no need to init the driver more than once
>>>>>
>>>>> Please have a look at the README, section "Memory Management".
>>>>> The relocation is not done to any _fixed_ address, but the address
>>>>> is actually computed at runtime, depending on a number of features
>>>>> enabled (at least this is how it used to be - apparently little of
>>>>> this is tested on a regular basis, so I would not be surprised if
>>>>> things are broken today).
>>>>>
>>>>> The basic idea was to reserve areas of memory at the top of RAM,
>>>>> that would not be initialized / modified by U-Boot and Linux, not
>>>>> even across a reset / warm boot.
>>>>>
>>>>> This was used for example for:
>>>>>
>>>>> - pRAM (Protected RAM) which could be used to store all kinds of data
>>>>>  (for example, using a pramfs [Protected and Persistent RAM
>>>>>  Filesystem]) that could be kept across reboots of the OS.
>>>>>
>>>>> - shared frame buffer / video memory. U-Boot and Linux would be able
>>>>>  to initialize the video memory just once (in U-Boot) and then
>>>>>  share it, maybe even across reboots.  Especially, this would allow
>>>>>  for a very early splash screen that gets passed (flicker free) to
>>>>>  Linux until some Linux GUI takes over (much more difficult today).
>>>>>
>>>>> - shared log buffer: U-Boot and Linux used to use the same syslog
>>>>>  buffer mechanism, so you could share it between U-Boot and Linux.
>>>>>  This allows for example to
>>>>>  * read the Linux kernel panic messages after reset in U-Boot; this
>>>>>    is very useful when you bring up a new system and Linux crashes
>>>>>    before it can display the log buffer on the console
>>>>>  * pass U-Boot POST results on to Linux, so the application code
>>>>>    can read and process these
>>>>>  * process the system log of the previous run (especially after a
>>>>>    panic) in Linux after it rebooted.
>>>>>
>>>>> etc.
>>>>>
>>>>> There are a number of such features which require reserving room at
>>>>> the top of RAM, the size of which is calculated at runtime, often
>>>>> depending on user-settable environment data.
>>>>>
>>>>> All this cannot be done without relocation to a (dynamically
>>>>> computed) target address.
>>>>>
>>>>>
>>>>> Yes, the code could be simpler and faster without that - but then,
>>>>> you cut off a number of features.
>>>>
>>>> I would be interested in seeing benchmarks showing the cost of
>>>> relocation in terms of boot time. Last time I did this was on Exynos 5
>>>> and it was some years ago. The time was pretty small provided the
>>>> cache was on for the memory copies associated with relocation itself.
>>>> Something like 10-20ms but I don't have the numbers handy.
>>>>
>>>> I think it is useful to be able to allocate memory in board_init_f()
>>>> for use by U-Boot for things like the display and the malloc() region.
>>>>
>>>> Options we might consider:
>>>>
>>>> 1. Don't relocate the code and data. Thus we could avoid the copy and
>>>> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>>>> used when U-Boot runs as an EFI app
>>>>
>>>> 2. Rather than throwing away the old malloc() region, keep it around
>>>> so existing allocated blocks work. Then new malloc() region would be
>>>> used for future allocations. We could perhaps ignore free() calls in
>>>> that region
>>>>
>>>> 2a. This would allow us to avoid re-init of driver model in most cases
>>>> I think. E.g. we could init serial and timer before relocation and
>>>> leave them inited after relocation. We could just init the
>>>> 'additional' devices not done before relocation.
>>>>
>>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>>>> suspect it would just be a pain though, since SPL might use memory
>>>> that U-Boot wants.
>>>>
>>>> 3. We could turn on the cache earlier. This removes most of the
>>>> boot-time penalty. Ideally this should be turned on in SPL and perhaps
>>>> redone in U-Boot which has more memory available. If SPL is not used,
>>>> we could turn on the cache before relocation.
>>>
>>> Both turning on the cache and initialising the clocking could be of benefit
>>> to boot-time.
>>>
>>> However, the biggest possible gain will come from utilising Falcon mode
>>> to skip the full U-Boot stage and directly boot into the OS from SPL.  This
>>> assumes that the drivers involved are fully optimised, so loading up the
>>> OS image does not take longer than necessary.
>>
>> I'd like to see numbers on that. From my experience, loading and
>> running U-Boot does not take very long...
>>
>>>
>>>> 4. Rather than reserving memory in board_init_f() we could have it
>>>> call malloc() from the expanded region. We could then perhaps
>>>> move this reserve/allocate code in to particular drivers or
>>>> subsystems, and drop a good chunk of the init sequence. We would need
>>>> to have a larger malloc() region than is currently the case.
>>>>
>>>> There are still some arch-specific bits in board_init_f() which make
>>>> these sorts of changes a bit tricky to support generically. IMO it
>>>> would be best to move to 'generic relocation' written in C, where all
>>>> archs work basically the same way, before attempting any of the above.
>>>>
>>>> Still, I can see some benefits and even some simplifications.
>>>>
>>>> Regards,
>>>> Simon
>>>
>
>
>
> This discussion should have happened.
> U-Boot boot sequence is crazily inefficient.
>
>
>
> When we talk about "relocation", two things are happening.
>
>  [1] U-Boot proper copies itself to the very end of DRAM
>  [2] Fix-up the global symbols
>
> In my opinion, only [2] is useful.
>
>
> SPL initializes the DRAM, so it knows the base and size of DRAM.
> SPL should be able to load the U-Boot proper to the final destination.
> So, [1] is unnecessary.
>
>
> [2] is necessary because SPL may load the U-Boot proper
> to a different place than CONFIG_SYS_TEXT_BASE.
> This feature is useful for platforms
> whose DRAM base/size is only known at run-time.
> (Of course, it should be user-configurable by CONFIG_RELOCATE
> or something.)
>
> Moreover, board_init_f() is unneeded -
> everything in board_init_f() is already done by SPL.
> Multiple-time DM initialization is really inefficient and ugly.
>
>
> The following is how the ideal boot loader would work.
>
>
> Requirement for U-Boot proper:
> U-Boot never changes the location by itself.
> So, SPL or a vendor loader must load U-Boot proper
> to the final destination directly.
> (You can load it to the very end of DRAM if you like,
> but the actual place does not matter here.)
>
>
> Boot sequence of U-Boot proper:
> If CONFIG_RELOCATE (or something) is enabled,
> it fixes the global symbols at the very beginning
> of the boot.
> (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
>
> That's it.  Proceed to the rest of init code.
> (= board_init_r)
> board_init_f() is unnecessary.
>
> This should work for recent platforms.

Yes that sounds reasonable to me.

We could do the symbol fixup/relocation in SPL after loading U-Boot,
although that would probably push us to using ELF format for U-Boot
which is a bit limited.

Still I think the biggest performance improvement comes from turning
on the cache in SPL. So the above is a simplification, not really a
speed-up.

>
>
>
> We should think about old platforms that boot from a NOR flash or something.
> There are two solutions:
>  - execute-in-place: run the code in the flash directly
>  - use SPL (common/spl/spl-nor.c) if you want to run
>    it from RAM

This seems like a big regression in functionality. For example for x86
32-bit we currently don't have an SPL (we do for 64-bit). So I think
this means that everything would be forced to have an SPL?

I am wondering who else we should cc on this discussion?

Regards,
Simon


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-26 14:16             ` Masahiro Yamada
  2017-11-27 13:21               ` Wolfgang Denk
  2017-11-27 17:13               ` Simon Glass
@ 2017-11-27 18:52               ` Tom Rini
  2 siblings, 0 replies; 26+ messages in thread
From: Tom Rini @ 2017-11-27 18:52 UTC (permalink / raw)
  To: u-boot

On Sun, Nov 26, 2017 at 11:16:45PM +0900, Masahiro Yamada wrote:
[snip]
> This discussion should have happened.
> U-Boot boot sequence is crazily inefficient.
> 
> 
> 
> When we talk about "relocation", two things are happening.
> 
>  [1] U-Boot proper copies itself to the very end of DRAM
>  [2] Fix-up the global symbols
> 
> In my opinion, only [2] is useful.
> 
> 
> SPL initializes the DRAM, so it knows the base and size of DRAM.
> SPL should be able to load the U-Boot proper to the final destination.
> So, [1] is unnecessary.

Knowing this final destination isn't necessarily easy in all cases.  One
thing to keep in mind here is that long long ago, U-Boot did not do this
relocation step.  But that was also well before SPL, so some level of
what was made easier with relocation isn't so necessary now.

It's also somewhat of an important safety feature.  We have a lot of
values that get re-used (and sometimes re-based) without sufficient
care.  Take for example where for the longest time nearly everyone on
ARM32 was loading the kernel to.  Having U-Boot automatically end up way
out of the way rather than hoping everyone calculates a good address
that won't get stepped on is important.  It's also one of those things
that will change over time as features get added / changed and our
footprint grows.  We're already fairly often talking about "oops, what
do we do now to keep $X within the size constraint of $Y storage?".  It'll be
even worse to deal with "oops, adding $X means we need more run-time
space".
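
To make the "ends up out of the way" point concrete, here is a rough
sketch of how a relocation target can be computed downward from the top
of RAM at run time. It is not the actual board_init_f() sequence: the
struct names, the order of the reservations and the alignments are all
assumptions for illustration, and it assumes the reservations fit in RAM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical run-time sizes (features enabled, environment settings). */
struct reserve_cfg {
	uint64_t pram_size;	/* protected RAM */
	uint64_t logbuf_size;	/* shared log buffer */
	uint64_t fb_size;	/* frame buffer */
	uint64_t image_size;	/* U-Boot code and data */
	uint64_t malloc_size;	/* malloc() arena */
};

/* Resulting start address of each region. */
struct ram_layout {
	uint64_t pram;
	uint64_t logbuf;
	uint64_t fb;
	uint64_t relocaddr;
	uint64_t malloc_base;
};

uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/* Carve each region off the top of RAM, highest first. */
struct ram_layout compute_layout(uint64_t ram_base, uint64_t ram_size,
				 const struct reserve_cfg *cfg)
{
	struct ram_layout l;
	uint64_t top = ram_base + ram_size;

	top -= cfg->pram_size;
	l.pram = top;
	top -= cfg->logbuf_size;
	l.logbuf = top;
	top = align_down(top - cfg->fb_size, 4096);
	l.fb = top;
	top = align_down(top - cfg->image_size, 4096);
	l.relocaddr = top;
	top = align_down(top - cfg->malloc_size, 16);
	l.malloc_base = top;

	return l;
}

/* Returns 1 if [addr, addr + size) stays below everything reserved. */
int load_area_is_safe(const struct ram_layout *l, uint64_t addr,
		      uint64_t size)
{
	return addr + size <= l->malloc_base;
}
```

Because every address is derived from the detected RAM size and the
configured sizes, a growing footprint moves relocaddr down automatically,
and a load address low in RAM keeps passing load_area_is_safe() without
anyone recalculating it by hand.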

All of that said, I'd be happy to see logs showing that we in fact spend
a measurable amount of time in relocation and what we can do about it.

Thanks!

-- 
Tom


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-27 17:13               ` Simon Glass
@ 2017-11-27 18:53                 ` Tom Rini
  2017-11-28  9:53                 ` Lukasz Majewski
  2017-11-29 10:11                 ` Masahiro Yamada
  2 siblings, 0 replies; 26+ messages in thread
From: Tom Rini @ 2017-11-27 18:53 UTC (permalink / raw)
  To: u-boot

On Mon, Nov 27, 2017 at 10:13:09AM -0700, Simon Glass wrote:
> (Tom - any thoughts about a more expansive cc list on this?)

Not really, sorry.

> Hi Masahiro,
> 
> On 26 November 2017 at 07:16, Masahiro Yamada
> <yamada.masahiro@socionext.com> wrote:
> > 2017-11-26 20:38 GMT+09:00 Simon Glass <sjg@chromium.org>:
> >> Hi Philipp,
> >>
> >> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
> >> <philipp.tomsich@theobroma-systems.com> wrote:
> >>> Hi,
> >>>
> >>>> On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
> >>>>
> >>>> +Tom, Masahiro, Philipp
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 22 November 2017 at 03:27, Wolfgang Denk <wd@denx.de> wrote:
> >>>>> Dear Kever Yang,
> >>>>>
> >>>>> In message <fd0bb500-80c4-f317-cc18-f7aaf1344fd8@rock-chips.com> you wrote:
> >>>>>>
> >>>>>> I can understand this feature, we always do dram_init_banks() first,
> >>>>>> then we relocate to 'known' area, then will be no risk to access memory.
> >>>>>> I believe there must be some historical reason for some kind of device,
> >>>>>> the relocate feature is a wonderful idea for it.
> >>>>>
> >>>>> This is actually not so much a feature needed to support some
> >>>>> specific device (in this case much simpler approaches would be
> >>>>> possible), but to support a whole set of features.  Unfortunately
> >>>>> these appear to get forgotten / ignored over time.
> >>>>>
> >>>>>>     many other SoCs should be similar.
> >>>>>> - Without relocate we can save many step, some of our customer really
> >>>>>>     care much about the boot time duration.
> >>>>>>     * no need to relocate everything
> >>>>>>     * no need to copy all the code
> >>>>>>     * no need init the driver more than once
> >>>>>
> >>>>> Please have a look at the README, section "Memory Management".
> >>>>> The relocation is not done to any _fixed_ address, but the address
> >>>>> is actually computed at runtime, depending on a number of features
> >>>>> enabled (at least this is how it used to be - apparently little of
> >>>>> this is tested on a regular basis, so I would not be surprised if
> >>>>> things are broken today).
> >>>>>
> >>>>> The basic idea was to reserve areas of memory at the top of RAM,
> >>>>> that would not be initialized / modified by U-Boot and Linux, not
> >>>>> even across a reset / warm boot.
> >>>>>
> >>>>> This was used for example for:
> >>>>>
> >>>>> - pRAM (Protected RAM) which could be used to store all kind of data
> >>>>>  (for example, using a pramfs [Protected and Persistent RAM
> >>>>>  Filesystem]) that could be kept across reboots of the OS.
> >>>>>
> >>>>> - shared frame buffer / video memory. U-Boot and Linux would be able
> >>>>>  to initialize the video memory just once (in U-Boot) and then
> >>>>>  share it, maybe even across reboots.  especially, this would allow
> >>>>>  for a very early splash screen that gets passed (flicker free) to
> >>>>>  Linux until some Linux GUI takes over (much more difficult today).
> >>>>>
> >>>>> - shared log buffer: U-Boot and Linux used to use the same syslog
> >>>>>  buffer mechanism, so you could share it between U-Boot and Linux.
> >>>>>  this allows for example to
> >>>>>  * read the Linux kernel panic messages after reset in U-Boot; this
> >>>>>    is very useful when you bring up a new system and Linux crashes
> >>>>>    before it can display the log buffer on the console
> >>>>>  * pass U-Boot POST results on to Linux, so the application code
> >>>>>    can read and process these
> >>>>>  * process the system log of the previous run (especially after a
> >>>>>    panic) in Linux after it rebooted.
> >>>>>
> >>>>> etc.
> >>>>>
> >>>>> There are a number of such features which require to reserve room at
> >>>>> the top of RAM, the size of which is calculated at runtime, often
> >>>>> depending on user settable environment data.
> >>>>>
> >>>>> All this cannot be done without relocation to a (dynamically
> >>>>> computed) target address.
> >>>>>
> >>>>>
> >>>>> Yes, the code could be simpler and faster without that - but then,
> >>>>> you cut off a number of features.
> >>>>
> >>>> I would be interested in seeing benchmarks showing the cost of
> >>>> relocation in terms of boot time. Last time I did this was on Exynos 5
> >>>> and it was some years ago. The time was pretty small provided the
> >>>> cache was on for the memory copies associated with relocation itself.
> >>>> Something like 10-20ms but I don't have the numbers handy.
> >>>>
> >>>> I think it is useful to be able to allocate memory in board_init_f()
> >>>> for use by U-Boot for things like the display and the malloc() region.
> >>>>
> >>>> Options we might consider:
> >>>>
> >>>> 1. Don't relocate the code and data. Thus we could avoid the copy and
> >>>> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
> >>>> used when U-Boot runs as an EFI app
> >>>>
> >>>> 2. Rather than throwing away the old malloc() region, keep it around
> >>>> so existing allocated blocks work. Then new malloc() region would be
> >>>> used for future allocations. We could perhaps ignore free() calls in
> >>>> that region
> >>>>
> >>>> 2a. This would allow us to avoid re-init of driver model in most cases
> >>>> I think. E.g. we could init serial and timer before relocation and
> >>>> leave them inited after relocation. We could just init the
> >>>> 'additional' devices not done before relocation.
> >>>>
> >>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
> >>>> suspect it would just be a pain though, since SPL might use memory
> >>>> that U-Boot wants.
> >>>>
> >>>> 3. We could turn on the cache earlier. This removes most of the
> >>>> boot-time penalty. Ideally this should be turned on in SPL and perhaps
> >>>> redone in U-Boot which has more memory available. If SPL is not used,
> >>>> we could turn on the cache before relocation.
> >>>
> >>> Both turning on the cache and initialising the clocking could be of benefit
> >>> to boot-time.
> >>>
> >>> However, the biggest possible gain will come from utilising Falcon mode
> >>> to skip the full U-Boot stage and directly boot into the OS from SPL.  This
> >>> assumes that the drivers involved are fully optimised, so loading up the
> >>> OS image does not take longer than necessary.
> >>
> >> I'd like to see numbers on that. From my experience, loading and
> >> running U-Boot does not take very long...
> >>
> >>>
> >>>> 4. Rather than reserving memory in board_init_f() we could have it
> >>>> call malloc() from the expanded region. We could then perhaps
> >>>> move this reserve/allocate code into particular drivers or
> >>>> subsystems, and drop a good chunk of the init sequence. We would need
> >>>> to have a larger malloc() region than is currently the case.
> >>>>
> >>>> There are still some arch-specific bits in board_init_f() which make
> >>>> these sorts of changes a bit tricky to support generically. IMO it
> >>>> would be best to move to 'generic relocation' written in C, where all
> >>>> archs work basically the same way, before attempting any of the above.
> >>>>
> >>>> Still, I can see some benefits and even some simplifications.
> >>>>
> >>>> Regards,
> >>>> Simon
> >>>
> >
> >
> >
> > This discussion should have happened.
> > U-Boot boot sequence is crazily inefficient.
> >
> >
> >
> > When we talk about "relocation", two things are happening.
> >
> >  [1] U-Boot proper copies itself to the very end of DRAM
> >  [2] Fix-up the global symbols
> >
> > In my opinion, only [2] is useful.
> >
> >
> > SPL initializes the DRAM, so it knows the base and size of DRAM.
> > SPL should be able to load the U-Boot proper to the final destination.
> > So, [1] is unnecessary.
> >
> >
> > [2] is necessary because SPL may load the U-Boot proper
> > to a different place than CONFIG_SYS_TEXT_BASE.
> > This feature is useful for platforms
> > whose DRAM base/size is only known at run-time.
> > (Of course, it should be user-configurable by CONFIG_RELOCATE
> > or something.)
> >
> > Moreover, board_init_f() is unneeded -
> > everything in board_init_f() is already done by SPL.
> > Multiple-time DM initialization is really inefficient and ugly.
> >
> >
> > The following is how the ideal boot loader would work.
> >
> >
> > Requirement for U-Boot proper:
> > U-Boot never changes the location by itself.
> > So, SPL or a vendor loader must load U-Boot proper
> > to the final destination directly.
> > (You can load it to the very end of DRAM if you like,
> > but the actual place does not matter here.)
> >
> >
> > Boot sequence of U-Boot proper:
> > If CONFIG_RELOCATE (or something) is enabled,
> > it fixes the global symbols at the very beginning
> > of the boot.
> > (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
> >
> > That's it.  Proceed to the rest of init code.
> > (= board_init_r)
> > board_init_f() is unnecessary.
> >
> > This should work for recent platforms.
> 
> Yes that sounds reasonable to me.
> 
> We could do the symbol fixup/relocation in SPL after loading U-Boot,
> although that would probably push us to using ELF format for U-Boot
> which is a bit limited.
> 
> Still I think the biggest performance improvement comes from turning
> on the cache in SPL. So the above is a simplification, not really a
> speed-up.

Note that in some platforms we do enable cache in SPL, so there's no
reason not to on others, if we can afford the size.

-- 
Tom


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-27 17:13               ` Simon Glass
  2017-11-27 18:53                 ` Tom Rini
@ 2017-11-28  9:53                 ` Lukasz Majewski
  2017-11-28 11:30                   ` Peter Robinson
  2017-11-29 10:11                 ` Masahiro Yamada
  2 siblings, 1 reply; 26+ messages in thread
From: Lukasz Majewski @ 2017-11-28  9:53 UTC (permalink / raw)
  To: u-boot

On Mon, 27 Nov 2017 10:13:09 -0700
Simon Glass <sjg@chromium.org> wrote:

> (Tom - any thoughts about a more expansive cc list on this?)
> 
> Hi Masahiro,
> 
> On 26 November 2017 at 07:16, Masahiro Yamada
> <yamada.masahiro@socionext.com> wrote:
> > 2017-11-26 20:38 GMT+09:00 Simon Glass <sjg@chromium.org>:  
> >> Hi Philipp,
> >>
> >> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
> >> <philipp.tomsich@theobroma-systems.com> wrote:  
> >>> Hi,
> >>>  
> >>>> On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
> >>>>
> >>>> +Tom, Masahiro, Philipp
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 22 November 2017 at 03:27, Wolfgang Denk <wd@denx.de> wrote:  
> >>>>> Dear Kever Yang,
> >>>>>
> >>>>> In message
> >>>>> <fd0bb500-80c4-f317-cc18-f7aaf1344fd8@rock-chips.com> you
> >>>>> wrote:  
> >>>>>>
> >>>>>> I can understand this feature, we always do dram_init_banks()
> >>>>>> first, then we relocate to 'known' area, then will be no risk
> >>>>>> to access memory. I believe there must be some historical
> >>>>>> reason for some kind of device, the relocate feature is a
> >>>>>> wonderful idea for it.  
> >>>>>
> >>>>> This is actually not so much a feature needed to support some
> >>>>> specific device (in this case much simpler approaches would be
> >>>>> possible), but to support a whole set of features.
> >>>>> Unfortunately these appear to get forgotten / ignored over time.
> >>>>>  
> >>>>>>     many other SoCs should be similar.
> >>>>>> - Without relocate we can save many step, some of our customer
> >>>>>> really care much about the boot time duration.
> >>>>>>     * no need to relocate everything
> >>>>>>     * no need to copy all the code
> >>>>>>     * no need init the driver more than once  
> >>>>>
> >>>>> Please have a look at the README, section "Memory Management".
> >>>>> The relocation is not done to any _fixed_ address, but the
> >>>>> address is actually computed at runtime, depending on a number
> >>>>> of features enabled (at least this is how it used to be -
> >>>>> apparently little of this is tested on a regular basis, so I
> >>>>> would not be surprised if things are broken today).
> >>>>>
> >>>>> The basic idea was to reserve areas of memory at the top of RAM,
> >>>>> that would not be initialized / modified by U-Boot and Linux,
> >>>>> not even across a reset / warm boot.
> >>>>>
> >>>>> This was used for example for:
> >>>>>
> >>>>> - pRAM (Protected RAM) which could be used to store all kind of
> >>>>> data (for example, using a pramfs [Protected and Persistent RAM
> >>>>>  Filesystem]) that could be kept across reboots of the OS.
> >>>>>
> >>>>> - shared frame buffer / video memory. U-Boot and Linux would be
> >>>>> able to initialize the video memory just once (in U-Boot) and
> >>>>> then share it, maybe even across reboots.  especially, this
> >>>>> would allow for a very early splash screen that gets passed
> >>>>> (flicker free) to Linux until some Linux GUI takes over (much
> >>>>> more difficult today).
> >>>>>
> >>>>> - shared log buffer: U-Boot and Linux used to use the same
> >>>>> syslog buffer mechanism, so you could share it between U-Boot
> >>>>> and Linux. this allows for example to
> >>>>>  * read the Linux kernel panic messages after reset in U-Boot;
> >>>>> this is very useful when you bring up a new system and Linux
> >>>>> crashes before it can display the log buffer on the console
> >>>>>  * pass U-Boot POST results on to Linux, so the application code
> >>>>>    can read and process these
> >>>>>  * process the system log of the previous run (especially after
> >>>>> a panic) in Linux after it rebooted.
> >>>>>
> >>>>> etc.
> >>>>>
> >>>>> There are a number of such features which require to reserve
> >>>>> room at the top of RAM, the size of which is calculated at
> >>>>> runtime, often depending on user settable environment data.
> >>>>>
> >>>>> All this cannot be done without relocation to a (dynamically
> >>>>> computed) target address.
> >>>>>
> >>>>>
> >>>>> Yes, the code could be simpler and faster without that - but
> >>>>> then, you cut off a number of features.  
> >>>>
> >>>> I would be interested in seeing benchmarks showing the cost of
> >>>> relocation in terms of boot time. Last time I did this was on
> >>>> Exynos 5 and it was some years ago. The time was pretty small
> >>>> provided the cache was on for the memory copies associated with
> >>>> relocation itself. Something like 10-20ms but I don't have the
> >>>> numbers handy.
> >>>>
> >>>> I think it is useful to be able to allocate memory in
> >>>> board_init_f() for use by U-Boot for things like the display and
> >>>> the malloc() region.
> >>>>
> >>>> Options we might consider:
> >>>>
> >>>> 1. Don't relocate the code and data. Thus we could avoid the
> >>>> copy and relocation cost. This is already supported with the
> >>>> GD_FLG_SKIP_RELOC used when U-Boot runs as an EFI app
> >>>>
> >>>> 2. Rather than throwing away the old malloc() region, keep it
> >>>> around so existing allocated blocks work. Then new malloc()
> >>>> region would be used for future allocations. We could perhaps
> >>>> ignore free() calls in that region
> >>>>
> >>>> 2a. This would allow us to avoid re-init of driver model in most
> >>>> cases I think. E.g. we could init serial and timer before
> >>>> relocation and leave them inited after relocation. We could just
> >>>> init the 'additional' devices not done before relocation.
> >>>>
> >>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
> >>>> suspect it would just be a pain though, since SPL might use
> >>>> memory that U-Boot wants.
> >>>>
> >>>> 3. We could turn on the cache earlier. This removes most of the
> >>>> boot-time penalty. Ideally this should be turned on in SPL and
> >>>> perhaps redone in U-Boot which has more memory available. If SPL
> >>>> is not used, we could turn on the cache before relocation.  
> >>>
> >>> Both turning on the cache and initialising the clocking could be
> >>> of benefit to boot-time.
> >>>
> >>> However, the biggest possible gain will come from utilising
> >>> Falcon mode to skip the full U-Boot stage and directly boot into
> >>> the OS from SPL.  This assumes that the drivers involved are
> >>> fully optimised, so loading up the OS image does not take longer
> >>> than necessary.  
> >>
> >> I'd like to see numbers on that. From my experience, loading and
> >> running U-Boot does not take very long...
> >>  
> >>>  
> >>>> 4. Rather than reserving memory in board_init_f() we could
> >>>> have it call malloc() from the expanded region. We could then
> >>>> perhaps move this reserve/allocate code into particular
> >>>> drivers or subsystems, and drop a good chunk of the init
> >>>> sequence. We would need to have a larger malloc() region than is
> >>>> currently the case.
> >>>>
> >>>> There are still some arch-specific bits in board_init_f() which
> >>>> make these sorts of changes a bit tricky to support generically.
> >>>> IMO it would be best to move to 'generic relocation' written in
> >>>> C, where all archs work basically the same way, before
> >>>> attempting any of the above.
> >>>>
> >>>> Still, I can see some benefits and even some simplifications.
> >>>>
> >>>> Regards,
> >>>> Simon  
> >>>  
> >
> >
> >
> > This discussion should have happened.
> > U-Boot boot sequence is crazily inefficient.
> >
> >
> >
> > When we talk about "relocation", two things are happening.
> >
> >  [1] U-Boot proper copies itself to the very end of DRAM
> >  [2] Fix-up the global symbols
> >
> > In my opinion, only [2] is useful.
> >
> >
> > SPL initializes the DRAM, so it knows the base and size of DRAM.
> > SPL should be able to load the U-Boot proper to the final
> > destination. So, [1] is unnecessary.
> >
> >
> > [2] is necessary because SPL may load the U-Boot proper
> > to a different place than CONFIG_SYS_TEXT_BASE.
> > This feature is useful for platforms
> > whose DRAM base/size is only known at run-time.
> > (Of course, it should be user-configurable by CONFIG_RELOCATE
> > or something.)
> >
> > Moreover, board_init_f() is unneeded -
> > everything in board_init_f() is already done by SPL.
> > Multiple-time DM initialization is really inefficient and ugly.
> >
> >
> > The following is how the ideal boot loader would work.
> >
> >
> > Requirement for U-Boot proper:
> > U-Boot never changes the location by itself.
> > So, SPL or a vendor loader must load U-Boot proper
> > to the final destination directly.
> > (You can load it to the very end of DRAM if you like,
> > but the actual place does not matter here.)
> >
> >
> > Boot sequence of U-Boot proper:
> > If CONFIG_RELOCATE (or something) is enabled,
> > it fixes the global symbols at the very beginning
> > of the boot.
> > (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
> >
> > That's it.  Proceed to the rest of init code.
> > (= board_init_r)
> > board_init_f() is unnecessary.
> >
> > This should work for recent platforms.  
> 
> Yes that sounds reasonable to me.
> 
> We could do the symbol fixup/relocation in SPL after loading U-Boot,
> although that would probably push us to using ELF format for U-Boot
> which is a bit limited.
> 
> Still I think the biggest performance improvement comes from turning
> on the cache in SPL. So the above is a simplification, not really a
> speed-up.
> 
> >
> >
> >
> > We should think about old platforms that boot from a NOR flash or
> > something. There are two solutions:
> >  - execute-in-place: run the code in the flash directly
> >  - use SPL (common/spl/spl-nor.c) if you want to run
> >    it from RAM  
> 
> This seems like a big regression in functionality. For example for x86
> 32-bit we currently don't have an SPL (we do for 64-bit). So I think
> this means that everything would be forced to have an SPL?
> 
> I am wondering who else we should cc on this discussion?

Not all boards use SPL. Some targets use a vendor-supplied FBL (the SPL
counterpart) and only U-Boot proper. A good example is the Odroid
XU3.

I also agree: for the original post in this discussion we should
have measurements of the boot time improvement.

> 
> Regards,
> Simon



Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de


* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-28  9:53                 ` Lukasz Majewski
@ 2017-11-28 11:30                   ` Peter Robinson
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Robinson @ 2017-11-28 11:30 UTC (permalink / raw)
  To: u-boot

>> (Tom - any thoughts about a more expansive cc list on this?)
>>
>> Hi Masahiro,
>>
>> On 26 November 2017 at 07:16, Masahiro Yamada
>> <yamada.masahiro@socionext.com> wrote:
>> > 2017-11-26 20:38 GMT+09:00 Simon Glass <sjg@chromium.org>:
>> >> Hi Philipp,
>> >>
>> >> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
>> >> <philipp.tomsich@theobroma-systems.com> wrote:
>> >>> Hi,
>> >>>
>> >>>> On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
>> >>>>
>> >>>> +Tom, Masahiro, Philipp
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> On 22 November 2017 at 03:27, Wolfgang Denk <wd@denx.de> wrote:
>> >>>>> Dear Kever Yang,
>> >>>>>
>> >>>>> In message
>> >>>>> <fd0bb500-80c4-f317-cc18-f7aaf1344fd8@rock-chips.com> you
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> I can understand this feature, we always do dram_init_banks()
>> >>>>>> first, then we relocate to 'known' area, then will be no risk
>> >>>>>> to access memory. I believe there must be some historical
>> >>>>>> reason for some kind of device, the relocate feature is a
>> >>>>>> wonderful idea for it.
>> >>>>>
>> >>>>> This is actually not so much a feature needed to support some
>> >>>>> specific device (in this case much simpler approaches would be
>> >>>>> possible), but to support a whole set of features.
>> >>>>> Unfortunately these appear to get forgotten / ignored over time.
>> >>>>>
>> >>>>>>     many other SoCs should be similar.
>> >>>>>> - Without relocate we can save many step, some of our customer
>> >>>>>> really care much about the boot time duration.
>> >>>>>>     * no need to relocate everything
>> >>>>>>     * no need to copy all the code
>> >>>>>>     * no need init the driver more than once
>> >>>>>
>> >>>>> Please have a look at the README, section "Memory Management".
>> >>>>> The relocation is not done to any _fixed_ address, but the
>> >>>>> address is actually computed at runtime, depending on a number
>> >>>>> of features enabled (at least this is how it used to be -
>> >>>>> apparently little of this is tested on a regular basis, so I
>> >>>>> would not be surprised if things are broken today).
>> >>>>>
>> >>>>> The basic idea was to reserve areas of memory at the top of RAM,
>> >>>>> that would not be initialized / modified by U-Boot and Linux,
>> >>>>> not even across a reset / warm boot.
>> >>>>>
>> >>>>> This was used for example for:
>> >>>>>
>> >>>>> - pRAM (Protected RAM) which could be used to store all kind of
>> >>>>> data (for example, using a pramfs [Protected and Persistent RAM
>> >>>>>  Filesystem]) that could be kept across reboots of the OS.
>> >>>>>
>> >>>>> - shared frame buffer / video memory. U-Boot and Linux would be
>> >>>>> able to initialize the video memory just once (in U-Boot) and
>> >>>>> then share it, maybe even across reboots.  especially, this
>> >>>>> would allow for a very early splash screen that gets passed
>> >>>>> (flicker free) to Linux until some Linux GUI takes over (much
>> >>>>> more difficult today).
>> >>>>>
>> >>>>> - shared log buffer: U-Boot and Linux used to use the same
>> >>>>> syslog buffer mechanism, so you could share it between U-Boot
>> >>>>> and Linux. this allows for example to
>> >>>>>  * read the Linux kernel panic messages after reset in U-Boot;
>> >>>>> this is very useful when you bring up a new system and Linux
>> >>>>> crashes before it can display the log buffer on the console
>> >>>>>  * pass U-Boot POST results on to Linux, so the application code
>> >>>>>    can read and process these
>> >>>>>  * process the system log of the previous run (especially after
>> >>>>> a panic) in Linux after it rebooted.
>> >>>>>
>> >>>>> etc.
>> >>>>>
>> >>>>> There are a number of such features which require to reserve
>> >>>>> room at the top of RAM, the size of which is calculated at
>> >>>>> runtime, often depending on user settable environment data.
>> >>>>>
>> >>>>> All this cannot be done without relocation to a (dynamically
>> >>>>> computed) target address.
>> >>>>>
>> >>>>>
>> >>>>> Yes, the code could be simpler and faster without that - but
>> >>>>> then, you cut off a number of features.
>> >>>>
>> >>>> I would be interested in seeing benchmarks showing the cost of
>> >>>> relocation in terms of boot time. Last time I did this was on
>> >>>> Exynos 5 and it was some years ago. The time was pretty small
>> >>>> provided the cache was on for the memory copies associated with
>> >>>> relocation itself. Something like 10-20ms but I don't have the
>> >>>> numbers handy.
>> >>>>
>> >>>> I think it is useful to be able to allocate memory in
>> >>>> board_init_f() for use by U-Boot for things like the display and
>> >>>> the malloc() region.
>> >>>>
>> >>>> Options we might consider:
>> >>>>
>> >>>> 1. Don't relocate the code and data. Thus we could avoid the
>> >>>> copy and relocation cost. This is already supported with the
>> >>>> GD_FLG_SKIP_RELOC used when U-Boot runs as an EFI app
>> >>>>
>> >>>> 2. Rather than throwing away the old malloc() region, keep it
>> >>>> around so existing allocated blocks work. Then new malloc()
>> >>>> region would be used for future allocations. We could perhaps
>> >>>> ignore free() calls in that region
>> >>>>
>> >>>> 2a. This would allow us to avoid re-init of driver model in most
>> >>>> cases I think. E.g. we could init serial and timer before
>> >>>> relocation and leave them inited after relocation. We could just
>> >>>> init the 'additional' devices not done before relocation.
>> >>>>
>> >>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>> >>>> suspect it would just be a pain though, since SPL might use
>> >>>> memory that U-Boot wants.
>> >>>>
>> >>>> 3. We could turn on the cache earlier. This removes most of the
>> >>>> boot-time penalty. Ideally this should be turned on in SPL and
>> >>>> perhaps redone in U-Boot which has more memory available. If SPL
>> >>>> is not used, we could turn on the cache before relocation.
>> >>>
>> >>> Both turning on the cache and initialising the clocking could be
>> >>> of benefit to boot-time.
>> >>>
>> >>> However, the biggest possible gain will come from utilising
>> >>> Falcon mode to skip the full U-Boot stage and directly boot into
>> >>> the OS from SPL.  This assumes that the drivers involved are
>> >>> fully optimised, so loading up the OS image does not take longer
>> >>> than necessary.
>> >>
>> >> I'd like to see numbers on that. From my experience, loading and
>> >> running U-Boot does not take very long...
>> >>
>> >>>
>> >>>> 4. Rather than reserving memory in board_init_f() we could
>> >>>> have it call malloc() from the expanded region. We could then
>> >>>> perhaps move this reserve/allocate code into particular
>> >>>> drivers or subsystems, and drop a good chunk of the init
>> >>>> sequence. We would need to have a larger malloc() region than is
>> >>>> currently the case.
>> >>>>
>> >>>> There are still some arch-specific bits in board_init_f() which
>> >>>> make these sorts of changes a bit tricky to support generically.
>> >>>> IMO it would be best to move to 'generic relocation' written in
>> >>>> C, where all archs work basically the same way, before
>> >>>> attempting any of the above.
>> >>>>
>> >>>> Still, I can see some benefits and even some simplifications.
>> >>>>
>> >>>> Regards,
>> >>>> Simon
>> >>>
>> >
>> >
>> >
>> > This discussion should have happened.
>> > U-Boot boot sequence is crazily inefficient.
>> >
>> >
>> >
>> > When we talk about "relocation", two things are happening.
>> >
>> >  [1] U-Boot proper copies itself to the very end of DRAM
>> >  [2] Fix-up the global symbols
>> >
>> > In my opinion, only [2] is useful.
>> >
>> >
>> > SPL initializes the DRAM, so it knows the base and size of DRAM.
>> > SPL should be able to load the U-Boot proper to the final
>> > destination. So, [1] is unnecessary.
>> >
>> >
>> > [2] is necessary because SPL may load the U-Boot proper
>> > to a different place than CONFIG_SYS_TEXT_BASE.
>> > This feature is useful for platforms
>> > whose DRAM base/size is only known at run-time.
>> > (Of course, it should be user-configurable by CONFIG_RELOCATE
>> > or something.)
>> >
>> > Moreover, board_init_f() is unneeded -
>> > everything in board_init_f() is already done by SPL.
>> > Multiple-time DM initialization is really inefficient and ugly.
>> >
>> >
>> > The following is how the ideal boot loader would work.
>> >
>> >
>> > Requirement for U-Boot proper:
>> > U-Boot never changes the location by itself.
>> > So, SPL or a vendor loader must load U-Boot proper
>> > to the final destination directly.
>> > (You can load it to the very end of DRAM if you like,
>> > but the actual place does not matter here.)
>> >
>> >
>> > Boot sequence of U-Boot proper:
>> > If CONFIG_RELOCATE (or something) is enabled,
>> > it fixes the global symbols at the very beginning
>> > of the boot.
>> > (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
>> >
>> > That's it.  Proceed to the rest of init code.
>> > (= board_init_r)
>> > board_init_f() is unnecessary.
>> >
>> > This should work for recent platforms.
>>
>> Yes that sounds reasonable to me.
>>
>> We could do the symbol fixup/relocation in SPL after loading U-Boot,
>> although that would probably push us to using ELF format for U-Boot
>> which is a bit limited.
>>
>> Still I think the biggest performance improvement comes from turning
>> on the cache in SPL. So the above is a simplification, not really a
>> speed-up.
>>
>> >
>> >
>> >
>> > We should think about old platforms that boot from a NOR flash or
>> > something. There are two solutions:
>> >  - execute-in-place: run the code in the flash directly
>> >  - use SPL (common/spl/spl-nor.c) if you want to run
>> >    it from RAM
>>
>> This seems like a big regression in functionality. For example for x86
>> 32-bit we currently don't have an SPL (we do for 64-bit). So I think
>> this means that everything would be forced to have an SPL?
>>
>> I am wondering who else we should cc on this discussion?
>
> Not all boards use SPL. Some targets use a vendor FBL (the SPL
> counterpart) and only U-Boot proper. A good example is the Odroid
> XU3.

Some aarch64 boards, like the Jetson TX series and Dragonboard, chain-load
U-Boot from some other loader, and I don't believe things like QEMU
support use SPL either.

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-27 13:21               ` Wolfgang Denk
@ 2017-11-29 10:10                 ` Masahiro Yamada
  0 siblings, 0 replies; 26+ messages in thread
From: Masahiro Yamada @ 2017-11-29 10:10 UTC (permalink / raw)
  To: u-boot

Hi Wolfgang,

2017-11-27 22:21 GMT+09:00 Wolfgang Denk <wd@denx.de>:
> Dear Masahiro,
>
> In message <CAK7LNARc-MUPRmHSiSjtea9X6hpRAe9t9wgdbdfg2DYSEpC4RA@mail.gmail.com> you wrote:
>>
>> When we talk about "relocation", two things are happening.
>>
>>  [1] U-Boot proper copies itself to the very end of DRAM
>
> ...to the very end of DRAM minus space reserved for any memory
> regions we want to reserve for / share with the Linux kernel
> (video memory, log buffer, protected RAM, ...)
>
>>  [2] Fix-up the global symbols
>>
>> In my opinion, only [2] is useful.
>
> This is your opinion, accepted.
> I do not agree with this, i. e. I have a different opinion.
>
>> SPL initializes the DRAM, so it knows the base and size of DRAM.
>
> But it does not know the relocation address yet.  As this is
> dynamically computed, depending on environment variable settings,
> moving this calculation into the SPL means the SPL must be capable of
> reading the environment.  This pulls in a ton of code, and any
> advantages you may have for falcon mode are damaged.


How precisely do we need to compute this?
Platforms may choose a reasonable fixed size for reserved memory.

The end of memory minus 256KB, 512KB, or whatever you like.
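
A minimal sketch of such a rough calculation in C, assuming SPL already
knows the DRAM base and size. The function name, the fixed reserve and the
alignment parameter are illustrative assumptions, not existing U-Boot code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an existing U-Boot function): choose where SPL
 * should load U-Boot proper so that it never has to move again.
 *
 *   load address = end of DRAM - fixed reserve - image size,
 *   rounded down to 'align' (which must be a power of two).
 */
static uint64_t pick_load_addr(uint64_t ram_base, uint64_t ram_size,
			       uint64_t reserve, uint64_t image_size,
			       uint64_t align)
{
	uint64_t top = ram_base + ram_size;

	/* The reserve and the image must both fit in DRAM. */
	assert(reserve + image_size <= ram_size);
	/* Power-of-two alignment keeps the mask below valid. */
	assert(align != 0 && (align & (align - 1)) == 0);

	return (top - reserve - image_size) & ~(align - 1);
}
```

For example, with 1GB of DRAM at 0x80000000, a 512KB reserve, a 1MB image
(including BSS) and 64KB alignment, this returns 0xbfe80000. The only
waste is whatever part of the fixed reserve is not actually used.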



>> The following is how the ideal boot loader would work.
>>
>>
>> Requirement for U-Boot proper:
>> U-Boot never changes the location by itself.
>
> This means you kill all the features that depend on this?


Are there features that depend on relocating a second time?

As I stated above, the calculation can be approximate to some extent.
A rough calculation may waste a small amount of memory,
but is that a big deal?


> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
> f u cn rd ths, itn tyg h myxbl cd.
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> https://lists.denx.de/listinfo/u-boot



-- 
Best Regards
Masahiro Yamada

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-27 17:13               ` Simon Glass
  2017-11-27 18:53                 ` Tom Rini
  2017-11-28  9:53                 ` Lukasz Majewski
@ 2017-11-29 10:11                 ` Masahiro Yamada
  2017-11-29 10:48                   ` Joakim Tjernlund
  2 siblings, 1 reply; 26+ messages in thread
From: Masahiro Yamada @ 2017-11-29 10:11 UTC (permalink / raw)
  To: u-boot

Hi Simon,


2017-11-28 2:13 GMT+09:00 Simon Glass <sjg@chromium.org>:
> (Tom - any thoughts about a more expansive cc list on this?)
>
> Hi Masahiro,
>
> On 26 November 2017 at 07:16, Masahiro Yamada
> <yamada.masahiro@socionext.com> wrote:
>> 2017-11-26 20:38 GMT+09:00 Simon Glass <sjg@chromium.org>:
>>> Hi Philipp,
>>>
>>> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
>>> <philipp.tomsich@theobroma-systems.com> wrote:
>>>> Hi,
>>>>
>>>>> On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
>>>>>
>>>>> +Tom, Masahiro, Philipp
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 22 November 2017 at 03:27, Wolfgang Denk <wd@denx.de> wrote:
>>>>>> Dear Kever Yang,
>>>>>>
>>>>>> In message <fd0bb500-80c4-f317-cc18-f7aaf1344fd8@rock-chips.com> you wrote:
>>>>>>>
>>>>>>> I can understand this feature, we always do dram_init_banks() first,
>>>>>>> then we relocate to 'known' area, then will be no risk to access memory.
>>>>>>> I believe there must be some historical reason for some kind of device,
>>>>>>> the relocate feature is a wonderful idea for it.
>>>>>>
>>>>>> This is actually not so much a feature needed to support some
>>>>>> specific device (in this case much simpler approaches would be
>>>>>> possible), but to support a whole set of features.  Unfortunately
>>>>>> these appear to get forgotten / ignored over time.
>>>>>>
>>>>>>>     many other SoCs should be similar.
>>>>>>> - Without relocate we can save many step, some of our customer really
>>>>>>>     care much about the boot time duration.
>>>>>>>     * no need to relocate everything
>>>>>>>     * no need to copy all the code
>>>>>>>     * no need init the driver more than once
>>>>>>
>>>>>> Please have a look at the README, section "Memory Management".
>>>>>> The relocation is not done to any _fixed_ address, but the address
>>>>>> is actually computed at runtime, depending on a number of features
>>>>>> enabled (at least this is how it used to be - apparently little of
>>>>>> this is tested on a regular basis, so I would not be surprised if
>>>>>> things are broken today).
>>>>>>
>>>>>> The basic idea was to reserve areas of memory at the top of RAM,
>>>>>> that would not be initialized / modified by U-Boot and Linux, not
>>>>>> even across a reset / warm boot.
>>>>>>
>>>>>> This was used for example for:
>>>>>>
>>>>>> - pRAM (Protected RAM) which could be used to store all kind of data
>>>>>>  (for example, using a pramfs [Protected and Persistent RAM
>>>>>>  Filesystem]) that could be kept across reboots of the OS.
>>>>>>
>>>>>> - shared frame buffer / video memory. U-Boot and Linux would be able
>>>>>>  to initialize the video memory just once (in U-Boot) and then
>>>>>>  share it, maybe even across reboots.  especially, this would allow
>>>>>>  for a very early splash screen that gets passed (flicker free) to
>>>>>>  Linux until some Linux GUI takes over (much more difficult today).
>>>>>>
>>>>>> - shared log buffer: U-Boot and Linux used to use the same syslog
>>>>>>  buffer mechanism, so you could share it between U-Boot and Linux.
>>>>>>  this allows for example to
>>>>>>  * read the Linux kernel panic messages after reset in U-Boot; this
>>>>>>    is very useful when you bring up a new system and Linux crashes
>>>>>>    before it can display the log buffer on the console
>>>>>>  * pass U-Boot POST results on to Linux, so the application code
>>>>>>    can read and process these
>>>>>>  * process the system log of the previous run (especially after a
>>>>>>    panic) in Linux after it rebooted.
>>>>>>
>>>>>> etc.
>>>>>>
>>>>>> There are a number of such features which require to reserve room at
>>>>>> the top of RAM, the size of which is calculated at runtime, often
>>>>>> depending on user settable environment data.
>>>>>>
>>>>>> All this cannot be done without relocation to a (dynamically
>>>>>> computed) target address.
>>>>>>
>>>>>>
>>>>>> Yes, the code could be simpler and faster without that - but then,
>>>>>> you cut off a number of features.
>>>>>
>>>>> I would be interested in seeing benchmarks showing the cost of
>>>>> relocation in terms of boot time. Last time I did this was on Exynos 5
>>>>> and it was some years ago. The time was pretty small provided the
>>>>> cache was on for the memory copies associated with relocation itself.
>>>>> Something like 10-20ms but I don't have the numbers handy.
>>>>>
>>>>> I think it is useful to be able to allocate memory in board_init_f()
>>>>> for use by U-Boot for things like the display and the malloc() region.
>>>>>
>>>>> Options we might consider:
>>>>>
>>>>> 1. Don't relocate the code and data. Thus we could avoid the copy and
>>>>> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>>>>> used when U-Boot runs as an EFI app
>>>>>
>>>>> 2. Rather than throwing away the old malloc() region, keep it around
>>>>> so existing allocated blocks work. Then new malloc() region would be
>>>>> used for future allocations. We could perhaps ignore free() calls in
>>>>> that region
>>>>>
>>>>> 2a. This would allow us to avoid re-init of driver model in most cases
>>>>> I think. E.g. we could init serial and timer before relocation and
>>>>> leave them inited after relocation. We could just init the
>>>>> 'additional' devices not done before relocation.
>>>>>
>>>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>>>>> suspect it would just be a pain though, since SPL might use memory
>>>>> that U-Boot wants.
>>>>>
>>>>> 3. We could turn on the cache earlier. This removes most of the
>>>>> boot-time penalty. Ideally this should be turned on in SPL and perhaps
>>>>> redone in U-Boot which has more memory available. If SPL is not used,
>>>>> we could turn on the cache before relocation.
>>>>
>>>> Both turning on the cache and initialising the clocking could be of benefit
>>>> to boot-time.
>>>>
>>>> However, the biggest possible gain will come from utilising Falcon mode
>>>> to skip the full U-Boot stage and directly boot into the OS from SPL.  This
>>>> assumes that the drivers involved are fully optimised, so loading up the
>>>> OS image does not take longer than necessary.
>>>
>>> I'd like to see numbers on that. From my experience, loading and
>>> running U-Boot does not take very long...
>>>
>>>>
>>>>> 4. Rather than reserving memory in board_init_f() we could have it
>>>>> call malloc() from the expanded region. We could then perhaps
>>>>> move this reserve/allocate code into particular drivers or
>>>>> subsystems, and drop a good chunk of the init sequence. We would need
>>>>> to have a larger malloc() region than is currently the case.
>>>>>
>>>>> There are still some arch-specific bits in board_init_f() which make
>>>>> these sorts of changes a bit tricky to support generically. IMO it
>>>>> would be best to move to 'generic relocation' written in C, where all
>>>>> archs work basically the same way, before attempting any of the above.
>>>>>
>>>>> Still, I can see some benefits and even some simplifications.
>>>>>
>>>>> Regards,
>>>>> Simon
>>>>
>>
>>
>>
>> This discussion should have happened.
>> U-Boot boot sequence is crazily inefficient.
>>
>>
>>
>> When we talk about "relocation", two things are happening.
>>
>>  [1] U-Boot proper copies itself to the very end of DRAM
>>  [2] Fix-up the global symbols
>>
>> In my opinion, only [2] is useful.
>>
>>
>> SPL initializes the DRAM, so it knows the base and size of DRAM.
>> SPL should be able to load the U-Boot proper to the final destination.
>> So, [1] is unnecessary.
>>
>>
>> [2] is necessary because SPL may load the U-Boot proper
>> to a different place than CONFIG_SYS_TEXT_BASE.
>> This feature is useful for platforms
>> whose DRAM base/size is only known at run-time.
>> (Of course, it should be user-configurable by CONFIG_RELOCATE
>> or something.)
>>
>> Moreover, board_init_f() is unneeded -
>> everything in board_init_f() is already done by SPL.
>> Multiple-time DM initialization is really inefficient and ugly.
>>
>>
>> The following is how the ideal boot loader would work.
>>
>>
>> Requirement for U-Boot proper:
>> U-Boot never changes the location by itself.
>> So, SPL or a vendor loader must load U-Boot proper
>> to the final destination directly.
>> (You can load it to the very end of DRAM if you like,
>> but the actual place does not matter here.)
>>
>>
>> Boot sequence of U-Boot proper:
>> If CONFIG_RELOCATE (or something) is enabled,
>> it fixes the global symbols at the very beginning
>> of the boot.
>> (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
>>
>> That's it.  Proceed to the rest of init code.
>> (= board_init_r)
>> board_init_f() is unnecessary.
>>
>> This should work for recent platforms.
>
> Yes that sounds reasonable to me.
>
> We could do the symbol fixup/relocation in SPL after loading U-Boot,
> although that would probably push us to using ELF format for U-Boot
> which is a bit limited.
>
> Still I think the biggest performance improvement comes from turning
> on the cache in SPL. So the above is a simplification, not really a
> speed-up.


Right.
I am more interested in simplification than in speed-up.
The boot speed is not a significant problem at least for my boards.


>>
>>
>>
>> We should think about old platforms that boot from a NOR flash or something.
>> There are two solutions:
>>  - execute-in-place: run the code in the flash directly
>>  - use SPL (common/spl/spl-nor.c) if you want to run
>>    it from RAM
>
> This seems like a big regression in functionality. For example for x86
> 32-bit we currently don't have an SPL (we do for 64-bit). So I think
> this means that everything would be forced to have an SPL?

After a grace period for migration, yes.
XIP or SPL.
No relocation in U-Boot proper.

This assumption will allow us to shed a lot of burden.

Remove relocation
Remove board_init_f()
Remove pre-reloc DM init
Perhaps, remove struct global_data
etc.
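
For reference, the symbol fix-up [2] that would remain can be sketched in C
as follows. This assumes the image carries a table of offsets to
pointer-sized slots holding link-time addresses (similar in spirit to
relative relocations). struct reloc_entry and fixup_relative() are
hypothetical names for illustration, not the existing U-Boot code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: offset (from the image start) of one slot to patch. */
struct reloc_entry {
	uintptr_t offset;
};

/*
 * Patch every listed slot in place. Each slot holds an address computed
 * for link_base; adding (run address - link_base) makes it valid where
 * the image actually runs. The image itself is not copied anywhere.
 */
static void fixup_relative(uint8_t *image, uintptr_t link_base,
			   const struct reloc_entry *tbl, size_t n)
{
	uintptr_t delta = (uintptr_t)image - link_base;
	size_t i;

	assert(image != NULL && (tbl != NULL || n == 0));

	for (i = 0; i < n; i++) {
		uintptr_t val;

		/* memcpy avoids unaligned or aliasing-unsafe accesses. */
		memcpy(&val, image + tbl[i].offset, sizeof(val));
		val += delta;
		memcpy(image + tbl[i].offset, &val, sizeof(val));
	}
}
```

If the image happens to be loaded exactly at link_base, delta is zero and
the loop changes nothing, so the same code covers both cases.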


> I am wondering who else we should cc on this discussion?
>
> Regards,
> Simon



-- 
Best Regards
Masahiro Yamada

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-29 10:11                 ` Masahiro Yamada
@ 2017-11-29 10:48                   ` Joakim Tjernlund
  2017-12-02  3:29                     ` Simon Glass
  0 siblings, 1 reply; 26+ messages in thread
From: Joakim Tjernlund @ 2017-11-29 10:48 UTC (permalink / raw)
  To: u-boot

On Wed, 2017-11-29 at 19:11 +0900, Masahiro Yamada wrote:
> Hi Simon,
> 
> 
> 2017-11-28 2:13 GMT+09:00 Simon Glass <sjg@chromium.org>:
> > (Tom - any thoughts about a more expansive cc list on this?)
> > 
> > Hi Masahiro,
> > 
> > On 26 November 2017 at 07:16, Masahiro Yamada
> > <yamada.masahiro@socionext.com> wrote:
> > > 2017-11-26 20:38 GMT+09:00 Simon Glass <sjg@chromium.org>:
> > > > Hi Philipp,
> > > > 
> > > > On 25 November 2017 at 16:31, Dr. Philipp Tomsich
> > > > <philipp.tomsich@theobroma-systems.com> wrote:
> > > > > Hi,
> > > > > 
> > > > > > On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
> > > > > > 
> > > > > > +Tom, Masahiro, Philipp
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > On 22 November 2017 at 03:27, Wolfgang Denk <wd@denx.de> wrote:
> > > > > > > Dear Kever Yang,
> > > > > > > 
> > > > > > > In message <fd0bb500-80c4-f317-cc18-f7aaf1344fd8@rock-chips.com> you wrote:
> > > > > > > > 
> > > > > > > > I can understand this feature, we always do dram_init_banks() first,
> > > > > > > > then we relocate to 'known' area, then will be no risk to access memory.
> > > > > > > > I believe there must be some historical reason for some kind of device,
> > > > > > > > the relocate feature is a wonderful idea for it.
> > > > > > > 
> > > > > > > This is actually not so much a feature needed to support some
> > > > > > > specific device (in this case much simpler approaches would be
> > > > > > > possible), but to support a whole set of features.  Unfortunately
> > > > > > > these appear to get forgotten / ignored over time.
> > > > > > > 
> > > > > > > >     many other SoCs should be similar.
> > > > > > > > - Without relocate we can save many step, some of our customer really
> > > > > > > >     care much about the boot time duration.
> > > > > > > >     * no need to relocate everything
> > > > > > > >     * no need to copy all the code
> > > > > > > >     * no need init the driver more than once
> > > > > > > 
> > > > > > > Please have a look at the README, section "Memory Management".
> > > > > > > The relocation is not done to any _fixed_ address, but the address
> > > > > > > is actually computed at runtime, depending on a number of features
> > > > > > > enabled (at least this is how it used to be - apparently little of
> > > > > > > this is tested on a regular basis, so I would not be surprised if
> > > > > > > things are broken today).
> > > > > > > 
> > > > > > > The basic idea was to reserve areas of memory at the top of RAM,
> > > > > > > that would not be initialized / modified by U-Boot and Linux, not
> > > > > > > even across a reset / warm boot.
> > > > > > > 
> > > > > > > This was used for example for:
> > > > > > > 
> > > > > > > - pRAM (Protected RAM) which could be used to store all kind of data
> > > > > > >  (for example, using a pramfs [Protected and Persistent RAM
> > > > > > >  Filesystem]) that could be kept across reboots of the OS.
> > > > > > > 
> > > > > > > - shared frame buffer / video memory. U-Boot and Linux would be able
> > > > > > >  to initialize the video memory just once (in U-Boot) and then
> > > > > > >  share it, maybe even across reboots.  especially, this would allow
> > > > > > >  for a very early splash screen that gets passed (flicker free) to
> > > > > > >  Linux until some Linux GUI takes over (much more difficult today).
> > > > > > > 
> > > > > > > - shared log buffer: U-Boot and Linux used to use the same syslog
> > > > > > >  buffer mechanism, so you could share it between U-Boot and Linux.
> > > > > > >  this allows for example to
> > > > > > >  * read the Linux kernel panic messages after reset in U-Boot; this
> > > > > > >    is very useful when you bring up a new system and Linux crashes
> > > > > > >    before it can display the log buffer on the console
> > > > > > >  * pass U-Boot POST results on to Linux, so the application code
> > > > > > >    can read and process these
> > > > > > >  * process the system log of the previous run (especially after a
> > > > > > >    panic) in Linux after it rebooted.
> > > > > > > 
> > > > > > > etc.
> > > > > > > 
> > > > > > > There are a number of such features which require to reserve room at
> > > > > > > the top of RAM, the size of which is calculated at runtime, often
> > > > > > > depending on user settable environment data.
> > > > > > > 
> > > > > > > All this cannot be done without relocation to a (dynamically
> > > > > > > computed) target address.
> > > > > > > 
> > > > > > > 
> > > > > > > Yes, the code could be simpler and faster without that - but then,
> > > > > > > you cut off a number of features.
> > > > > > 
> > > > > > I would be interested in seeing benchmarks showing the cost of
> > > > > > relocation in terms of boot time. Last time I did this was on Exynos 5
> > > > > > and it was some years ago. The time was pretty small provided the
> > > > > > cache was on for the memory copies associated with relocation itself.
> > > > > > Something like 10-20ms but I don't have the numbers handy.
> > > > > > 
> > > > > > I think it is useful to be able to allocate memory in board_init_f()
> > > > > > for use by U-Boot for things like the display and the malloc() region.
> > > > > > 
> > > > > > Options we might consider:
> > > > > > 
> > > > > > 1. Don't relocate the code and data. Thus we could avoid the copy and
> > > > > > relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
> > > > > > used when U-Boot runs as an EFI app
> > > > > > 
> > > > > > 2. Rather than throwing away the old malloc() region, keep it around
> > > > > > so existing allocated blocks work. Then new malloc() region would be
> > > > > > used for future allocations. We could perhaps ignore free() calls in
> > > > > > that region
> > > > > > 
> > > > > > 2a. This would allow us to avoid re-init of driver model in most cases
> > > > > > I think. E.g. we could init serial and timer before relocation and
> > > > > > leave them inited after relocation. We could just init the
> > > > > > 'additional' devices not done before relocation.
> > > > > > 
> > > > > > 2b. I suppose we could even extend this to SPL if we wanted to. I
> > > > > > suspect it would just be a pain though, since SPL might use memory
> > > > > > that U-Boot wants.
> > > > > > 
> > > > > > 3. We could turn on the cache earlier. This removes most of the
> > > > > > boot-time penalty. Ideally this should be turned on in SPL and perhaps
> > > > > > redone in U-Boot which has more memory available. If SPL is not used,
> > > > > > we could turn on the cache before relocation.
> > > > > 
> > > > > Both turning on the cache and initialising the clocking could be of benefit
> > > > > to boot-time.
> > > > > 
> > > > > However, the biggest possible gain will come from utilising Falcon mode
> > > > > to skip the full U-Boot stage and directly boot into the OS from SPL.  This
> > > > > assumes that the drivers involved are fully optimised, so loading up the
> > > > > OS image does not take longer than necessary.
> > > > 
> > > > I'd like to see numbers on that. From my experience, loading and
> > > > running U-Boot does not take very long...
> > > > 
> > > > > 
> > > > > > 4. Rather than reserving memory in board_init_f() we could have it
> > > > > > call malloc() from the expanded region. We could then perhaps
> > > > > > move this reserve/allocate code into particular drivers or
> > > > > > subsystems, and drop a good chunk of the init sequence. We would need
> > > > > > to have a larger malloc() region than is currently the case.
> > > > > > 
> > > > > > There are still some arch-specific bits in board_init_f() which make
> > > > > > these sorts of changes a bit tricky to support generically. IMO it
> > > > > > would be best to move to 'generic relocation' written in C, where all
> > > > > > archs work basically the same way, before attempting any of the above.
> > > > > > 
> > > > > > Still, I can see some benefits and even some simplifications.
> > > > > > 
> > > > > > Regards,
> > > > > > Simon
> > > 
> > > 
> > > 
> > > This discussion should have happened.
> > > U-Boot boot sequence is crazily inefficient.
> > > 
> > > 
> > > 
> > > When we talk about "relocation", two things are happening.
> > > 
> > >  [1] U-Boot proper copies itself to the very end of DRAM
> > >  [2] Fix-up the global symbols
> > > 
> > > In my opinion, only [2] is useful.
> > > 
> > > 
> > > SPL initializes the DRAM, so it knows the base and size of DRAM.
> > > SPL should be able to load the U-Boot proper to the final destination.
> > > So, [1] is unnecessary.
> > > 
> > > 
> > > [2] is necessary because SPL may load the U-Boot proper
> > > to a different place than CONFIG_SYS_TEXT_BASE.
> > > This feature is useful for platforms
> > > whose DRAM base/size is only known at run-time.
> > > (Of course, it should be user-configurable by CONFIG_RELOCATE
> > > or something.)
> > > 
> > > Moreover, board_init_f() is unneeded -
> > > everything in board_init_f() is already done by SPL.
> > > Multiple-time DM initialization is really inefficient and ugly.
> > > 
> > > 
> > > The following is how the ideal boot loader would work.
> > > 
> > > 
> > > Requirement for U-Boot proper:
> > > U-Boot never changes the location by itself.
> > > So, SPL or a vendor loader must load U-Boot proper
> > > to the final destination directly.
> > > (You can load it to the very end of DRAM if you like,
> > > but the actual place does not matter here.)
> > > 
> > > 
> > > Boot sequence of U-Boot proper:
> > > If CONFIG_RELOCATE (or something) is enabled,
> > > it fixes the global symbols at the very beginning
> > > of the boot.
> > > (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
> > > 
> > > That's it.  Proceed to the rest of init code.
> > > (= board_init_r)
> > > board_init_f() is unnecessary.
> > > 
> > > This should work for recent platforms.
> > 
> > Yes that sounds reasonable to me.
> > 
> > We could do the symbol fixup/relocation in SPL after loading U-Boot,
> > although that would probably push us to using ELF format for U-Boot
> > which is a bit limited.
> > 
> > Still I think the biggest performance improvement comes from turning
> > on the cache in SPL. So the above is a simplification, not really a
> > speed-up.
> 
> 
> Right.
> I am more interested in simplification than in speed-up.
> The boot speed is not a significant problem at least for my boards.
> 
> 
> > > 
> > > 
> > > 
> > > We should think about old platforms that boot from a NOR flash or something.
> > > There are two solutions:
> > >  - execute-in-place: run the code in the flash directly
> > >  - use SPL (common/spl/spl-nor.c) if you want to run
> > >    it from RAM
> > 
> > This seems like a big regression in functionality. For example for x86
> > 32-bit we currently don't have an SPL (we do for 64-bit). So I think
> > this means that everything would be forced to have an SPL?
> 
> After a grace period for migration, yes.
> XIP or SPL.
> No relocation in U-Boot proper.
> 
> This assumption will allow us to shed a lot of burden.
> 
> Remove relocation
> Remove board_init_f()
> Remove pre-reloc DM init
> Perhaps, remove struct global_data
> etc.

I have not managed to keep up with this discussion, but it seems you are
suggesting some radical change for boards that boot from NOR?

We use such boards (ppc) and also use pRAM etc. Would these still
work?

 Jocke

* [U-Boot] U-Boot proper(not SPL) relocate option
  2017-11-29 10:48                   ` Joakim Tjernlund
@ 2017-12-02  3:29                     ` Simon Glass
  0 siblings, 0 replies; 26+ messages in thread
From: Simon Glass @ 2017-12-02  3:29 UTC (permalink / raw)
  To: u-boot

Hi Joakim,

On 29 November 2017 at 03:48, Joakim Tjernlund
<Joakim.Tjernlund@infinera.com> wrote:
> On Wed, 2017-11-29 at 19:11 +0900, Masahiro Yamada wrote:
>> Hi Simon,
>>
>>
>> 2017-11-28 2:13 GMT+09:00 Simon Glass <sjg@chromium.org>:
>> > (Tom - any thoughts about a more expansive cc list on this?)
>> >
>> > Hi Masahiro,
>> >
>> > On 26 November 2017 at 07:16, Masahiro Yamada
>> > <yamada.masahiro@socionext.com> wrote:
>> > > 2017-11-26 20:38 GMT+09:00 Simon Glass <sjg@chromium.org>:
>> > > > Hi Philipp,
>> > > >
>> > > > On 25 November 2017 at 16:31, Dr. Philipp Tomsich
>> > > > <philipp.tomsich@theobroma-systems.com> wrote:
>> > > > > Hi,
>> > > > >
>> > > > > > On 25 Nov 2017, at 23:34, Simon Glass <sjg@chromium.org> wrote:
>> > > > > >
>> > > > > > +Tom, Masahiro, Philipp
>> > > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > On 22 November 2017 at 03:27, Wolfgang Denk <wd@denx.de> wrote:
>> > > > > > > Dear Kever Yang,
>> > > > > > >
>> > > > > > > In message <fd0bb500-80c4-f317-cc18-f7aaf1344fd8@rock-chips.com> you wrote:
>> > > > > > > >
>> > > > > > > > I can understand this feature, we always do dram_init_banks() first,
>> > > > > > > > then we relocate to 'known' area, then will be no risk to access memory.
>> > > > > > > > I believe there must be some historical reason for some kind of device,
>> > > > > > > > the relocate feature is a wonderful idea for it.
>> > > > > > >
>> > > > > > > This is actually not so much a feature needed to support some
>> > > > > > > specific device (in this case much simpler approaches would be
>> > > > > > > possible), but to support a whole set of features.  Unfortunately
>> > > > > > > these appear to get forgotten / ignored over time.
>> > > > > > >
>> > > > > > > >     many other SoCs should be similar.
>> > > > > > > > - Without relocate we can save many step, some of our customer really
>> > > > > > > >     care much about the boot time duration.
>> > > > > > > >     * no need to relocate everything
>> > > > > > > >     * no need to copy all the code
>> > > > > > > >     * no need init the driver more than once
>> > > > > > >
>> > > > > > > Please have a look at the README, section "Memory Management".
>> > > > > > > The relocation is not done to any _fixed_ address, but the address
>> > > > > > > is actually computed at runtime, depending on a number of features
>> > > > > > > enabled (at least this is how it used to be - apparently little of
>> > > > > > > this is tested on a regular basis, so I would not be surprised if
>> > > > > > > things are broken today).
>> > > > > > >
>> > > > > > > The basic idea was to reserve areas of memory at the top of RAM,
>> > > > > > > that would not be initialized / modified by U-Boot and Linux, not
>> > > > > > > even across a reset / warm boot.
>> > > > > > >
>> > > > > > > This was used for example for:
>> > > > > > >
>> > > > > > > - pRAM (Protected RAM) which could be used to store all kind of data
>> > > > > > >  (for example, using a pramfs [Protected and Persistent RAM
>> > > > > > >  Filesystem]) that could be kept across reboots of the OS.
>> > > > > > >
>> > > > > > > - shared frame buffer / video memory. U-Boot and Linux would be able
>> > > > > > >  to initialize the video memory just once (in U-Boot) and then
>> > > > > > >  share it, maybe even across reboots.  especially, this would allow
>> > > > > > >  for a very early splash screen that gets passed (flicker free) to
>> > > > > > >  Linux until some Linux GUI takes over (much more difficult today).
>> > > > > > >
>> > > > > > > - shared log buffer: U-Boot and Linux used to use the same syslog
>> > > > > > >  buffer mechanism, so you could share it between U-Boot and Linux.
>> > > > > > >  This allows you, for example, to
>> > > > > > >  * read the Linux kernel panic messages after reset in U-Boot; this
>> > > > > > >    is very useful when you bring up a new system and Linux crashes
>> > > > > > >    before it can display the log buffer on the console
>> > > > > > >  * pass U-Boot POST results on to Linux, so the application code
>> > > > > > >    can read and process these
>> > > > > > >  * process the system log of the previous run (especially after a
>> > > > > > >    panic) in Linux after it has rebooted.
>> > > > > > >
>> > > > > > > etc.
>> > > > > > >
>> > > > > > > There are a number of such features which require reserving room at
>> > > > > > > the top of RAM, the size of which is calculated at runtime, often
>> > > > > > > depending on user-settable environment data.
>> > > > > > >
>> > > > > > > All this cannot be done without relocation to a (dynamically
>> > > > > > > computed) target address.
>> > > > > > >
>> > > > > > >
>> > > > > > > Yes, the code could be simpler and faster without that - but then,
>> > > > > > > you cut off a number of features.
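
The top-of-RAM reservation described above can be modelled with a few lines of arithmetic. This is only a sketch in Python, not U-Boot code: the region names, the sizes, the 4 KiB alignment and the helper name reserve_from_top() are all assumptions chosen for illustration. It shows why the target address of U-Boot cannot be fixed at build time once the reserved sizes depend on runtime settings.

```python
def reserve_from_top(ram_base, ram_size, reservations, align=0x1000):
    """Carve regions off the top of RAM, highest first.

    Returns (layout, top): the start address of each region and the
    lowest address handed out so far.
    """
    top = ram_base + ram_size
    layout = {}
    for name, size in reservations:
        top = (top - size) & ~(align - 1)   # move down, then align down
        layout[name] = top
    return layout, top


# Hypothetical board: 256 MiB of RAM at 0x80000000.
RAM_BASE, RAM_SIZE = 0x80000000, 0x10000000

# Sizes are illustrative; in the scheme described above some of them
# would come from the environment at runtime.
small_pram, _ = reserve_from_top(RAM_BASE, RAM_SIZE, [
    ("pram", 0x100000), ("logbuf", 0x4000),
    ("framebuffer", 0x300000), ("u-boot", 0x80000),
])
large_pram, _ = reserve_from_top(RAM_BASE, RAM_SIZE, [
    ("pram", 0x200000), ("logbuf", 0x4000),
    ("framebuffer", 0x300000), ("u-boot", 0x80000),
])

print(hex(small_pram["u-boot"]), hex(large_pram["u-boot"]))
```

Doubling the (hypothetical) pRAM size moves every region below it, including the address U-Boot would have to run from.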
>> > > > > >
>> > > > > > I would be interested in seeing benchmarks showing the cost of
>> > > > > > relocation in terms of boot time. Last time I did this was on Exynos 5
>> > > > > > and it was some years ago. The time was pretty small provided the
>> > > > > > cache was on for the memory copies associated with relocation itself.
>> > > > > > Something like 10-20ms but I don't have the numbers handy.
>> > > > > >
>> > > > > > I think it is useful to be able to allocate memory in board_init_f()
>> > > > > > for use by U-Boot for things like the display and the malloc() region.
>> > > > > >
>> > > > > > Options we might consider:
>> > > > > >
>> > > > > > 1. Don't relocate the code and data. Thus we could avoid the copy and
>> > > > > > relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>> > > > > > used when U-Boot runs as an EFI app
>> > > > > >
>> > > > > > 2. Rather than throwing away the old malloc() region, keep it around
>> > > > > > so existing allocated blocks work. Then new malloc() region would be
>> > > > > > used for future allocations. We could perhaps ignore free() calls in
>> > > > > > that region
>> > > > > >
>> > > > > > 2a. This would allow us to avoid re-init of driver model in most cases
>> > > > > > I think. E.g. we could init serial and timer before relocation and
>> > > > > > leave them inited after relocation. We could just init the
>> > > > > > 'additional' devices not done before relocation.
>> > > > > >
>> > > > > > 2b. I suppose we could even extend this to SPL if we wanted to. I
>> > > > > > suspect it would just be a pain though, since SPL might use memory
>> > > > > > that U-Boot wants.
>> > > > > >
>> > > > > > 3. We could turn on the cache earlier. This removes most of the
>> > > > > > boot-time penalty. Ideally this should be turned on in SPL and perhaps
>> > > > > > redone in U-Boot which has more memory available. If SPL is not used,
>> > > > > > we could turn on the cache before relocation.
>> > > > >
>> > > > > Both turning on the cache and initialising the clocking could be of benefit
>> > > > > to boot-time.
>> > > > >
>> > > > > However, the biggest possible gain will come from utilising Falcon mode
>> > > > > to skip the full U-Boot stage and directly boot into the OS from SPL.  This
>> > > > > assumes that the drivers involved are fully optimised, so loading up the
>> > > > > OS image does not take longer than necessary.
>> > > >
>> > > > I'd like to see numbers on that. From my experience, loading and
>> > > > running U-Boot does not take very long...
>> > > >
>> > > > >
>> > > > > > 4. Rather than reserving memory in board_init_f() we could have it
>> > > > > > call malloc() from the expanded region. We could perhaps then
>> > > > > > move this reserve/allocate code in to particular drivers or
>> > > > > > subsystems, and drop a good chunk of the init sequence. We would need
>> > > > > > to have a larger malloc() region than is currently the case.
>> > > > > >
>> > > > > > There are still some arch-specific bits in board_init_f() which make
>> > > > > > these sorts of changes a bit tricky to support generically. IMO it
>> > > > > > would be best to move to 'generic relocation' written in C, where all
>> > > > > > archs work basically the same way, before attempting any of the above.
>> > > > > >
>> > > > > > Still, I can see some benefits and even some simplifications.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Simon
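
Option 2 in the list above (keep the old malloc() region alive and ignore free() calls into it) can be sketched as a toy allocator. This is a Python model under stated assumptions, not the U-Boot allocator: the class name TwoRegionHeap, the addresses and the bump-pointer strategy are all hypothetical.

```python
class TwoRegionHeap:
    """Toy bump allocator that keeps a retired pre-relocation heap valid."""

    def __init__(self, old_region, new_region):
        self.old_start, self.old_end = old_region   # pre-relocation heap, left untouched
        self.next_free, self.new_end = new_region   # post-relocation heap

    def malloc(self, size):
        """Allocate from the new region only (8-byte aligned bump pointer)."""
        addr = (self.next_free + 7) & ~7
        if addr + size > self.new_end:
            return None
        self.next_free = addr + size
        return addr

    def free(self, addr):
        """Return False when the block is in the old region and the call is ignored."""
        if self.old_start <= addr < self.old_end:
            return False    # old blocks stay valid; their space is never reused
        return True         # a real allocator would recycle the block here


heap = TwoRegionHeap(old_region=(0x00100000, 0x00102000),
                     new_region=(0x8F000000, 0x8F100000))
serial_priv = 0x00100040    # hypothetical block allocated before relocation
new_block = heap.malloc(256)
```

Because blocks such as serial_priv remain usable after relocation, a driver initialised early would not have to be torn down and probed again, which is the point of option 2a.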
>> > >
>> > >
>> > >
>> > > This discussion should have happened.
>> > > U-Boot boot sequence is crazily inefficient.
>> > >
>> > >
>> > >
>> > > When we talk about "relocation", two things are happening.
>> > >
>> > >  [1] U-Boot proper copies itself to the very end of DRAM
>> > >  [2] Fix-up the global symbols
>> > >
>> > > In my opinion, only [2] is useful.
>> > >
>> > >
>> > > SPL initializes the DRAM, so it knows the base and size of DRAM.
>> > > SPL should be able to load the U-Boot proper to the final destination.
>> > > So, [1] is unnecessary.
>> > >
>> > >
>> > > [2] is necessary because SPL may load the U-Boot proper
>> > > to a different place than CONFIG_SYS_TEXT_BASE.
>> > > This feature is useful for platforms
>> > > whose DRAM base/size is only known at run-time.
>> > > (Of course, it should be user-configurable by CONFIG_RELOCATE
>> > > or something.)
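
Step [2] on its own, fixing up addresses without copying the image, amounts to adding one delta to every stored absolute address. The following Python model is a sketch only: it assumes a little-endian 32-bit image and a ready-made list of fix-up offsets, and the function name apply_relative_fixups() is hypothetical. Real images carry this information in relocation entries whose format depends on the architecture.

```python
import struct


def apply_relative_fixups(image, fixup_offsets, link_base, load_base):
    """Add (load_base - link_base) to each 32-bit word listed in fixup_offsets."""
    delta = load_base - link_base
    for off in fixup_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)
    return image


# Hypothetical image linked at 0x10000000 holding one pointer at offset 4.
LINK_BASE = 0x10000000
LOAD_BASE = 0x8FB7C000      # where the loader actually placed the image
image = bytearray(16)
struct.pack_into("<I", image, 4, LINK_BASE + 8)

apply_relative_fixups(image, [4], LINK_BASE, LOAD_BASE)
(fixed,) = struct.unpack_from("<I", image, 4)
print(hex(fixed))
```

The image itself never moves in this model; only the words that hold link-time addresses are rewritten in place.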
>> > >
>> > > Moreover, board_init_f() is unneeded -
>> > > everything in board_init_f() is already done by SPL.
>> > > Multiple-time DM initialization is really inefficient and ugly.
>> > >
>> > >
>> > > The following is how the ideal boot loader would work.
>> > >
>> > >
>> > > Requirement for U-Boot proper:
>> > > U-Boot never changes the location by itself.
>> > > So, SPL or a vendor loader must load U-Boot proper
>> > > to the final destination directly.
>> > > (You can load it to the very end of DRAM if you like,
>> > > but the actual place does not matter here.)
>> > >
>> > >
>> > > Boot sequence of U-Boot proper:
>> > > If CONFIG_RELOCATE (or something) is enabled,
>> > > it fixes the global symbols at the very beginning
>> > > of the boot.
>> > > (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
>> > >
>> > > That's it.  Proceed to the rest of init code.
>> > > (= board_init_r)
>> > > board_init_f() is unnecessary.
>> > >
>> > > This should work for recent platforms.
>> >
>> > Yes that sounds reasonable to me.
>> >
>> > We could do the symbol fixup/relocation in SPL after loading U-Boot,
>> > although that would probably push us to using ELF format for U-Boot
>> > which is a bit limited.
>> >
>> > Still I think the biggest performance improvement comes from turning
>> > on the cache in SPL. So the above is a simplification, not really a
>> > speed-up.
>>
>>
>> Right.
>> I am more interested in simplification than in speed-up.
>> The boot speed is not a significant problem at least for my boards.
>>
>>
>> > >
>> > >
>> > >
>> > > We should think about old platforms that boot from a NOR flash or something.
>> > > There are two solutions:
>> > >  - execute-in-place: run the code in the flash directly
>> > >  - use SPL (common/spl/spl-nor.c) if you want to run
>> > >    it from RAM
>> >
>> > This seems like a big regression in functionality. For example for x86
>> > 32-bit we currently don't have an SPL (we do for 64-bit). So I think
>> > this means that everything would be forced to have an SPL?
>>
>> After a grace period for migration, yes.
>> XIP or SPL.
>> No relocation in U-Boot proper.
>>
>> This assumption will allow us to dump a lot of burden.
>>
>> Remove relocation
>> Remove board_init_f()
>> Remove pre-reloc DM init
>> Perhaps, remove struct global_data
>> etc.
>
> I have not managed to keep up with this discussion, but it seems you are suggesting
> some radical change for boards that boot from NOR?
>
> We use such boards (ppc) and also use pram etc. Would these still
> work?

I think they would have to switch to SPL.

I suppose another way is to adjust boards which DO use SPL to NOT use
board_init_f().

Regards,
Simon


* [U-Boot] U-Boot proper(not SPL) relocate option
@ 2017-11-21  9:38 Kever Yang
  0 siblings, 0 replies; 26+ messages in thread
From: Kever Yang @ 2017-11-21  9:38 UTC (permalink / raw)
  To: u-boot

Hi Guys,

     I am trying to understand why we need to relocate in U-Boot.
 From the documentation in README/crt0.S, I think the relocation feature
exists because some SoCs have limited SRAM that is large enough to load the
whole U-Boot, but not large enough to run all the drivers.

     I don't know how many SoCs/archs still must use this feature, but
I'm sure that no Rockchip SoC needs it in either SPL or U-Boot proper,
because on Rockchip SPL always runs in SRAM to init DDR SDRAM,
and once DRAM is available U-Boot always runs in DRAM.

There is a CONFIG_SPL_SKIP_RELOCATE for SPL to skip the relocation;
can we have another CONFIG_SKIP_RELOCATE for U-Boot proper?

If we enable relocation in SPL, we init the serial driver 4 times, not
including the debug UART:
- before and after relocation in SPL, and before and after relocation in U-Boot.
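
The count of four can be reproduced with a small model. This Python sketch assumes, as described above, that each stage initialises the serial driver once before relocation and once more after it; the function name serial_init_count() is hypothetical and the debug UART is left out.

```python
def serial_init_count(spl_relocates, uboot_relocates):
    """Count serial driver inits across SPL and U-Boot proper."""
    count = 0
    for relocates in (spl_relocates, uboot_relocates):
        count += 1          # init before relocation (or the only init)
        if relocates:
            count += 1      # init again after relocation
    return count


both = serial_init_count(spl_relocates=True, uboot_relocates=True)
spl_skipped = serial_init_count(spl_relocates=False, uboot_relocates=True)
none = serial_init_count(spl_relocates=False, uboot_relocates=False)
print(both, spl_skipped, none)
```

Under these assumptions, skipping relocation in both stages would halve the number of serial inits.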


Here is the document from README:

board_init_f():
         - purpose: set up the machine ready for running board_init_r():
                 i.e. SDRAM and serial UART
         - global_data is available
         - stack is in SRAM
         - BSS is not available, so you cannot use global/static variables,
                 only stack variables and global_data

         Non-SPL-specific notes:
         - dram_init() is called to set up DRAM. If already done in SPL this
                 can do nothing

         SPL-specific notes:
         - you can override the entire board_init_f() function with your own
                 version as needed.
         - preloader_console_init() can be called here in extremis
         - should set up SDRAM, and anything needed to make the UART work
         - there is no need to clear BSS, it will be done by crt0.S
         - must return normally from this function (don't call board_init_r()
                 directly)

board_init_r():
         - purpose: main execution, common code
         - global_data is available
         - SDRAM is available
         - BSS is available, all static/global variables can be used
         - execution eventually continues to main_loop()

         Non-SPL-specific notes:
         - U-Boot is relocated to the top of memory and is now running from
                 there.

         SPL-specific notes:
         - stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
                 CONFIG_SPL_STACK_R_ADDR points into SDRAM
         - preloader_console_init() can be called here - typically this is
                 done by selecting CONFIG_SPL_BOARD_INIT and then supplying a
                 spl_board_init() function containing this call
         - loads U-Boot or (in falcon mode) Linux


Thanks,
- Kever



Thread overview: 26+ messages
2017-11-21  9:33 [U-Boot] U-Boot proper(not SPL) relocate option Kever Yang
2017-11-21 10:29 ` Lukasz Majewski
2017-11-22  1:59   ` Kever Yang
2017-11-22  7:29     ` Chris Packham
2017-11-22  8:47       ` Lukasz Majewski
2017-11-22  8:45     ` Lokesh Vutla
2017-11-22  8:51     ` Lukasz Majewski
2017-11-22 10:27     ` Wolfgang Denk
2017-11-25 22:34       ` Simon Glass
2017-11-25 23:31         ` Dr. Philipp Tomsich
2017-11-26 11:38           ` Simon Glass
2017-11-26 13:44             ` Dr. Philipp Tomsich
2017-11-26 13:49               ` Dr. Philipp Tomsich
2017-11-26 14:16             ` Masahiro Yamada
2017-11-27 13:21               ` Wolfgang Denk
2017-11-29 10:10                 ` Masahiro Yamada
2017-11-27 17:13               ` Simon Glass
2017-11-27 18:53                 ` Tom Rini
2017-11-28  9:53                 ` Lukasz Majewski
2017-11-28 11:30                   ` Peter Robinson
2017-11-29 10:11                 ` Masahiro Yamada
2017-11-29 10:48                   ` Joakim Tjernlund
2017-12-02  3:29                     ` Simon Glass
2017-11-27 18:52               ` Tom Rini
2017-11-26 14:04 ` Andreas Färber
2017-11-21  9:38 Kever Yang
