All of lore.kernel.org
 help / color / mirror / Atom feed
* Size growth?
@ 2020-10-16 19:36 Tom Rini
  2020-10-16 21:46 ` Simon Glass
  0 siblings, 1 reply; 28+ messages in thread
From: Tom Rini @ 2020-10-16 19:36 UTC (permalink / raw)
  To: devicetree-compiler-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 6071 bytes --]

Hey all,

With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc
to cbca977ea121.  I'm seeing some rather worrying size increases
however.  For example, on an aarch64 board such as leez-rk3399 we have:
     leez-rk3399    : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696
        u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696)
          function                                   old     new   delta
          do_fdt                                    3992    4308    +316
          fdt_check_header                           276     492    +216
          fdt_add_property_                          388     556    +168
          fdt_ro_probe_                              128     252    +124
          fdt_packblocks_                            176     296    +120
          fdt_open_into                              392     512    +120
          fdt_blocks_misordered_                      96     216    +120
          fdt_get_mem_rsv                            108     220    +112
          fdt_offset_ptr                             104     208    +104
          fdt_get_string                             288     388    +100
          fdt_splice_                                148     228     +80
          fdt_valid                                  204     276     +72
          fdt_getprop_by_offset                      184     256     +72
          fdt_splice_mem_rsv_                         96     152     +56
          fdt_num_mem_rsv                             60     116     +56
          fdt_splice_struct_                          96     144     +48
          fdt_shrink_to_minimum                      220     268     +48
          fdt_rw_probe_                              108     156     +48
          fdt_mem_rsv                                 60     108     +48
          fdt_getprop_namelen                        100     148     +48
          fdt_get_property_by_offset_                100     148     +48
          fdt_get_name                               164     212     +48
          efi_install_fdt                            964    1012     +48
          boot_get_fdt                               888     936     +48
          fdt_move                                    80     116     +36
          reserve_fdt                                 72      96     +24
          reloc_fdt                                   76     100     +24
          image_setup_libfdt                         284     308     +24
          fit_image_load                            1608    1632     +24
          fit_image_get_data_and_size                172     196     +24
          fit_get_end                                 16      40     +24
          fdt_next_tag                               256     280     +24
          fdt_header_size                             12      36     +24
          fdt_get_property_namelen_                  212     236     +24
          fdt_get_property_namelen                    44      68     +24
          fdt_get_phandle                            120     144     +24
          fdt_del_node                                96     120     +24
          fdt_del_mem_rsv                            112     136     +24
          fdt_add_subnode_namelen                    284     308     +24
          fdt_add_mem_rsv                            124     148     +24
          common_diskboot                            680     704     +24
          fdt_next_subnode                            80      88      +8
        spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116)
          function                                   old     new   delta
          fdt_get_mem_rsv                             52     236    +184
          fdt_add_property_                          340     484    +144
          fdt_get_name                                68     152     +84
          fdt_splice_                                148     228     +80
          fdt_num_mem_rsv                             64     128     +64
          fdt_splice_mem_rsv_                         96     152     +56
          fdt_offset_ptr                              52     104     +52
          fdt_splice_struct_                          96     144     +48
          fdt_shrink_to_minimum                      220     268     +48
          fdt_get_property_by_offset_                 36      84     +48
          fdt_ro_probe_                                -      36     +36
          fdt_subnode_offset_namelen                 200     232     +32
          spl_load_simple_fit                        848     872     +24
          spl_fit_append_fdt                         196     220     +24
          fdt_getprop_by_offset                       80     104     +24
          fdt_get_string                              64      88     +24
          fdt_get_property_namelen_                  200     224     +24
          fdt_get_phandle                            120     144     +24
          fdt_del_mem_rsv                            104     128     +24
          fdt_check_header                            28      52     +24
          fdt_add_subnode_namelen                    260     284     +24
          fdt_add_mem_rsv                            112     136     +24
          fdt_next_tag                               192     212     +20
          fdt_path_offset_namelen                    240     256     +16
          fdt_supernode_atdepth_offset               160     172     +12
          fdt_node_offset_by_phandle                 120     132     +12
          fdt_mem_rsv                                 20       -     -20
          fdt_check_prop_offset_                      64      44     -20
          fdt_check_node_offset_                      64      44     -20

And note that for the SPL case we're already setting ASSUME_MASK to
0xff so there's maximum savings already being done there.  Does anyone
have ideas on where / how to further tweak code size?  Thanks!

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
  2020-10-16 19:36 Size growth? Tom Rini
@ 2020-10-16 21:46 ` Simon Glass
       [not found]   ` <CAPnjgZ3jPciWmoVpuoYb9KC2h3eCevZsq+1BzefCOCAFCDoseQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Simon Glass @ 2020-10-16 21:46 UTC (permalink / raw)
  To: Tom Rini, David Gibson; +Cc: Devicetree Compiler

Hi Tom,

On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
>
> Hey all,
>
> With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc
> to cbca977ea121.  I'm seeing some rather worrying size increases
> however.  For example, on an aarch64 board such as leez-rk3399 we have:
>      leez-rk3399    : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696
>         u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696)
>           function                                   old     new   delta
>           do_fdt                                    3992    4308    +316
>           fdt_check_header                           276     492    +216
>           fdt_add_property_                          388     556    +168
>           fdt_ro_probe_                              128     252    +124
>           fdt_packblocks_                            176     296    +120
>           fdt_open_into                              392     512    +120
>           fdt_blocks_misordered_                      96     216    +120
>           fdt_get_mem_rsv                            108     220    +112
>           fdt_offset_ptr                             104     208    +104
>           fdt_get_string                             288     388    +100
>           fdt_splice_                                148     228     +80
>           fdt_valid                                  204     276     +72
>           fdt_getprop_by_offset                      184     256     +72
>           fdt_splice_mem_rsv_                         96     152     +56
>           fdt_num_mem_rsv                             60     116     +56
>           fdt_splice_struct_                          96     144     +48
>           fdt_shrink_to_minimum                      220     268     +48
>           fdt_rw_probe_                              108     156     +48
>           fdt_mem_rsv                                 60     108     +48
>           fdt_getprop_namelen                        100     148     +48
>           fdt_get_property_by_offset_                100     148     +48
>           fdt_get_name                               164     212     +48
>           efi_install_fdt                            964    1012     +48
>           boot_get_fdt                               888     936     +48
>           fdt_move                                    80     116     +36
>           reserve_fdt                                 72      96     +24
>           reloc_fdt                                   76     100     +24
>           image_setup_libfdt                         284     308     +24
>           fit_image_load                            1608    1632     +24
>           fit_image_get_data_and_size                172     196     +24
>           fit_get_end                                 16      40     +24
>           fdt_next_tag                               256     280     +24
>           fdt_header_size                             12      36     +24
>           fdt_get_property_namelen_                  212     236     +24
>           fdt_get_property_namelen                    44      68     +24
>           fdt_get_phandle                            120     144     +24
>           fdt_del_node                                96     120     +24
>           fdt_del_mem_rsv                            112     136     +24
>           fdt_add_subnode_namelen                    284     308     +24
>           fdt_add_mem_rsv                            124     148     +24
>           common_diskboot                            680     704     +24
>           fdt_next_subnode                            80      88      +8
>         spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116)
>           function                                   old     new   delta
>           fdt_get_mem_rsv                             52     236    +184
>           fdt_add_property_                          340     484    +144
>           fdt_get_name                                68     152     +84
>           fdt_splice_                                148     228     +80
>           fdt_num_mem_rsv                             64     128     +64
>           fdt_splice_mem_rsv_                         96     152     +56
>           fdt_offset_ptr                              52     104     +52
>           fdt_splice_struct_                          96     144     +48
>           fdt_shrink_to_minimum                      220     268     +48
>           fdt_get_property_by_offset_                 36      84     +48
>           fdt_ro_probe_                                -      36     +36
>           fdt_subnode_offset_namelen                 200     232     +32
>           spl_load_simple_fit                        848     872     +24
>           spl_fit_append_fdt                         196     220     +24
>           fdt_getprop_by_offset                       80     104     +24
>           fdt_get_string                              64      88     +24
>           fdt_get_property_namelen_                  200     224     +24
>           fdt_get_phandle                            120     144     +24
>           fdt_del_mem_rsv                            104     128     +24
>           fdt_check_header                            28      52     +24
>           fdt_add_subnode_namelen                    260     284     +24
>           fdt_add_mem_rsv                            112     136     +24
>           fdt_next_tag                               192     212     +20
>           fdt_path_offset_namelen                    240     256     +16
>           fdt_supernode_atdepth_offset               160     172     +12
>           fdt_node_offset_by_phandle                 120     132     +12
>           fdt_mem_rsv                                 20       -     -20
>           fdt_check_prop_offset_                      64      44     -20
>           fdt_check_node_offset_                      64      44     -20
>
> And note that for the SPL case we're already setting ASSUME_MASK to
> 0xff so there's maximum savings already being done there.  Does anyone
> have ideas on where / how to further tweak code size?  Thanks!

+David Gibson

I suspect there are more checks that need to be made conditional.

Regards,
Simon

> --
> Tom

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]   ` <CAPnjgZ3jPciWmoVpuoYb9KC2h3eCevZsq+1BzefCOCAFCDoseQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-10-19  1:42     ` David Gibson
       [not found]       ` <20201019014213.GA11625-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2020-10-19  1:42 UTC (permalink / raw)
  To: Simon Glass; +Cc: Tom Rini, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 7052 bytes --]

On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote:
> Hi Tom,
> 
> On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> >
> > Hey all,
> >
> > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc
> > to cbca977ea121.  I'm seeing some rather worrying size increases
> > however.  For example, on an aarch64 board such as leez-rk3399 we have:
> >      leez-rk3399    : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696
> >         u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696)
> >           function                                   old     new   delta
> >           do_fdt                                    3992    4308    +316
> >           fdt_check_header                           276     492    +216
> >           fdt_add_property_                          388     556    +168
> >           fdt_ro_probe_                              128     252    +124
> >           fdt_packblocks_                            176     296    +120
> >           fdt_open_into                              392     512    +120
> >           fdt_blocks_misordered_                      96     216    +120
> >           fdt_get_mem_rsv                            108     220    +112
> >           fdt_offset_ptr                             104     208    +104
> >           fdt_get_string                             288     388    +100
> >           fdt_splice_                                148     228     +80
> >           fdt_valid                                  204     276     +72
> >           fdt_getprop_by_offset                      184     256     +72
> >           fdt_splice_mem_rsv_                         96     152     +56
> >           fdt_num_mem_rsv                             60     116     +56
> >           fdt_splice_struct_                          96     144     +48
> >           fdt_shrink_to_minimum                      220     268     +48
> >           fdt_rw_probe_                              108     156     +48
> >           fdt_mem_rsv                                 60     108     +48
> >           fdt_getprop_namelen                        100     148     +48
> >           fdt_get_property_by_offset_                100     148     +48
> >           fdt_get_name                               164     212     +48
> >           efi_install_fdt                            964    1012     +48
> >           boot_get_fdt                               888     936     +48
> >           fdt_move                                    80     116     +36
> >           reserve_fdt                                 72      96     +24
> >           reloc_fdt                                   76     100     +24
> >           image_setup_libfdt                         284     308     +24
> >           fit_image_load                            1608    1632     +24
> >           fit_image_get_data_and_size                172     196     +24
> >           fit_get_end                                 16      40     +24
> >           fdt_next_tag                               256     280     +24
> >           fdt_header_size                             12      36     +24
> >           fdt_get_property_namelen_                  212     236     +24
> >           fdt_get_property_namelen                    44      68     +24
> >           fdt_get_phandle                            120     144     +24
> >           fdt_del_node                                96     120     +24
> >           fdt_del_mem_rsv                            112     136     +24
> >           fdt_add_subnode_namelen                    284     308     +24
> >           fdt_add_mem_rsv                            124     148     +24
> >           common_diskboot                            680     704     +24
> >           fdt_next_subnode                            80      88      +8
> >         spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116)
> >           function                                   old     new   delta
> >           fdt_get_mem_rsv                             52     236    +184
> >           fdt_add_property_                          340     484    +144
> >           fdt_get_name                                68     152     +84
> >           fdt_splice_                                148     228     +80
> >           fdt_num_mem_rsv                             64     128     +64
> >           fdt_splice_mem_rsv_                         96     152     +56
> >           fdt_offset_ptr                              52     104     +52
> >           fdt_splice_struct_                          96     144     +48
> >           fdt_shrink_to_minimum                      220     268     +48
> >           fdt_get_property_by_offset_                 36      84     +48
> >           fdt_ro_probe_                                -      36     +36
> >           fdt_subnode_offset_namelen                 200     232     +32
> >           spl_load_simple_fit                        848     872     +24
> >           spl_fit_append_fdt                         196     220     +24
> >           fdt_getprop_by_offset                       80     104     +24
> >           fdt_get_string                              64      88     +24
> >           fdt_get_property_namelen_                  200     224     +24
> >           fdt_get_phandle                            120     144     +24
> >           fdt_del_mem_rsv                            104     128     +24
> >           fdt_check_header                            28      52     +24
> >           fdt_add_subnode_namelen                    260     284     +24
> >           fdt_add_mem_rsv                            112     136     +24
> >           fdt_next_tag                               192     212     +20
> >           fdt_path_offset_namelen                    240     256     +16
> >           fdt_supernode_atdepth_offset               160     172     +12
> >           fdt_node_offset_by_phandle                 120     132     +12
> >           fdt_mem_rsv                                 20       -     -20
> >           fdt_check_prop_offset_                      64      44     -20
> >           fdt_check_node_offset_                      64      44     -20
> >
> > And note that for the SPL case we're already setting ASSUME_MASK to
> > 0xff so there's maximum savings already being done there.  Does anyone
> > have ideas on where / how to further tweak code size?  Thanks!
> 
> +David Gibson
> 
> I suspect there are more checks that need to be made conditional.

Seems likely.

Though, as I've opined before, from what I understand the SPL is *so*
restricted an environment, I'm not really convinced DTs are the right
tool for the job there.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]       ` <20201019014213.GA11625-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
@ 2020-10-19 12:37         ` Tom Rini
  2020-10-20  2:09           ` David Gibson
  0 siblings, 1 reply; 28+ messages in thread
From: Tom Rini @ 2020-10-19 12:37 UTC (permalink / raw)
  To: David Gibson; +Cc: Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 7613 bytes --]

On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote:
> On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote:
> > Hi Tom,
> > 
> > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > >
> > > Hey all,
> > >
> > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc
> > > to cbca977ea121.  I'm seeing some rather worrying size increases
> > > however.  For example, on an aarch64 board such as leez-rk3399 we have:
> > >      leez-rk3399    : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696
> > >         u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696)
> > >           function                                   old     new   delta
> > >           do_fdt                                    3992    4308    +316
> > >           fdt_check_header                           276     492    +216
> > >           fdt_add_property_                          388     556    +168
> > >           fdt_ro_probe_                              128     252    +124
> > >           fdt_packblocks_                            176     296    +120
> > >           fdt_open_into                              392     512    +120
> > >           fdt_blocks_misordered_                      96     216    +120
> > >           fdt_get_mem_rsv                            108     220    +112
> > >           fdt_offset_ptr                             104     208    +104
> > >           fdt_get_string                             288     388    +100
> > >           fdt_splice_                                148     228     +80
> > >           fdt_valid                                  204     276     +72
> > >           fdt_getprop_by_offset                      184     256     +72
> > >           fdt_splice_mem_rsv_                         96     152     +56
> > >           fdt_num_mem_rsv                             60     116     +56
> > >           fdt_splice_struct_                          96     144     +48
> > >           fdt_shrink_to_minimum                      220     268     +48
> > >           fdt_rw_probe_                              108     156     +48
> > >           fdt_mem_rsv                                 60     108     +48
> > >           fdt_getprop_namelen                        100     148     +48
> > >           fdt_get_property_by_offset_                100     148     +48
> > >           fdt_get_name                               164     212     +48
> > >           efi_install_fdt                            964    1012     +48
> > >           boot_get_fdt                               888     936     +48
> > >           fdt_move                                    80     116     +36
> > >           reserve_fdt                                 72      96     +24
> > >           reloc_fdt                                   76     100     +24
> > >           image_setup_libfdt                         284     308     +24
> > >           fit_image_load                            1608    1632     +24
> > >           fit_image_get_data_and_size                172     196     +24
> > >           fit_get_end                                 16      40     +24
> > >           fdt_next_tag                               256     280     +24
> > >           fdt_header_size                             12      36     +24
> > >           fdt_get_property_namelen_                  212     236     +24
> > >           fdt_get_property_namelen                    44      68     +24
> > >           fdt_get_phandle                            120     144     +24
> > >           fdt_del_node                                96     120     +24
> > >           fdt_del_mem_rsv                            112     136     +24
> > >           fdt_add_subnode_namelen                    284     308     +24
> > >           fdt_add_mem_rsv                            124     148     +24
> > >           common_diskboot                            680     704     +24
> > >           fdt_next_subnode                            80      88      +8
> > >         spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116)
> > >           function                                   old     new   delta
> > >           fdt_get_mem_rsv                             52     236    +184
> > >           fdt_add_property_                          340     484    +144
> > >           fdt_get_name                                68     152     +84
> > >           fdt_splice_                                148     228     +80
> > >           fdt_num_mem_rsv                             64     128     +64
> > >           fdt_splice_mem_rsv_                         96     152     +56
> > >           fdt_offset_ptr                              52     104     +52
> > >           fdt_splice_struct_                          96     144     +48
> > >           fdt_shrink_to_minimum                      220     268     +48
> > >           fdt_get_property_by_offset_                 36      84     +48
> > >           fdt_ro_probe_                                -      36     +36
> > >           fdt_subnode_offset_namelen                 200     232     +32
> > >           spl_load_simple_fit                        848     872     +24
> > >           spl_fit_append_fdt                         196     220     +24
> > >           fdt_getprop_by_offset                       80     104     +24
> > >           fdt_get_string                              64      88     +24
> > >           fdt_get_property_namelen_                  200     224     +24
> > >           fdt_get_phandle                            120     144     +24
> > >           fdt_del_mem_rsv                            104     128     +24
> > >           fdt_check_header                            28      52     +24
> > >           fdt_add_subnode_namelen                    260     284     +24
> > >           fdt_add_mem_rsv                            112     136     +24
> > >           fdt_next_tag                               192     212     +20
> > >           fdt_path_offset_namelen                    240     256     +16
> > >           fdt_supernode_atdepth_offset               160     172     +12
> > >           fdt_node_offset_by_phandle                 120     132     +12
> > >           fdt_mem_rsv                                 20       -     -20
> > >           fdt_check_prop_offset_                      64      44     -20
> > >           fdt_check_node_offset_                      64      44     -20
> > >
> > > And note that for the SPL case we're already setting ASSUME_MASK to
> > > 0xff so there's maximum savings already being done there.  Does anyone
> > > have ideas on where / how to further tweak code size?  Thanks!
> > 
> > +David Gibson
> > 
> > I suspect there are more checks that need to be made conditional.
> 
> Seems likely.

OK.  Does that mean you're going to take a look?

> Though, as I've opined before, from what I understand the SPL is *so*
> restricted an environment, I'm not really convinced DTs are the right
> tool for the job there.

SPL is space constrained, which is why we mask out all of the safety
checks.  But we still generally have enough memory that this is fine.
This specific board has 256KiB for SPL, for example.  But I'm not just
concerned about 1KiB of growth when everything is masked off, I'm also
concerned about 2KiB of growth over a changelog that doesn't read like
it added a bunch of stuff that should cause everything to grow, either.

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
  2020-10-19 12:37         ` Tom Rini
@ 2020-10-20  2:09           ` David Gibson
       [not found]             ` <20201020020907.GA64103-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2020-10-20  2:09 UTC (permalink / raw)
  To: Tom Rini; +Cc: Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 8300 bytes --]

On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote:
> On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote:
> > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote:
> > > Hi Tom,
> > > 
> > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > >
> > > > Hey all,
> > > >
> > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc
> > > > to cbca977ea121.  I'm seeing some rather worrying size increases
> > > > however.  For example, on an aarch64 board such as leez-rk3399 we have:
> > > >      leez-rk3399    : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696
> > > >         u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696)
> > > >           function                                   old     new   delta
> > > >           do_fdt                                    3992    4308    +316
> > > >           fdt_check_header                           276     492    +216
> > > >           fdt_add_property_                          388     556    +168
> > > >           fdt_ro_probe_                              128     252    +124
> > > >           fdt_packblocks_                            176     296    +120
> > > >           fdt_open_into                              392     512    +120
> > > >           fdt_blocks_misordered_                      96     216    +120
> > > >           fdt_get_mem_rsv                            108     220    +112
> > > >           fdt_offset_ptr                             104     208    +104
> > > >           fdt_get_string                             288     388    +100
> > > >           fdt_splice_                                148     228     +80
> > > >           fdt_valid                                  204     276     +72
> > > >           fdt_getprop_by_offset                      184     256     +72
> > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > >           fdt_num_mem_rsv                             60     116     +56
> > > >           fdt_splice_struct_                          96     144     +48
> > > >           fdt_shrink_to_minimum                      220     268     +48
> > > >           fdt_rw_probe_                              108     156     +48
> > > >           fdt_mem_rsv                                 60     108     +48
> > > >           fdt_getprop_namelen                        100     148     +48
> > > >           fdt_get_property_by_offset_                100     148     +48
> > > >           fdt_get_name                               164     212     +48
> > > >           efi_install_fdt                            964    1012     +48
> > > >           boot_get_fdt                               888     936     +48
> > > >           fdt_move                                    80     116     +36
> > > >           reserve_fdt                                 72      96     +24
> > > >           reloc_fdt                                   76     100     +24
> > > >           image_setup_libfdt                         284     308     +24
> > > >           fit_image_load                            1608    1632     +24
> > > >           fit_image_get_data_and_size                172     196     +24
> > > >           fit_get_end                                 16      40     +24
> > > >           fdt_next_tag                               256     280     +24
> > > >           fdt_header_size                             12      36     +24
> > > >           fdt_get_property_namelen_                  212     236     +24
> > > >           fdt_get_property_namelen                    44      68     +24
> > > >           fdt_get_phandle                            120     144     +24
> > > >           fdt_del_node                                96     120     +24
> > > >           fdt_del_mem_rsv                            112     136     +24
> > > >           fdt_add_subnode_namelen                    284     308     +24
> > > >           fdt_add_mem_rsv                            124     148     +24
> > > >           common_diskboot                            680     704     +24
> > > >           fdt_next_subnode                            80      88      +8
> > > >         spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116)
> > > >           function                                   old     new   delta
> > > >           fdt_get_mem_rsv                             52     236    +184
> > > >           fdt_add_property_                          340     484    +144
> > > >           fdt_get_name                                68     152     +84
> > > >           fdt_splice_                                148     228     +80
> > > >           fdt_num_mem_rsv                             64     128     +64
> > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > >           fdt_offset_ptr                              52     104     +52
> > > >           fdt_splice_struct_                          96     144     +48
> > > >           fdt_shrink_to_minimum                      220     268     +48
> > > >           fdt_get_property_by_offset_                 36      84     +48
> > > >           fdt_ro_probe_                                -      36     +36
> > > >           fdt_subnode_offset_namelen                 200     232     +32
> > > >           spl_load_simple_fit                        848     872     +24
> > > >           spl_fit_append_fdt                         196     220     +24
> > > >           fdt_getprop_by_offset                       80     104     +24
> > > >           fdt_get_string                              64      88     +24
> > > >           fdt_get_property_namelen_                  200     224     +24
> > > >           fdt_get_phandle                            120     144     +24
> > > >           fdt_del_mem_rsv                            104     128     +24
> > > >           fdt_check_header                            28      52     +24
> > > >           fdt_add_subnode_namelen                    260     284     +24
> > > >           fdt_add_mem_rsv                            112     136     +24
> > > >           fdt_next_tag                               192     212     +20
> > > >           fdt_path_offset_namelen                    240     256     +16
> > > >           fdt_supernode_atdepth_offset               160     172     +12
> > > >           fdt_node_offset_by_phandle                 120     132     +12
> > > >           fdt_mem_rsv                                 20       -     -20
> > > >           fdt_check_prop_offset_                      64      44     -20
> > > >           fdt_check_node_offset_                      64      44     -20
> > > >
> > > > And note that for the SPL case we're already setting ASSUME_MASK to
> > > > 0xff so there's maximum savings already being done there.  Does anyone
> > > > have ideas on where / how to further tweak code size?  Thanks!
> > > 
> > > +David Gibson
> > > 
> > > I suspect there are more checks that need to be made conditional.
> > 
> > Seems likely.
> 
> OK.  Does that mean you're going to take a look?

No, sorry, my time for dtc is very limited.

> > Though, as I've opined before, from what I understand the SPL is *so*
> > restricted an environment, I'm not really convinced DTs are the right
> > tool for the job there.
> 
> SPL is space constrained, which is why we mask out all of the safety
> checks.  But we still generally have enough memory that this is fine.
> This specific board has 256KiB for SPL, for example.  But I'm not just
> concerned about 1KiB of growth when everything is masked off, I'm also
> concerned about 2KiB of growth over a changelog that doesn't read like
> it added a bunch of stuff that should cause everything to grow,
> either.

Yeah, that's a good point.  It's certainly not obvious to me what
would have caused the growth.  I think we'll need a revision by
revision test to see where it was added.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]             ` <20201020020907.GA64103-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
@ 2020-10-21 22:49               ` Tom Rini
  2020-10-22  4:00                 ` David Gibson
  0 siblings, 1 reply; 28+ messages in thread
From: Tom Rini @ 2020-10-21 22:49 UTC (permalink / raw)
  To: David Gibson; +Cc: Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 10806 bytes --]

On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote:
> On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote:
> > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote:
> > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote:
> > > > Hi Tom,
> > > > 
> > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > >
> > > > > Hey all,
> > > > >
> > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc
> > > > > to cbca977ea121.  I'm seeing some rather worrying size increases
> > > > > however.  For example, on an aarch64 board such as leez-rk3399 we have:
> > > > >      leez-rk3399    : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696
> > > > >         u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696)
> > > > >           function                                   old     new   delta
> > > > >           do_fdt                                    3992    4308    +316
> > > > >           fdt_check_header                           276     492    +216
> > > > >           fdt_add_property_                          388     556    +168
> > > > >           fdt_ro_probe_                              128     252    +124
> > > > >           fdt_packblocks_                            176     296    +120
> > > > >           fdt_open_into                              392     512    +120
> > > > >           fdt_blocks_misordered_                      96     216    +120
> > > > >           fdt_get_mem_rsv                            108     220    +112
> > > > >           fdt_offset_ptr                             104     208    +104
> > > > >           fdt_get_string                             288     388    +100
> > > > >           fdt_splice_                                148     228     +80
> > > > >           fdt_valid                                  204     276     +72
> > > > >           fdt_getprop_by_offset                      184     256     +72
> > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > >           fdt_num_mem_rsv                             60     116     +56
> > > > >           fdt_splice_struct_                          96     144     +48
> > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > >           fdt_rw_probe_                              108     156     +48
> > > > >           fdt_mem_rsv                                 60     108     +48
> > > > >           fdt_getprop_namelen                        100     148     +48
> > > > >           fdt_get_property_by_offset_                100     148     +48
> > > > >           fdt_get_name                               164     212     +48
> > > > >           efi_install_fdt                            964    1012     +48
> > > > >           boot_get_fdt                               888     936     +48
> > > > >           fdt_move                                    80     116     +36
> > > > >           reserve_fdt                                 72      96     +24
> > > > >           reloc_fdt                                   76     100     +24
> > > > >           image_setup_libfdt                         284     308     +24
> > > > >           fit_image_load                            1608    1632     +24
> > > > >           fit_image_get_data_and_size                172     196     +24
> > > > >           fit_get_end                                 16      40     +24
> > > > >           fdt_next_tag                               256     280     +24
> > > > >           fdt_header_size                             12      36     +24
> > > > >           fdt_get_property_namelen_                  212     236     +24
> > > > >           fdt_get_property_namelen                    44      68     +24
> > > > >           fdt_get_phandle                            120     144     +24
> > > > >           fdt_del_node                                96     120     +24
> > > > >           fdt_del_mem_rsv                            112     136     +24
> > > > >           fdt_add_subnode_namelen                    284     308     +24
> > > > >           fdt_add_mem_rsv                            124     148     +24
> > > > >           common_diskboot                            680     704     +24
> > > > >           fdt_next_subnode                            80      88      +8
> > > > >         spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116)
> > > > >           function                                   old     new   delta
> > > > >           fdt_get_mem_rsv                             52     236    +184
> > > > >           fdt_add_property_                          340     484    +144
> > > > >           fdt_get_name                                68     152     +84
> > > > >           fdt_splice_                                148     228     +80
> > > > >           fdt_num_mem_rsv                             64     128     +64
> > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > >           fdt_offset_ptr                              52     104     +52
> > > > >           fdt_splice_struct_                          96     144     +48
> > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > >           fdt_get_property_by_offset_                 36      84     +48
> > > > >           fdt_ro_probe_                                -      36     +36
> > > > >           fdt_subnode_offset_namelen                 200     232     +32
> > > > >           spl_load_simple_fit                        848     872     +24
> > > > >           spl_fit_append_fdt                         196     220     +24
> > > > >           fdt_getprop_by_offset                       80     104     +24
> > > > >           fdt_get_string                              64      88     +24
> > > > >           fdt_get_property_namelen_                  200     224     +24
> > > > >           fdt_get_phandle                            120     144     +24
> > > > >           fdt_del_mem_rsv                            104     128     +24
> > > > >           fdt_check_header                            28      52     +24
> > > > >           fdt_add_subnode_namelen                    260     284     +24
> > > > >           fdt_add_mem_rsv                            112     136     +24
> > > > >           fdt_next_tag                               192     212     +20
> > > > >           fdt_path_offset_namelen                    240     256     +16
> > > > >           fdt_supernode_atdepth_offset               160     172     +12
> > > > >           fdt_node_offset_by_phandle                 120     132     +12
> > > > >           fdt_mem_rsv                                 20       -     -20
> > > > >           fdt_check_prop_offset_                      64      44     -20
> > > > >           fdt_check_node_offset_                      64      44     -20
> > > > >
> > > > > And note that for the SPL case we're already setting ASSUME_MASK to
> > > > > 0xff so there's maximum savings already being done there.  Does anyone
> > > > > have ideas on where / how to further tweak code size?  Thanks!
> > > > 
> > > > +David Gibson
> > > > 
> > > > I suspect there are more checks that need to be made conditional.
> > > 
> > > Seems likely.
> > 
> > OK.  Does that mean you're going to take a look?
> 
> No, sorry, my time for dtc is very limited.

OK, fair point.

> > > Though, as I've opined before, from what I understand the SPL is *so*
> > > restricted an environment, I'm not really convinced DTs are the right
> > > tool for the job there.
> > 
> > SPL is space constrained, which is why we mask out all of the safety
> > checks.  But we still generally have enough memory that this is fine.
> > This specific board has 256KiB for SPL, for example.  But I'm not just
> > concerned about 1KiB of growth when everything is masked off, I'm also
> > concerned about 2KiB of growth over a changelog that doesn't read like
> > it added a bunch of stuff that should cause everything to grow,
> > either.
> 
> Yeah, that's a good point.  It's certainly not obvious to me what
> would have caused the growth.  I think we'll need a revision by
> revision test to see where it was added.

So, with a bit of playing around, I've used buildman to give us that,
for the leez-rk3399 platform.  The full results are at
https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note
that build #6, "scripts/dtc: Update to upstream version
v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync
entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has).

But what does all of this _mean_ ?  I kinda think I have an answer now.
One of the things that sticks out is 6dcb8ba408ec adds a lot and
11738cf01f15 reduces it just a little.

With a re-revert again locally, there's just a few size
growths now to look in to on the SPL side:
     leez-rk3399    : all +8 spl/u-boot-spl:all +180 spl/u-boot-spl:text +180 text +8
        u-boot: add: 0/0, grow: 3/-1 bytes: 28/-20 (8)
          function                                   old     new   delta
          fdt_move                                    80      92     +12
          fdt_splice_                                148     156      +8
          fdt_offset_ptr                             104     112      +8
          fdt_get_string                             288     268     -20
        spl-u-boot-spl: add: 1/0, grow: 9/-2 bytes: 220/-40 (180)
          function                                   old     new   delta
          fdt_get_name                                68     128     +60
          fdt_get_mem_rsv                             52      96     +44
          fdt_subnode_offset_namelen                 200     232     +32
          fdt_next_tag                               192     212     +20
          fdt_path_offset_namelen                    240     256     +16
          fdt_supernode_atdepth_offset               160     172     +12
          fdt_ro_probe_                                -      12     +12
          fdt_node_offset_by_phandle                 120     132     +12
          fdt_splice_                                148     156      +8
          fdt_offset_ptr                              52      56      +4
          fdt_check_prop_offset_                      64      44     -20
          fdt_check_node_offset_                      64      44     -20

Of those, I'm not 100% sure.  That might well end up being Simon's
suggestion of a few places needing a check they don't have, I'm not
sure.

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
  2020-10-21 22:49               ` Tom Rini
@ 2020-10-22  4:00                 ` David Gibson
       [not found]                   ` <20201022040013.GB1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2020-10-22  4:00 UTC (permalink / raw)
  To: Tom Rini; +Cc: Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 12020 bytes --]

On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote:
> > On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote:
> > > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote:
> > > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote:
> > > > > Hi Tom,
> > > > > 
> > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > > >
> > > > > > Hey all,
> > > > > >
> > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc
> > > > > > to cbca977ea121.  I'm seeing some rather worrying size increases
> > > > > > however.  For example, on an aarch64 board such as leez-rk3399 we have:
> > > > > >      leez-rk3399    : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696
> > > > > >         u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696)
> > > > > >           function                                   old     new   delta
> > > > > >           do_fdt                                    3992    4308    +316
> > > > > >           fdt_check_header                           276     492    +216
> > > > > >           fdt_add_property_                          388     556    +168
> > > > > >           fdt_ro_probe_                              128     252    +124
> > > > > >           fdt_packblocks_                            176     296    +120
> > > > > >           fdt_open_into                              392     512    +120
> > > > > >           fdt_blocks_misordered_                      96     216    +120
> > > > > >           fdt_get_mem_rsv                            108     220    +112
> > > > > >           fdt_offset_ptr                             104     208    +104
> > > > > >           fdt_get_string                             288     388    +100
> > > > > >           fdt_splice_                                148     228     +80
> > > > > >           fdt_valid                                  204     276     +72
> > > > > >           fdt_getprop_by_offset                      184     256     +72
> > > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > > >           fdt_num_mem_rsv                             60     116     +56
> > > > > >           fdt_splice_struct_                          96     144     +48
> > > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > > >           fdt_rw_probe_                              108     156     +48
> > > > > >           fdt_mem_rsv                                 60     108     +48
> > > > > >           fdt_getprop_namelen                        100     148     +48
> > > > > >           fdt_get_property_by_offset_                100     148     +48
> > > > > >           fdt_get_name                               164     212     +48
> > > > > >           efi_install_fdt                            964    1012     +48
> > > > > >           boot_get_fdt                               888     936     +48
> > > > > >           fdt_move                                    80     116     +36
> > > > > >           reserve_fdt                                 72      96     +24
> > > > > >           reloc_fdt                                   76     100     +24
> > > > > >           image_setup_libfdt                         284     308     +24
> > > > > >           fit_image_load                            1608    1632     +24
> > > > > >           fit_image_get_data_and_size                172     196     +24
> > > > > >           fit_get_end                                 16      40     +24
> > > > > >           fdt_next_tag                               256     280     +24
> > > > > >           fdt_header_size                             12      36     +24
> > > > > >           fdt_get_property_namelen_                  212     236     +24
> > > > > >           fdt_get_property_namelen                    44      68     +24
> > > > > >           fdt_get_phandle                            120     144     +24
> > > > > >           fdt_del_node                                96     120     +24
> > > > > >           fdt_del_mem_rsv                            112     136     +24
> > > > > >           fdt_add_subnode_namelen                    284     308     +24
> > > > > >           fdt_add_mem_rsv                            124     148     +24
> > > > > >           common_diskboot                            680     704     +24
> > > > > >           fdt_next_subnode                            80      88      +8
> > > > > >         spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116)
> > > > > >           function                                   old     new   delta
> > > > > >           fdt_get_mem_rsv                             52     236    +184
> > > > > >           fdt_add_property_                          340     484    +144
> > > > > >           fdt_get_name                                68     152     +84
> > > > > >           fdt_splice_                                148     228     +80
> > > > > >           fdt_num_mem_rsv                             64     128     +64
> > > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > > >           fdt_offset_ptr                              52     104     +52
> > > > > >           fdt_splice_struct_                          96     144     +48
> > > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > > >           fdt_get_property_by_offset_                 36      84     +48
> > > > > >           fdt_ro_probe_                                -      36     +36
> > > > > >           fdt_subnode_offset_namelen                 200     232     +32
> > > > > >           spl_load_simple_fit                        848     872     +24
> > > > > >           spl_fit_append_fdt                         196     220     +24
> > > > > >           fdt_getprop_by_offset                       80     104     +24
> > > > > >           fdt_get_string                              64      88     +24
> > > > > >           fdt_get_property_namelen_                  200     224     +24
> > > > > >           fdt_get_phandle                            120     144     +24
> > > > > >           fdt_del_mem_rsv                            104     128     +24
> > > > > >           fdt_check_header                            28      52     +24
> > > > > >           fdt_add_subnode_namelen                    260     284     +24
> > > > > >           fdt_add_mem_rsv                            112     136     +24
> > > > > >           fdt_next_tag                               192     212     +20
> > > > > >           fdt_path_offset_namelen                    240     256     +16
> > > > > >           fdt_supernode_atdepth_offset               160     172     +12
> > > > > >           fdt_node_offset_by_phandle                 120     132     +12
> > > > > >           fdt_mem_rsv                                 20       -     -20
> > > > > >           fdt_check_prop_offset_                      64      44     -20
> > > > > >           fdt_check_node_offset_                      64      44     -20
> > > > > >
> > > > > > And note that for the SPL case we're already setting ASSUME_MASK to
> > > > > > 0xff so there's maximum savings already being done there.  Does anyone
> > > > > > have ideas on where / how to further tweak code size?  Thanks!
> > > > > 
> > > > > +David Gibson
> > > > > 
> > > > > I suspect there are more checks that need to be made conditional.
> > > > 
> > > > Seems likely.
> > > 
> > > OK.  Does that mean you're going to take a look?
> > 
> > No, sorry, my time for dtc is very limited.
> 
> OK, fair point.
> 
> > > > Though, as I've opined before, from what I understand the SPL is *so*
> > > > restricted an environment, I'm not really convinced DTs are the right
> > > > tool for the job there.
> > > 
> > > SPL is space constrained, which is why we mask out all of the safety
> > > checks.  But we still generally have enough memory that this is fine.
> > > This specific board has 256KiB for SPL, for example.  But I'm not just
> > > concerned about 1KiB of growth when everything is masked off, I'm also
> > > concerned about 2KiB of growth over a changelog that doesn't read like
> > > it added a bunch of stuff that should cause everything to grow,
> > > either.
> > 
> > Yeah, that's a good point.  It's certainly not obvious to me what
> > would have caused the growth.  I think we'll need a revision by
> > revision test to see where it was added.
> 
> So, with a bit of playing around, I've used buildman to give us that,
> for the leez-rk3399 platform.  The full results are at
> https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note
> that build #6, "scripts/dtc: Update to upstream version
> v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync
> entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has).

Right.. I was meaning stepping through the dtc upstream revisions, not
just the uboot revisions, so you get a fine grained idea of where the
increase is coming from.

> But what does all of this _mean_ ?  I kinda think I have an answer now.
> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> 11738cf01f15 reduces it just a little.

Ah, that's a tricky one.  If we don't handle unaligned accesses we
instead get intermittent bug reports where it just crashes.

I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
break for either an unaligned dtb (unlikely) or if you attempt to load
an unaligned value from a property (more likely, but don't add the
flag if you're not sure you don't need it).

> With a re-revert again locally, there's just a few size
> growths now to look in to on the SPL side:
>      leez-rk3399    : all +8 spl/u-boot-spl:all +180 spl/u-boot-spl:text +180 text +8
>         u-boot: add: 0/0, grow: 3/-1 bytes: 28/-20 (8)
>           function                                   old     new   delta
>           fdt_move                                    80      92     +12
>           fdt_splice_                                148     156      +8
>           fdt_offset_ptr                             104     112      +8
>           fdt_get_string                             288     268     -20
>         spl-u-boot-spl: add: 1/0, grow: 9/-2 bytes: 220/-40 (180)
>           function                                   old     new   delta
>           fdt_get_name                                68     128     +60
>           fdt_get_mem_rsv                             52      96     +44
>           fdt_subnode_offset_namelen                 200     232     +32
>           fdt_next_tag                               192     212     +20
>           fdt_path_offset_namelen                    240     256     +16
>           fdt_supernode_atdepth_offset               160     172     +12
>           fdt_ro_probe_                                -      12     +12
>           fdt_node_offset_by_phandle                 120     132     +12
>           fdt_splice_                                148     156      +8
>           fdt_offset_ptr                              52      56      +4
>           fdt_check_prop_offset_                      64      44     -20
>           fdt_check_node_offset_                      64      44     -20
> 
> Of those, I'm not 100% sure.  That might well end up being Simon's
> suggestion of a few places needing a check they don't have, I'm not
> sure.

Maybe.  Really need to know which dtc/libfdt revision makes the
changes to decipher that.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                   ` <20201022040013.GB1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
@ 2020-10-22 12:32                     ` Tom Rini
  2020-10-22 14:58                       ` David Gibson
  0 siblings, 1 reply; 28+ messages in thread
From: Tom Rini @ 2020-10-22 12:32 UTC (permalink / raw)
  To: David Gibson; +Cc: Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 13196 bytes --]

On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote:
> > > On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote:
> > > > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote:
> > > > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote:
> > > > > > Hi Tom,
> > > > > > 
> > > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > > > >
> > > > > > > Hey all,
> > > > > > >
> > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc
> > > > > > > to cbca977ea121.  I'm seeing some rather worrying size increases
> > > > > > > however.  For example, on an aarch64 board such as leez-rk3399 we have:
> > > > > > >      leez-rk3399    : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696
> > > > > > >         u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696)
> > > > > > >           function                                   old     new   delta
> > > > > > >           do_fdt                                    3992    4308    +316
> > > > > > >           fdt_check_header                           276     492    +216
> > > > > > >           fdt_add_property_                          388     556    +168
> > > > > > >           fdt_ro_probe_                              128     252    +124
> > > > > > >           fdt_packblocks_                            176     296    +120
> > > > > > >           fdt_open_into                              392     512    +120
> > > > > > >           fdt_blocks_misordered_                      96     216    +120
> > > > > > >           fdt_get_mem_rsv                            108     220    +112
> > > > > > >           fdt_offset_ptr                             104     208    +104
> > > > > > >           fdt_get_string                             288     388    +100
> > > > > > >           fdt_splice_                                148     228     +80
> > > > > > >           fdt_valid                                  204     276     +72
> > > > > > >           fdt_getprop_by_offset                      184     256     +72
> > > > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > > > >           fdt_num_mem_rsv                             60     116     +56
> > > > > > >           fdt_splice_struct_                          96     144     +48
> > > > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > > > >           fdt_rw_probe_                              108     156     +48
> > > > > > >           fdt_mem_rsv                                 60     108     +48
> > > > > > >           fdt_getprop_namelen                        100     148     +48
> > > > > > >           fdt_get_property_by_offset_                100     148     +48
> > > > > > >           fdt_get_name                               164     212     +48
> > > > > > >           efi_install_fdt                            964    1012     +48
> > > > > > >           boot_get_fdt                               888     936     +48
> > > > > > >           fdt_move                                    80     116     +36
> > > > > > >           reserve_fdt                                 72      96     +24
> > > > > > >           reloc_fdt                                   76     100     +24
> > > > > > >           image_setup_libfdt                         284     308     +24
> > > > > > >           fit_image_load                            1608    1632     +24
> > > > > > >           fit_image_get_data_and_size                172     196     +24
> > > > > > >           fit_get_end                                 16      40     +24
> > > > > > >           fdt_next_tag                               256     280     +24
> > > > > > >           fdt_header_size                             12      36     +24
> > > > > > >           fdt_get_property_namelen_                  212     236     +24
> > > > > > >           fdt_get_property_namelen                    44      68     +24
> > > > > > >           fdt_get_phandle                            120     144     +24
> > > > > > >           fdt_del_node                                96     120     +24
> > > > > > >           fdt_del_mem_rsv                            112     136     +24
> > > > > > >           fdt_add_subnode_namelen                    284     308     +24
> > > > > > >           fdt_add_mem_rsv                            124     148     +24
> > > > > > >           common_diskboot                            680     704     +24
> > > > > > >           fdt_next_subnode                            80      88      +8
> > > > > > >         spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116)
> > > > > > >           function                                   old     new   delta
> > > > > > >           fdt_get_mem_rsv                             52     236    +184
> > > > > > >           fdt_add_property_                          340     484    +144
> > > > > > >           fdt_get_name                                68     152     +84
> > > > > > >           fdt_splice_                                148     228     +80
> > > > > > >           fdt_num_mem_rsv                             64     128     +64
> > > > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > > > >           fdt_offset_ptr                              52     104     +52
> > > > > > >           fdt_splice_struct_                          96     144     +48
> > > > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > > > >           fdt_get_property_by_offset_                 36      84     +48
> > > > > > >           fdt_ro_probe_                                -      36     +36
> > > > > > >           fdt_subnode_offset_namelen                 200     232     +32
> > > > > > >           spl_load_simple_fit                        848     872     +24
> > > > > > >           spl_fit_append_fdt                         196     220     +24
> > > > > > >           fdt_getprop_by_offset                       80     104     +24
> > > > > > >           fdt_get_string                              64      88     +24
> > > > > > >           fdt_get_property_namelen_                  200     224     +24
> > > > > > >           fdt_get_phandle                            120     144     +24
> > > > > > >           fdt_del_mem_rsv                            104     128     +24
> > > > > > >           fdt_check_header                            28      52     +24
> > > > > > >           fdt_add_subnode_namelen                    260     284     +24
> > > > > > >           fdt_add_mem_rsv                            112     136     +24
> > > > > > >           fdt_next_tag                               192     212     +20
> > > > > > >           fdt_path_offset_namelen                    240     256     +16
> > > > > > >           fdt_supernode_atdepth_offset               160     172     +12
> > > > > > >           fdt_node_offset_by_phandle                 120     132     +12
> > > > > > >           fdt_mem_rsv                                 20       -     -20
> > > > > > >           fdt_check_prop_offset_                      64      44     -20
> > > > > > >           fdt_check_node_offset_                      64      44     -20
> > > > > > >
> > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to
> > > > > > > 0xff so there's maximum savings already being done there.  Does anyone
> > > > > > > have ideas on where / how to further tweak code size?  Thanks!
> > > > > > 
> > > > > > +David Gibson
> > > > > > 
> > > > > > I suspect there are more checks that need to be made conditional.
> > > > > 
> > > > > Seems likely.
> > > > 
> > > > OK.  Does that mean you're going to take a look?
> > > 
> > > No, sorry, my time for dtc is very limited.
> > 
> > OK, fair point.
> > 
> > > > > Though, as I've opined before, from what I understand the SPL is *so*
> > > > > restricted an environment, I'm not really convinced DTs are the right
> > > > > tool for the job there.
> > > > 
> > > > SPL is space constrained, which is why we mask out all of the safety
> > > > checks.  But we still generally have enough memory that this is fine.
> > > > This specific board has 256KiB for SPL, for example.  But I'm not just
> > > > concerned about 1KiB of growth when everything is masked off, I'm also
> > > > concerned about 2KiB of growth over a changelog that doesn't read like
> > > > it added a bunch of stuff that should cause everything to grow,
> > > > either.
> > > 
> > > Yeah, that's a good point.  It's certainly not obvious to me what
> > > would have caused the growth.  I think we'll need a revision by
> > > revision test to see where it was added.
> > 
> > So, with a bit of playing around, I've used buildman to give us that,
> > for the leez-rk3399 platform.  The full results are at
> > https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note
> > that build #6, "scripts/dtc: Update to upstream version
> > v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync
> > entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has).
> 
> Right.. I was meaning stepping through the dtc upstream revisions, not
> just the uboot revisions, so you get a fine grained idea of where the
> increase is coming from.

Yes, that's what that link is.  I imported every revision from point A
to point B, and built U-Boot for it, and checked the sizes, since we
have tooling for that.  It's possible the bloat-o-meter script in the
kernel could be used, but I'm not sure what the right target(s) in dtc
would be, to get any useful information out.

> > But what does all of this _mean_ ?  I kinda think I have an answer now.
> > One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > 11738cf01f15 reduces it just a little.
> 
> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> instead get intermittent bug reports where it just crashes.

We really need to talk about that then.  There was a problem of people
turning off the sanity check for making sure the entire device tree was
aligned and then having everything crash.

> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> break for either an unaligned dtb (unlikely) or if you attempt to load
> an unaligned value from a property (more likely, but don't add the
> flag if you're not sure you don't need it).

So long as it's abstracted in such a way that we don't grow the size of
everything again, yes, that is the right way forward I think.  These
tests still introduce a big boot-time slowdown as well.

> > With a re-revert again locally, there's just a few size
> > growths now to look in to on the SPL side:
> >      leez-rk3399    : all +8 spl/u-boot-spl:all +180 spl/u-boot-spl:text +180 text +8
> >         u-boot: add: 0/0, grow: 3/-1 bytes: 28/-20 (8)
> >           function                                   old     new   delta
> >           fdt_move                                    80      92     +12
> >           fdt_splice_                                148     156      +8
> >           fdt_offset_ptr                             104     112      +8
> >           fdt_get_string                             288     268     -20
> >         spl-u-boot-spl: add: 1/0, grow: 9/-2 bytes: 220/-40 (180)
> >           function                                   old     new   delta
> >           fdt_get_name                                68     128     +60
> >           fdt_get_mem_rsv                             52      96     +44
> >           fdt_subnode_offset_namelen                 200     232     +32
> >           fdt_next_tag                               192     212     +20
> >           fdt_path_offset_namelen                    240     256     +16
> >           fdt_supernode_atdepth_offset               160     172     +12
> >           fdt_ro_probe_                                -      12     +12
> >           fdt_node_offset_by_phandle                 120     132     +12
> >           fdt_splice_                                148     156      +8
> >           fdt_offset_ptr                              52      56      +4
> >           fdt_check_prop_offset_                      64      44     -20
> >           fdt_check_node_offset_                      64      44     -20
> > 
> > Of those, I'm not 100% sure.  That might well end up being Simon's
> > suggestion of a few places needing a check they don't have, I'm not
> > sure.
> 
> Maybe.  Really need to know which dtc/libfdt revision makes the
> changes to decipher that.

Yes, there's a few suggestive dtc revisions that change the size of
those functions in the full gist.  I think some of the satisfy Coverity
changes had small impact, but I didn't look for fdt_get_name /
fdt_get_mem_rsv.

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
  2020-10-22 12:32                     ` Tom Rini
@ 2020-10-22 14:58                       ` David Gibson
       [not found]                         ` <20201022145804.GI1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2020-10-22 14:58 UTC (permalink / raw)
  To: Tom Rini; +Cc: Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 14252 bytes --]

On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote:
> > > > On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote:
> > > > > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote:
> > > > > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote:
> > > > > > > Hi Tom,
> > > > > > > 
> > > > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > > > > >
> > > > > > > > Hey all,
> > > > > > > >
> > > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc
> > > > > > > > to cbca977ea121.  I'm seeing some rather worrying size increases
> > > > > > > > however.  For example, on an aarch64 board such as leez-rk3399 we have:
> > > > > > > >      leez-rk3399    : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696
> > > > > > > >         u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696)
> > > > > > > >           function                                   old     new   delta
> > > > > > > >           do_fdt                                    3992    4308    +316
> > > > > > > >           fdt_check_header                           276     492    +216
> > > > > > > >           fdt_add_property_                          388     556    +168
> > > > > > > >           fdt_ro_probe_                              128     252    +124
> > > > > > > >           fdt_packblocks_                            176     296    +120
> > > > > > > >           fdt_open_into                              392     512    +120
> > > > > > > >           fdt_blocks_misordered_                      96     216    +120
> > > > > > > >           fdt_get_mem_rsv                            108     220    +112
> > > > > > > >           fdt_offset_ptr                             104     208    +104
> > > > > > > >           fdt_get_string                             288     388    +100
> > > > > > > >           fdt_splice_                                148     228     +80
> > > > > > > >           fdt_valid                                  204     276     +72
> > > > > > > >           fdt_getprop_by_offset                      184     256     +72
> > > > > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > > > > >           fdt_num_mem_rsv                             60     116     +56
> > > > > > > >           fdt_splice_struct_                          96     144     +48
> > > > > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > > > > >           fdt_rw_probe_                              108     156     +48
> > > > > > > >           fdt_mem_rsv                                 60     108     +48
> > > > > > > >           fdt_getprop_namelen                        100     148     +48
> > > > > > > >           fdt_get_property_by_offset_                100     148     +48
> > > > > > > >           fdt_get_name                               164     212     +48
> > > > > > > >           efi_install_fdt                            964    1012     +48
> > > > > > > >           boot_get_fdt                               888     936     +48
> > > > > > > >           fdt_move                                    80     116     +36
> > > > > > > >           reserve_fdt                                 72      96     +24
> > > > > > > >           reloc_fdt                                   76     100     +24
> > > > > > > >           image_setup_libfdt                         284     308     +24
> > > > > > > >           fit_image_load                            1608    1632     +24
> > > > > > > >           fit_image_get_data_and_size                172     196     +24
> > > > > > > >           fit_get_end                                 16      40     +24
> > > > > > > >           fdt_next_tag                               256     280     +24
> > > > > > > >           fdt_header_size                             12      36     +24
> > > > > > > >           fdt_get_property_namelen_                  212     236     +24
> > > > > > > >           fdt_get_property_namelen                    44      68     +24
> > > > > > > >           fdt_get_phandle                            120     144     +24
> > > > > > > >           fdt_del_node                                96     120     +24
> > > > > > > >           fdt_del_mem_rsv                            112     136     +24
> > > > > > > >           fdt_add_subnode_namelen                    284     308     +24
> > > > > > > >           fdt_add_mem_rsv                            124     148     +24
> > > > > > > >           common_diskboot                            680     704     +24
> > > > > > > >           fdt_next_subnode                            80      88      +8
> > > > > > > >         spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116)
> > > > > > > >           function                                   old     new   delta
> > > > > > > >           fdt_get_mem_rsv                             52     236    +184
> > > > > > > >           fdt_add_property_                          340     484    +144
> > > > > > > >           fdt_get_name                                68     152     +84
> > > > > > > >           fdt_splice_                                148     228     +80
> > > > > > > >           fdt_num_mem_rsv                             64     128     +64
> > > > > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > > > > >           fdt_offset_ptr                              52     104     +52
> > > > > > > >           fdt_splice_struct_                          96     144     +48
> > > > > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > > > > >           fdt_get_property_by_offset_                 36      84     +48
> > > > > > > >           fdt_ro_probe_                                -      36     +36
> > > > > > > >           fdt_subnode_offset_namelen                 200     232     +32
> > > > > > > >           spl_load_simple_fit                        848     872     +24
> > > > > > > >           spl_fit_append_fdt                         196     220     +24
> > > > > > > >           fdt_getprop_by_offset                       80     104     +24
> > > > > > > >           fdt_get_string                              64      88     +24
> > > > > > > >           fdt_get_property_namelen_                  200     224     +24
> > > > > > > >           fdt_get_phandle                            120     144     +24
> > > > > > > >           fdt_del_mem_rsv                            104     128     +24
> > > > > > > >           fdt_check_header                            28      52     +24
> > > > > > > >           fdt_add_subnode_namelen                    260     284     +24
> > > > > > > >           fdt_add_mem_rsv                            112     136     +24
> > > > > > > >           fdt_next_tag                               192     212     +20
> > > > > > > >           fdt_path_offset_namelen                    240     256     +16
> > > > > > > >           fdt_supernode_atdepth_offset               160     172     +12
> > > > > > > >           fdt_node_offset_by_phandle                 120     132     +12
> > > > > > > >           fdt_mem_rsv                                 20       -     -20
> > > > > > > >           fdt_check_prop_offset_                      64      44     -20
> > > > > > > >           fdt_check_node_offset_                      64      44     -20
> > > > > > > >
> > > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to
> > > > > > > > 0xff so there's maximum savings already being done there.  Does anyone
> > > > > > > > have ideas on where / how to further tweak code size?  Thanks!
> > > > > > > 
> > > > > > > +David Gibson
> > > > > > > 
> > > > > > > I suspect there are more checks that need to be made conditional.
> > > > > > 
> > > > > > Seems likely.
> > > > > 
> > > > > OK.  Does that mean you're going to take a look?
> > > > 
> > > > No, sorry, my time for dtc is very limited.
> > > 
> > > OK, fair point.
> > > 
> > > > > > Though, as I've opined before, from what I understand the SPL is *so*
> > > > > > restricted an environment, I'm not really convinced DTs are the right
> > > > > > tool for the job there.
> > > > > 
> > > > > SPL is space constrained, which is why we mask out all of the safety
> > > > > checks.  But we still generally have enough memory that this is fine.
> > > > > This specific board has 256KiB for SPL, for example.  But I'm not just
> > > > > concerned about 1KiB of growth when everything is masked off, I'm also
> > > > > concerned about 2KiB of growth over a changelog that doesn't read like
> > > > > it added a bunch of stuff that should cause everything to grow,
> > > > > either.
> > > > 
> > > > Yeah, that's a good point.  It's certainly not obvious to me what
> > > > would have caused the growth.  I think we'll need a revision by
> > > > revision test to see where it was added.
> > > 
> > > So, with a bit of playing around, I've used buildman to give us that,
> > > for the leez-rk3399 platform.  The full results are at
> > > https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note
> > > that build #6, "scripts/dtc: Update to upstream version
> > > v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync
> > > entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has).
> > 
> > Right.. I was meaning stepping through the dtc upstream revisions, not
> > just the uboot revisions, so you get a fine grained idea of where the
> > increase is coming from.
> 
> Yes, that's what that link is.  I imported every revision from point A
> to point B, and built U-Boot for it, and checked the sizes, since we
> have tooling for that.  It's possible the bloat-o-meter script in the
> kernel could be used, but I'm not sure what the right target(s) in dtc
> would be, to get any useful information out.

Ok.. ok.  I might need some guidance to interpret the information
there on that link.

> > > But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > 11738cf01f15 reduces it just a little.
> > 
> > Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > instead get intermittent bug reports where it just crashes.
> 
> We really need to talk about that then.  There was a problem of people
> turning off the sanity check for making sure the entire device tree was
> aligned and then having everything crash.

Ok... I'm not really sure where you're going with that thought.

> > I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > break for either an unaligned dtb (unlikely) or if you attempt to load
> > an unaligned value from a property (more likely, but don't add the
> > flag if you're not sure you don't need it).
> 
> So long as it's abstracted in such a way that we don't grow the size of
> everything again, yes, that is the right way forward I think.

All the ASSUME flags should be resolved at compile time (at least with
normal optimization levels enabled in the compiler), so testing for
those shouldn't increase size at all.  If they do, something is wrong.

> These
> tests still introduce a big boot-time slowdown as well.

I'm not sure exactly what tests you mean.


> 
> > > With a re-revert again locally, there's just a few size
> > > growths now to look in to on the SPL side:
> > >      leez-rk3399    : all +8 spl/u-boot-spl:all +180 spl/u-boot-spl:text +180 text +8
> > >         u-boot: add: 0/0, grow: 3/-1 bytes: 28/-20 (8)
> > >           function                                   old     new   delta
> > >           fdt_move                                    80      92     +12
> > >           fdt_splice_                                148     156      +8
> > >           fdt_offset_ptr                             104     112      +8
> > >           fdt_get_string                             288     268     -20
> > >         spl-u-boot-spl: add: 1/0, grow: 9/-2 bytes: 220/-40 (180)
> > >           function                                   old     new   delta
> > >           fdt_get_name                                68     128     +60
> > >           fdt_get_mem_rsv                             52      96     +44
> > >           fdt_subnode_offset_namelen                 200     232     +32
> > >           fdt_next_tag                               192     212     +20
> > >           fdt_path_offset_namelen                    240     256     +16
> > >           fdt_supernode_atdepth_offset               160     172     +12
> > >           fdt_ro_probe_                                -      12     +12
> > >           fdt_node_offset_by_phandle                 120     132     +12
> > >           fdt_splice_                                148     156      +8
> > >           fdt_offset_ptr                              52      56      +4
> > >           fdt_check_prop_offset_                      64      44     -20
> > >           fdt_check_node_offset_                      64      44     -20
> > > 
> > > Of those, I'm not 100% sure.  That might well end up being Simon's
> > > suggestion of a few places needing a check they don't have, I'm not
> > > sure.
> > 
> > Maybe.  Really need to know which dtc/libfdt revision makes the
> > changes to decipher that.
> 
> Yes, there's a few suggestive dtc revisions that change the size of
> those functions in the full gist.  I think some of the satisfy Coverity
> changes had small impact, but I didn't look for fdt_get_name /
> fdt_get_mem_rsv.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                         ` <20201022145804.GI1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
@ 2020-10-22 15:22                           ` Tom Rini
  2020-10-26 21:51                             ` Rob Herring
  0 siblings, 1 reply; 28+ messages in thread
From: Tom Rini @ 2020-10-22 15:22 UTC (permalink / raw)
  To: David Gibson; +Cc: Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 15666 bytes --]

On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote:
> > > > > On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote:
> > > > > > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote:
> > > > > > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote:
> > > > > > > > Hi Tom,
> > > > > > > > 
> > > > > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > > > > > >
> > > > > > > > > Hey all,
> > > > > > > > >
> > > > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc
> > > > > > > > > to cbca977ea121.  I'm seeing some rather worrying size increases
> > > > > > > > > however.  For example, on an aarch64 board such as leez-rk3399 we have:
> > > > > > > > >      leez-rk3399    : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696
> > > > > > > > >         u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696)
> > > > > > > > >           function                                   old     new   delta
> > > > > > > > >           do_fdt                                    3992    4308    +316
> > > > > > > > >           fdt_check_header                           276     492    +216
> > > > > > > > >           fdt_add_property_                          388     556    +168
> > > > > > > > >           fdt_ro_probe_                              128     252    +124
> > > > > > > > >           fdt_packblocks_                            176     296    +120
> > > > > > > > >           fdt_open_into                              392     512    +120
> > > > > > > > >           fdt_blocks_misordered_                      96     216    +120
> > > > > > > > >           fdt_get_mem_rsv                            108     220    +112
> > > > > > > > >           fdt_offset_ptr                             104     208    +104
> > > > > > > > >           fdt_get_string                             288     388    +100
> > > > > > > > >           fdt_splice_                                148     228     +80
> > > > > > > > >           fdt_valid                                  204     276     +72
> > > > > > > > >           fdt_getprop_by_offset                      184     256     +72
> > > > > > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > > > > > >           fdt_num_mem_rsv                             60     116     +56
> > > > > > > > >           fdt_splice_struct_                          96     144     +48
> > > > > > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > > > > > >           fdt_rw_probe_                              108     156     +48
> > > > > > > > >           fdt_mem_rsv                                 60     108     +48
> > > > > > > > >           fdt_getprop_namelen                        100     148     +48
> > > > > > > > >           fdt_get_property_by_offset_                100     148     +48
> > > > > > > > >           fdt_get_name                               164     212     +48
> > > > > > > > >           efi_install_fdt                            964    1012     +48
> > > > > > > > >           boot_get_fdt                               888     936     +48
> > > > > > > > >           fdt_move                                    80     116     +36
> > > > > > > > >           reserve_fdt                                 72      96     +24
> > > > > > > > >           reloc_fdt                                   76     100     +24
> > > > > > > > >           image_setup_libfdt                         284     308     +24
> > > > > > > > >           fit_image_load                            1608    1632     +24
> > > > > > > > >           fit_image_get_data_and_size                172     196     +24
> > > > > > > > >           fit_get_end                                 16      40     +24
> > > > > > > > >           fdt_next_tag                               256     280     +24
> > > > > > > > >           fdt_header_size                             12      36     +24
> > > > > > > > >           fdt_get_property_namelen_                  212     236     +24
> > > > > > > > >           fdt_get_property_namelen                    44      68     +24
> > > > > > > > >           fdt_get_phandle                            120     144     +24
> > > > > > > > >           fdt_del_node                                96     120     +24
> > > > > > > > >           fdt_del_mem_rsv                            112     136     +24
> > > > > > > > >           fdt_add_subnode_namelen                    284     308     +24
> > > > > > > > >           fdt_add_mem_rsv                            124     148     +24
> > > > > > > > >           common_diskboot                            680     704     +24
> > > > > > > > >           fdt_next_subnode                            80      88      +8
> > > > > > > > >         spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116)
> > > > > > > > >           function                                   old     new   delta
> > > > > > > > >           fdt_get_mem_rsv                             52     236    +184
> > > > > > > > >           fdt_add_property_                          340     484    +144
> > > > > > > > >           fdt_get_name                                68     152     +84
> > > > > > > > >           fdt_splice_                                148     228     +80
> > > > > > > > >           fdt_num_mem_rsv                             64     128     +64
> > > > > > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > > > > > >           fdt_offset_ptr                              52     104     +52
> > > > > > > > >           fdt_splice_struct_                          96     144     +48
> > > > > > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > > > > > >           fdt_get_property_by_offset_                 36      84     +48
> > > > > > > > >           fdt_ro_probe_                                -      36     +36
> > > > > > > > >           fdt_subnode_offset_namelen                 200     232     +32
> > > > > > > > >           spl_load_simple_fit                        848     872     +24
> > > > > > > > >           spl_fit_append_fdt                         196     220     +24
> > > > > > > > >           fdt_getprop_by_offset                       80     104     +24
> > > > > > > > >           fdt_get_string                              64      88     +24
> > > > > > > > >           fdt_get_property_namelen_                  200     224     +24
> > > > > > > > >           fdt_get_phandle                            120     144     +24
> > > > > > > > >           fdt_del_mem_rsv                            104     128     +24
> > > > > > > > >           fdt_check_header                            28      52     +24
> > > > > > > > >           fdt_add_subnode_namelen                    260     284     +24
> > > > > > > > >           fdt_add_mem_rsv                            112     136     +24
> > > > > > > > >           fdt_next_tag                               192     212     +20
> > > > > > > > >           fdt_path_offset_namelen                    240     256     +16
> > > > > > > > >           fdt_supernode_atdepth_offset               160     172     +12
> > > > > > > > >           fdt_node_offset_by_phandle                 120     132     +12
> > > > > > > > >           fdt_mem_rsv                                 20       -     -20
> > > > > > > > >           fdt_check_prop_offset_                      64      44     -20
> > > > > > > > >           fdt_check_node_offset_                      64      44     -20
> > > > > > > > >
> > > > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to
> > > > > > > > > 0xff so there's maximum savings already being done there.  Does anyone
> > > > > > > > > have ideas on where / how to further tweak code size?  Thanks!
> > > > > > > > 
> > > > > > > > +David Gibson
> > > > > > > > 
> > > > > > > > I suspect there are more checks that need to be made conditional.
> > > > > > > 
> > > > > > > Seems likely.
> > > > > > 
> > > > > > OK.  Does that mean you're going to take a look?
> > > > > 
> > > > > No, sorry, my time for dtc is very limited.
> > > > 
> > > > OK, fair point.
> > > > 
> > > > > > > Though, as I've opined before, from what I understand the SPL is *so*
> > > > > > > restricted an environment, I'm not really convinced DTs are the right
> > > > > > > tool for the job there.
> > > > > > 
> > > > > > SPL is space constrained, which is why we mask out all of the safety
> > > > > > checks.  But we still generally have enough memory that this is fine.
> > > > > > This specific board has 256KiB for SPL, for example.  But I'm not just
> > > > > > concerned about 1KiB of growth when everything is masked off, I'm also
> > > > > > concerned about 2KiB of growth over a changelog that doesn't read like
> > > > > > it added a bunch of stuff that should cause everything to grow,
> > > > > > either.
> > > > > 
> > > > > Yeah, that's a good point.  It's certainly not obvious to me what
> > > > > would have caused the growth.  I think we'll need a revision by
> > > > > revision test to see where it was added.
> > > > 
> > > > So, with a bit of playing around, I've used buildman to give us that,
> > > > for the leez-rk3399 platform.  The full results are at
> > > > https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note
> > > > that build #6, "scripts/dtc: Update to upstream version
> > > > v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync
> > > > entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has).
> > > 
> > > Right.. I was meaning stepping through the dtc upstream revisions, not
> > > just the uboot revisions, so you get a fine grained idea of where the
> > > increase is coming from.
> > 
> > Yes, that's what that link is.  I imported every revision from point A
> > to point B, and built U-Boot for it, and checked the sizes, since we
> > have tooling for that.  It's possible the bloat-o-meter script in the
> > kernel could be used, but I'm not sure what the right target(s) in dtc
> > would be, to get any useful information out.
> 
> Ok.. ok.  I might need some guidance to interpret the information
> there on that link.

You can ignore everything before line 78.  If we look at lines 86-100
there is:
09: scripts/dtc: Update to upstream version v1.4.6-25-g04b5b4062ccd
   aarch64: (for 1/1 boards) all +4.0 spl/u-boot-spl:all +4.0 spl/u-boot-spl:text +4.0 text +4.0
            leez-rk3399    : all +4 spl/u-boot-spl:all +4 spl/u-boot-spl:text +4 text +4
               u-boot: add: 2/-1, grow: 0/-1 bytes: 224/-220 (4)
                 function                                   old     new   delta
                 fdt_ro_probe_                                -     116    +116
                 fdt_rw_probe_                                -     108    +108
                 fdt_rw_check_header_                       108       -    -108
                 fdt_check_header                           116       4    -112
               spl-u-boot-spl: add: 2/-1, grow: 0/-1 bytes: 224/-220 (4)
                 function                                   old     new   delta
                 fdt_ro_probe_                                -     116    +116
                 fdt_rw_probe_                                -     108    +108
                 fdt_rw_check_header_                       108       -    -108
                 fdt_check_header                           116       4    -112

Which means that with dtc commit 04b5b4062ccd, we got the above function
size (text) changes.  Overall it's a 4 byte growth for "libfdt: Clean up
header checking functions" which at the end of the day, yes, that's a
fine and reasonable change and nothing with my U-Boot hat on I'm
concerned about.

Where it gets complicated is that it's not until line 722 where we get
to the end of where Simon's ASSUME_... changes were fully merged.  So
it's a bit more detective work to see where for example fdt_get_mem_rsv
was grown, but then not shrunk back with the ASSUME flags.  A quick
search shows there's 5 dtc commits that change fdt_get_mem_rsv.  My hope
is that it would be quicker for you or someone else that works on dtc a
lot to mentally swap those commits in and see "Oh, I guess we could
ASSUME_... over in this hunk" or "Oh, maybe we can rewrite this change
to be a bit more compact" easier than I :)

> > > > But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > 11738cf01f15 reduces it just a little.
> > > 
> > > Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > instead get intermittent bug reports where it just crashes.
> > 
> > We really need to talk about that then.  There was a problem of people
> > turning off the sanity check for making sure the entire device tree was
> > aligned and then having everything crash.
> 
> Ok... I'm not really sure where you're going with that thought.

In my reading of the mailing list history of how this issue came up,
it was someone was booting a dragonboard or something, and they (or
rather, the board maintainer set by default) the flag to use the device
tree wherever it is in memory and NOT to relocate it to a properly
aligned address.  This in turn lead to the kernel getting an unaligned
device tree and everything crashing.  The "I know what I'm doing" flag
was set, violated the documented requirements for device trees need to
reside in memory and everything blew up.

After that it was noticed that there could be some internal
mis-alignment and if you tried those accesses on a CPU that doesn't
support doing those reads easily there could be problems, but that's not
a common at all case (as noted by it not having been seen in practice).

> > > I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > break for either an unaligned dtb (unlikely) or if you attempt to load
> > > an unaligned value from a property (more likely, but don't add the
> > > flag if you're not sure you don't need it).
> > 
> > So long as it's abstracted in such a way that we don't grow the size of
> > everything again, yes, that is the right way forward I think.
> 
> All the ASSUME flags should be resolved at compile time (at least with
> normal optimization levels enabled in the compiler), so testing for
> those shouldn't increase size at all.  If they do, something is wrong.

I'm saying that how ever this new ASSUME flag is done, it needs to be
done in such a way the compiler really will be smart about it.  So
something like making a new function that does fdt64_ld() if we aren't
ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
ASSUME_ALIGNED_ACCESS.

> > These
> > tests still introduce a big boot-time slowdown as well.
> 
> I'm not sure exactly what tests you mean.

Sorry, s/tests/changes/.  I asked Patrice to re-run the boot time tests
on a resync to exactly what's in upstream dtc and he still sees the very
noticeable slowdown.

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
  2020-10-22 15:22                           ` Tom Rini
@ 2020-10-26 21:51                             ` Rob Herring
       [not found]                               ` <CAL_JsqJiKETTMJX3MsEmECE+jtbwYydVSgt1a6poz_L+pPRFTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Rob Herring @ 2020-10-26 21:51 UTC (permalink / raw)
  To: Tom Rini; +Cc: David Gibson, Simon Glass, Devicetree Compiler

On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:

[...]

> > > > > But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > > One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > > 11738cf01f15 reduces it just a little.
> > > >
> > > > Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > instead get intermittent bug reports where it just crashes.
> > >
> > > We really need to talk about that then.  There was a problem of people
> > > turning off the sanity check for making sure the entire device tree was
> > > aligned and then having everything crash.
> >
> > Ok... I'm not really sure where you're going with that thought.
>
> In my reading of the mailing list history of how this issue came up,
> it was someone was booting a dragonboard or something, and they (or
> rather, the board maintainer set by default) the flag to use the device
> tree wherever it is in memory and NOT to relocate it to a properly
> aligned address.  This in turn lead to the kernel getting an unaligned
> device tree and everything crashing.  The "I know what I'm doing" flag
> was set, violated the documented requirements for device trees need to
> reside in memory and everything blew up.
>
> After that it was noticed that there could be some internal
> mis-alignment and if you tried those accesses on a CPU that doesn't
> support doing those reads easily there could be problems, but that's not
> a common at all case (as noted by it not having been seen in practice).

Nor a problem on many environments to begin with. More below...

> > > > I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > an unaligned value from a property (more likely, but don't add the
> > > > flag if you're not sure you don't need it).
> > >
> > > So long as it's abstracted in such a way that we don't grow the size of
> > > everything again, yes, that is the right way forward I think.
> >
> > All the ASSUME flags should be resolved at compile time (at least with
> > normal optimization levels enabled in the compiler), so testing for
> > those shouldn't increase size at all.  If they do, something is wrong.
>
> I'm saying that how ever this new ASSUME flag is done, it needs to be
> done in such a way the compiler really will be smart about it.  So
> something like making a new function that does fdt64_ld() if we aren't
> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> ASSUME_ALIGNED_ACCESS.

Ah, unaligned accesses again... To summarize, both performance and
size suffer with not doing unaligned accesses.

Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
do unaligned accesses? That would be more aligned with what the system
can support rather than sanity checking associated with ASSUME_*.

To repeat from last time, everything ARMv6 and up can do unaligned
accesses if enabled. That's the vast majority of Arm systems that
people care about. Whether that's enabled or not is up to how SCTLR.A
is configured. Last I checked, u-boot clears this. Don't know about
SPL case though.

Rob

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                               ` <CAL_JsqJiKETTMJX3MsEmECE+jtbwYydVSgt1a6poz_L+pPRFTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-10-26 22:17                                 ` Tom Rini
  2020-10-27 15:57                                 ` André Przywara
  1 sibling, 0 replies; 28+ messages in thread
From: Tom Rini @ 2020-10-26 22:17 UTC (permalink / raw)
  To: Rob Herring; +Cc: David Gibson, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 4266 bytes --]

On Mon, Oct 26, 2020 at 04:51:20PM -0500, Rob Herring wrote:
> On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > > On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> 
> [...]
> 
> > > > > > But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > > > One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > > > 11738cf01f15 reduces it just a little.
> > > > >
> > > > > Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > > instead get intermittent bug reports where it just crashes.
> > > >
> > > > We really need to talk about that then.  There was a problem of people
> > > > turning off the sanity check for making sure the entire device tree was
> > > > aligned and then having everything crash.
> > >
> > > Ok... I'm not really sure where you're going with that thought.
> >
> > In my reading of the mailing list history of how this issue came up,
> > it was someone was booting a dragonboard or something, and they (or
> > rather, the board maintainer set by default) the flag to use the device
> > tree wherever it is in memory and NOT to relocate it to a properly
> > aligned address.  This in turn lead to the kernel getting an unaligned
> > device tree and everything crashing.  The "I know what I'm doing" flag
> > was set, violated the documented requirements for device trees need to
> > reside in memory and everything blew up.
> >
> > After that it was noticed that there could be some internal
> > mis-alignment and if you tried those accesses on a CPU that doesn't
> > support doing those reads easily there could be problems, but that's not
> > a common at all case (as noted by it not having been seen in practice).
> 
> Nor a problem on many environments to begin with. More below...
> 
> > > > > I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > > break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > > an unaligned value from a property (more likely, but don't add the
> > > > > flag if you're not sure you don't need it).
> > > >
> > > > So long as it's abstracted in such a way that we don't grow the size of
> > > > everything again, yes, that is the right way forward I think.
> > >
> > > All the ASSUME flags should be resolved at compile time (at least with
> > > normal optimization levels enabled in the compiler), so testing for
> > > those shouldn't increase size at all.  If they do, something is wrong.
> >
> > I'm saying that how ever this new ASSUME flag is done, it needs to be
> > done in such a way the compiler really will be smart about it.  So
> > something like making a new function that does fdt64_ld() if we aren't
> > ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > ASSUME_ALIGNED_ACCESS.
> 
> Ah, unaligned accesses again... To summarize, both performance and
> size suffer with not doing unaligned accesses.
> 
> Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> do unaligned accesses? That would be more aligned with what the system
> can support rather than sanity checking associated with ASSUME_*.

I leave this up to the dtc folks.  With my U-Boot hat on, we'll be
defaulting to "unaligned is fine" for everyone.

> To repeat from last time, everything ARMv6 and up can do unaligned
> accesses if enabled. That's the vast majority of Arm systems that
> people care about. Whether that's enabled or not is up to how SCTLR.A
> is configured. Last I checked, u-boot clears this. Don't know about
> SPL case though.

I think it's something close to never do we ever have things configured
such that truly unaligned access is a problem.  The historical problem
wasn't even that something wasn't at all aligned, it was 4-byte aligned
when the requirement was 8-byte aligned and everything failed.  Only
later were totally unaligned parts of the tree found and that's still
not a problem for how U-Boot uses things.

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                               ` <CAL_JsqJiKETTMJX3MsEmECE+jtbwYydVSgt1a6poz_L+pPRFTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2020-10-26 22:17                                 ` Tom Rini
@ 2020-10-27 15:57                                 ` André Przywara
       [not found]                                   ` <a14daf09-4d97-052f-5071-09e67ccb925e-5wv7dgnIgG8@public.gmane.org>
  1 sibling, 1 reply; 28+ messages in thread
From: André Przywara @ 2020-10-27 15:57 UTC (permalink / raw)
  To: Rob Herring, Tom Rini; +Cc: David Gibson, Simon Glass, Devicetree Compiler

On 26/10/2020 21:51, Rob Herring wrote:
> On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
>> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
>>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
>>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
>>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> 
> [...]
> 
>>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
>>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
>>>>>> 11738cf01f15 reduces it just a little.
>>>>>
>>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
>>>>> instead get intermittent bug reports where it just crashes.
>>>>
>>>> We really need to talk about that then.  There was a problem of people
>>>> turning off the sanity check for making sure the entire device tree was
>>>> aligned and then having everything crash.
>>>
>>> Ok... I'm not really sure where you're going with that thought.
>>
>> In my reading of the mailing list history of how this issue came up,
>> it was someone was booting a dragonboard or something, and they (or
>> rather, the board maintainer set by default) the flag to use the device
>> tree wherever it is in memory and NOT to relocate it to a properly
>> aligned address.  This in turn lead to the kernel getting an unaligned
>> device tree and everything crashing.  The "I know what I'm doing" flag
>> was set, violated the documented requirements for device trees need to
>> reside in memory and everything blew up.
>>
>> After that it was noticed that there could be some internal
>> mis-alignment and if you tried those accesses on a CPU that doesn't
>> support doing those reads easily there could be problems, but that's not
>> a common at all case (as noted by it not having been seen in practice).
> 
> Nor a problem on many environments to begin with. More below...
> 
>>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
>>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
>>>>> an unaligned value from a property (more likely, but don't add the
>>>>> flag if you're not sure you don't need it).
>>>>
>>>> So long as it's abstracted in such a way that we don't grow the size of
>>>> everything again, yes, that is the right way forward I think.
>>>
>>> All the ASSUME flags should be resolved at compile time (at least with
>>> normal optimization levels enabled in the compiler), so testing for
>>> those shouldn't increase size at all.  If they do, something is wrong.
>>
>> I'm saying that how ever this new ASSUME flag is done, it needs to be
>> done in such a way the compiler really will be smart about it.  So
>> something like making a new function that does fdt64_ld() if we aren't
>> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
>> ASSUME_ALIGNED_ACCESS.
> 
> Ah, unaligned accesses again... To summarize, both performance and
> size suffer with not doing unaligned accesses.
> 
> Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> do unaligned accesses? That would be more aligned with what the system
> can support rather than sanity checking associated with ASSUME_*.
> 
> To repeat from last time, everything ARMv6 and up can do unaligned
> accesses if enabled. 

But that requires the MMU to be enabled, doesn't it? If I read the ARM
ARM correctly, unaligned accesses always trap on device memory,
regardless of SCTLR.A. And without the MMU enabled everything is device
memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
cope with that, and that most likely affects libfdt as well?

Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
all the time, and I know of at least the sunxi-aarch64 SPL running with
the MMU off as well.

Cheers,
Andre

> people care about. Whether that's enabled or not is up to how SCTLR.A
> is configured. Last I checked, u-boot clears this. Don't know about
> SPL case though.
> 
> Rob
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                                   ` <a14daf09-4d97-052f-5071-09e67ccb925e-5wv7dgnIgG8@public.gmane.org>
@ 2020-10-27 19:55                                     ` Rob Herring
       [not found]                                       ` <CAL_JsqK_fC346UnCmXMJxKHCM6=eFBF_kmGt_fBdvwPXbPRkvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Rob Herring @ 2020-10-27 19:55 UTC (permalink / raw)
  To: André Przywara
  Cc: Tom Rini, David Gibson, Simon Glass, Devicetree Compiler

On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara-AbSShOkvfpQ@public.gmane.orgm> wrote:
>
> On 26/10/2020 21:51, Rob Herring wrote:
> > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> >
> > [...]
> >
> >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> >>>>>> 11738cf01f15 reduces it just a little.
> >>>>>
> >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> >>>>> instead get intermittent bug reports where it just crashes.
> >>>>
> >>>> We really need to talk about that then.  There was a problem of people
> >>>> turning off the sanity check for making sure the entire device tree was
> >>>> aligned and then having everything crash.
> >>>
> >>> Ok... I'm not really sure where you're going with that thought.
> >>
> >> In my reading of the mailing list history of how this issue came up,
> >> it was someone was booting a dragonboard or something, and they (or
> >> rather, the board maintainer set by default) the flag to use the device
> >> tree wherever it is in memory and NOT to relocate it to a properly
> >> aligned address.  This in turn lead to the kernel getting an unaligned
> >> device tree and everything crashing.  The "I know what I'm doing" flag
> >> was set, violated the documented requirements for device trees need to
> >> reside in memory and everything blew up.
> >>
> >> After that it was noticed that there could be some internal
> >> mis-alignment and if you tried those accesses on a CPU that doesn't
> >> support doing those reads easily there could be problems, but that's not
> >> a common at all case (as noted by it not having been seen in practice).
> >
> > Nor a problem on many environments to begin with. More below...
> >
> >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> >>>>> an unaligned value from a property (more likely, but don't add the
> >>>>> flag if you're not sure you don't need it).
> >>>>
> >>>> So long as it's abstracted in such a way that we don't grow the size of
> >>>> everything again, yes, that is the right way forward I think.
> >>>
> >>> All the ASSUME flags should be resolved at compile time (at least with
> >>> normal optimization levels enabled in the compiler), so testing for
> >>> those shouldn't increase size at all.  If they do, something is wrong.
> >>
> >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> >> done in such a way the compiler really will be smart about it.  So
> >> something like making a new function that does fdt64_ld() if we aren't
> >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> >> ASSUME_ALIGNED_ACCESS.
> >
> > Ah, unaligned accesses again... To summarize, both performance and
> > size suffer with not doing unaligned accesses.
> >
> > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > do unaligned accesses? That would be more aligned with what the system
> > can support rather than sanity checking associated with ASSUME_*.
> >
> > To repeat from last time, everything ARMv6 and up can do unaligned
> > accesses if enabled.
>
> But that requires the MMU to be enabled, doesn't it? If I read the ARM
> ARM correctly, unaligned accesses always trap on device memory,
> regardless of SCTLR.A. And without the MMU enabled everything is device
> memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> cope with that, and that most likely affects libfdt as well?

Ah yes, I think you are right.

In that case, seems like we should figure out whether (internal)
unaligned accesses are possible with dtc generated dtbs at least
rather than just "not a common at all case (as noted by it not having
been seen in practice)." I'm sure David will point out that not all
dtbs come from dtc, but all the ones u-boot deals with do in reality.

> Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> all the time, and I know of at least the sunxi-aarch64 SPL running with
> the MMU off as well.

I'm making a mental note of this for the next time performance issues come up.

Rob

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                                       ` <CAL_JsqK_fC346UnCmXMJxKHCM6=eFBF_kmGt_fBdvwPXbPRkvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-10-28  4:26                                         ` David Gibson
       [not found]                                           ` <20201028042601.GA5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2020-10-28  4:26 UTC (permalink / raw)
  To: Rob Herring
  Cc: André Przywara, Tom Rini, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 8311 bytes --]

On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote:
> >
> > On 26/10/2020 21:51, Rob Herring wrote:
> > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > >
> > > [...]
> > >
> > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > >>>>>> 11738cf01f15 reduces it just a little.
> > >>>>>
> > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > >>>>> instead get intermittent bug reports where it just crashes.
> > >>>>
> > >>>> We really need to talk about that then.  There was a problem of people
> > >>>> turning off the sanity check for making sure the entire device tree was
> > >>>> aligned and then having everything crash.
> > >>>
> > >>> Ok... I'm not really sure where you're going with that thought.
> > >>
> > >> In my reading of the mailing list history of how this issue came up,
> > >> it was someone was booting a dragonboard or something, and they (or
> > >> rather, the board maintainer set by default) the flag to use the device
> > >> tree wherever it is in memory and NOT to relocate it to a properly
> > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > >> was set, violated the documented requirements for device trees need to
> > >> reside in memory and everything blew up.
> > >>
> > >> After that it was noticed that there could be some internal
> > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > >> support doing those reads easily there could be problems, but that's not
> > >> a common at all case (as noted by it not having been seen in practice).
> > >
> > > Nor a problem on many environments to begin with. More below...
> > >
> > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > >>>>> an unaligned value from a property (more likely, but don't add the
> > >>>>> flag if you're not sure you don't need it).
> > >>>>
> > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > >>>> everything again, yes, that is the right way forward I think.
> > >>>
> > >>> All the ASSUME flags should be resolved at compile time (at least with
> > >>> normal optimization levels enabled in the compiler), so testing for
> > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > >>
> > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > >> done in such a way the compiler really will be smart about it.  So
> > >> something like making a new function that does fdt64_ld() if we aren't
> > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > >> ASSUME_ALIGNED_ACCESS.
> > >
> > > Ah, unaligned accesses again... To summarize, both performance and
> > > size suffer with not doing unaligned accesses.
> > >
> > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > do unaligned accesses? That would be more aligned with what the system
> > > can support rather than sanity checking associated with ASSUME_*.

So, there are kind of two things here, (1) is "my platform can handle
unaligned accesses" and (2) is "assume I don't need unaligned
accesses".  We can use the fast & small versions of fdt32_ld() etc. if
either is true.  However we need to consider those separately, because
they can be independently true (or not) for different reasons.  (1)
depends on the hardware, whereas (2) depends on how you're using dtc,
and, see below, you may need at least unaligned-handling fdt64_ld() in
more cases than you think.

> > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > accesses if enabled.
> >
> > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > ARM correctly, unaligned accesses always trap on device memory,
> > regardless of SCTLR.A. And without the MMU enabled everything is device
> > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > cope with that, and that most likely affects libfdt as well?
> 
> Ah yes, I think you are right.
> 
> In that case, seems like we should figure out whether (internal)
> unaligned accesses are possible with dtc generated dtbs at least
> rather than just "not a common at all case (as noted by it not having
> been seen in practice)." I'm sure David will point out that not all
> dtbs come from dtc, but all the ones u-boot deals with do in
> reality.

Assuming the blob itself is 8-byte aligned in memory, then all
structural elements (i.e. the tree metadata) of a compliant dtb will
be naturally aligned.  The spec requires 8-byte alignment of the mem
reserve block w.r.t. the base of the blob and 4 byte aligned structure
block w.r.t. the base of the blob.  Likewise the layout of the mem
reserve block will preserve 8-byte alignment of all the 64-bit values
it contains, assuming the block itself starts 8-byte aligned.
Similarly the structure blob will preserve 4-byte alignment of all its
tags and other structural data (this amounts to requiring an alignment
gap after node names and property values).

However, "all structural elements" does not include values within
property values themselves.  Assuming propery alignment of the blocks
and the blob itself, then all property values will *begin* 4 byte
aligned.  However that leaves two relevant cases:

 a) 64-bit property values may be 4-byte aligned but not 8-byte
    aligned
 b) complex property values including both strings and integers
    typically use a packed representation with no alignment gaps.
    Such property structures are usually avoided in modern bindings,
    but they definitely exist in a bunch of older bindings.  Obviously
    that means that integer values sitting after arbitrary length
    strings may not have any natural alignment

So acccesses made by libfdt internally should be safe(*) assuming the
blob itself is loaded 8-byte aligned, and the dtb is compliant.
However the libfdt user may hit both problems (a) and (b) getting
things they actually want from the tree.  fdt{32,64}_{ld,st}() are
intended to handle those cases, so that they're useful for the caller
to pull things from properties as well as for libfdt internal
accesses.

(*) There are a number of other functions that looked like they might
    be dangerous for case (a) because they are based on 64-bit
    property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
    fdt_setprop_u64(), fdt_appendprop_u64() and
    fdt_appendprop_addrrange().  However I think they're actually
    ok, because the way they're built in terms of other functions
    means there's implicitly a memcpy() from a byte buffer.

> > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > the MMU off as well.
> 
> I'm making a mental note of this for the next time performance issues come up.

Right, running early with MMU off is definitely a real use case for
libfdt.  For similar reasons we can't assume we have an OS which will
trap and handle unaligned accesses, which we might for a more
conventional userspace library.

This kind of underscores why I'm a bit hesitant to introduce "my
platform handles unaligned acccesses" flag.  Not only does it require
detailed knowledge of the target CPU, but it can also depend on
exactly what mode that hardware is in.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                                           ` <20201028042601.GA5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
@ 2020-10-28 12:05                                             ` Tom Rini
  2020-10-29  2:55                                               ` David Gibson
  2020-10-28 17:49                                             ` Rob Herring
  1 sibling, 1 reply; 28+ messages in thread
From: Tom Rini @ 2020-10-28 12:05 UTC (permalink / raw)
  To: David Gibson
  Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 8589 bytes --]

On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote:
> On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote:
> > >
> > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > >
> > > > [...]
> > > >
> > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > >>>>>> 11738cf01f15 reduces it just a little.
> > > >>>>>
> > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > >>>>> instead get intermittent bug reports where it just crashes.
> > > >>>>
> > > >>>> We really need to talk about that then.  There was a problem of people
> > > >>>> turning off the sanity check for making sure the entire device tree was
> > > >>>> aligned and then having everything crash.
> > > >>>
> > > >>> Ok... I'm not really sure where you're going with that thought.
> > > >>
> > > >> In my reading of the mailing list history of how this issue came up,
> > > >> it was someone was booting a dragonboard or something, and they (or
> > > >> rather, the board maintainer set by default) the flag to use the device
> > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > >> was set, violated the documented requirements for device trees need to
> > > >> reside in memory and everything blew up.
> > > >>
> > > >> After that it was noticed that there could be some internal
> > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > >> support doing those reads easily there could be problems, but that's not
> > > >> a common at all case (as noted by it not having been seen in practice).
> > > >
> > > > Nor a problem on many environments to begin with. More below...
> > > >
> > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > >>>>> flag if you're not sure you don't need it).
> > > >>>>
> > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > >>>> everything again, yes, that is the right way forward I think.
> > > >>>
> > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > >>> normal optimization levels enabled in the compiler), so testing for
> > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > >>
> > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > >> done in such a way the compiler really will be smart about it.  So
> > > >> something like making a new function that does fdt64_ld() if we aren't
> > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > >> ASSUME_ALIGNED_ACCESS.
> > > >
> > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > size suffer with not doing unaligned accesses.
> > > >
> > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > do unaligned accesses? That would be more aligned with what the system
> > > > can support rather than sanity checking associated with ASSUME_*.
> 
> So, there are kind of two things here, (1) is "my platform can handle
> unaligned accesses" and (2) is "assume I don't need unaligned
> accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> either is true.  However we need to consider those separately, because
> they can be independently true (or not) for different reasons.  (1)
> depends on the hardware, whereas (2) depends on how you're using dtc,
> and, see below, you may need at least unaligned-handling fdt64_ld() in
> more cases than you think.
> 
> > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > accesses if enabled.
> > >
> > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > ARM correctly, unaligned accesses always trap on device memory,
> > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > cope with that, and that most likely affects libfdt as well?
> > 
> > Ah yes, I think you are right.
> > 
> > In that case, seems like we should figure out whether (internal)
> > unaligned accesses are possible with dtc generated dtbs at least
> > rather than just "not a common at all case (as noted by it not having
> > been seen in practice)." I'm sure David will point out that not all
> > dtbs come from dtc, but all the ones u-boot deals with do in
> > reality.
> 
> Assuming the blob itself is 8-byte aligned in memory, then all
> structural elements (i.e. the tree metadata) of a compliant dtb will
> be naturally aligned.  The spec requires 8-byte alignment of the mem
> reserve block w.r.t. the base of the blob and 4 byte aligned structure
> block w.r.t. the base of the blob.  Likewise the layout of the mem
> reserve block will preserve 8-byte alignment of all the 64-bit values
> it contains, assuming the block itself starts 8-byte aligned.
> Similarly the structure blob will preserve 4-byte alignment of all its
> tags and other structural data (this amounts to requiring an alignment
> gap after node names and property values).
> 
> However, "all structural elements" does not include values within
> property values themselves.  Assuming propery alignment of the blocks
> and the blob itself, then all property values will *begin* 4 byte
> aligned.  However that leaves two relevant cases:
> 
>  a) 64-bit property values may be 4-byte aligned but not 8-byte
>     aligned
>  b) complex property values including both strings and integers
>     typically use a packed representation with no alignment gaps.
>     Such property structures are usually avoided in modern bindings,
>     but they definitely exist in a bunch of older bindings.  Obviously
>     that means that integer values sitting after arbitrary length
>     strings may not have any natural alignment
> 
> So acccesses made by libfdt internally should be safe(*) assuming the
> blob itself is loaded 8-byte aligned, and the dtb is compliant.
> However the libfdt user may hit both problems (a) and (b) getting
> things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> intended to handle those cases, so that they're useful for the caller
> to pull things from properties as well as for libfdt internal
> accesses.
> 
> (*) There are a number of other functions that looked like they might
>     be dangerous for case (a) because they are based on 64-bit
>     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
>     fdt_setprop_u64(), fdt_appendprop_u64() and
>     fdt_appendprop_addrrange().  However I think they're actually
>     ok, because the way they're built in terms of other functions
>     means there's implicitly a memcpy() from a byte buffer.
> 
> > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > the MMU off as well.
> > 
> > I'm making a mental note of this for the next time performance issues come up.
> 
> Right, running early with MMU off is definitely a real use case for
> libfdt.  For similar reasons we can't assume we have an OS which will
> trap and handle unaligned accesses, which we might for a more
> conventional userspace library.
> 
> This kind of underscores why I'm a bit hesitant to introduce "my
> platform handles unaligned acccesses" flag.  Not only does it require
> detailed knowledge of the target CPU, but it can also depend on
> exactly what mode that hardware is in.

Can you please note the existing user(s) where we have just the right
combination of factors and so everything fails?

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                                           ` <20201028042601.GA5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
  2020-10-28 12:05                                             ` Tom Rini
@ 2020-10-28 17:49                                             ` Rob Herring
       [not found]                                               ` <CAL_JsqJPrnjKjdmvyY2NOay0YrYc20Tr3OSr0yjq+9HjCN+anA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 28+ messages in thread
From: Rob Herring @ 2020-10-28 17:49 UTC (permalink / raw)
  To: David Gibson
  Cc: André Przywara, Tom Rini, Simon Glass, Devicetree Compiler

On Tue, Oct 27, 2020 at 11:26 PM David Gibson
<david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> wrote:
>
> On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote:
> > >
> > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > >
> > > > [...]
> > > >
> > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > >>>>>> 11738cf01f15 reduces it just a little.
> > > >>>>>
> > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > >>>>> instead get intermittent bug reports where it just crashes.
> > > >>>>
> > > >>>> We really need to talk about that then.  There was a problem of people
> > > >>>> turning off the sanity check for making sure the entire device tree was
> > > >>>> aligned and then having everything crash.
> > > >>>
> > > >>> Ok... I'm not really sure where you're going with that thought.
> > > >>
> > > >> In my reading of the mailing list history of how this issue came up,
> > > >> it was someone was booting a dragonboard or something, and they (or
> > > >> rather, the board maintainer set by default) the flag to use the device
> > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > >> was set, violated the documented requirements for device trees need to
> > > >> reside in memory and everything blew up.
> > > >>
> > > >> After that it was noticed that there could be some internal
> > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > >> support doing those reads easily there could be problems, but that's not
> > > >> a common at all case (as noted by it not having been seen in practice).
> > > >
> > > > Nor a problem on many environments to begin with. More below...
> > > >
> > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > >>>>> flag if you're not sure you don't need it).
> > > >>>>
> > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > >>>> everything again, yes, that is the right way forward I think.
> > > >>>
> > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > >>> normal optimization levels enabled in the compiler), so testing for
> > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > >>
> > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > >> done in such a way the compiler really will be smart about it.  So
> > > >> something like making a new function that does fdt64_ld() if we aren't
> > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > >> ASSUME_ALIGNED_ACCESS.
> > > >
> > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > size suffer with not doing unaligned accesses.
> > > >
> > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > do unaligned accesses? That would be more aligned with what the system
> > > > can support rather than sanity checking associated with ASSUME_*.
>
> So, there are kind of two things here, (1) is "my platform can handle
> unaligned accesses" and (2) is "assume I don't need unaligned
> accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> either is true.  However we need to consider those separately, because
> they can be independently true (or not) for different reasons.  (1)
> depends on the hardware, whereas (2) depends on how you're using dtc,
> and, see below, you may need at least unaligned-handling fdt64_ld() in
> more cases than you think.

Okay, I guess you were thinking of (2) for ASSUME_ALIGNED_ACCESS, but
I read it as (1).

> > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > accesses if enabled.
> > >
> > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > ARM correctly, unaligned accesses always trap on device memory,
> > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > cope with that, and that most likely affects libfdt as well?
> >
> > Ah yes, I think you are right.
> >
> > In that case, seems like we should figure out whether (internal)
> > unaligned accesses are possible with dtc generated dtbs at least
> > rather than just "not a common at all case (as noted by it not having
> > been seen in practice)." I'm sure David will point out that not all
> > dtbs come from dtc, but all the ones u-boot deals with do in
> > reality.
>
> Assuming the blob itself is 8-byte aligned in memory, then all
> structural elements (i.e. the tree metadata) of a compliant dtb will
> be naturally aligned.  The spec requires 8-byte alignment of the mem
> reserve block w.r.t. the base of the blob and 4 byte aligned structure
> block w.r.t. the base of the blob.  Likewise the layout of the mem
> reserve block will preserve 8-byte alignment of all the 64-bit values
> it contains, assuming the block itself starts 8-byte aligned.
> Similarly the structure blob will preserve 4-byte alignment of all its
> tags and other structural data (this amounts to requiring an alignment
> gap after node names and property values).
>
> However, "all structural elements" does not include values within
> property values themselves.  Assuming propery alignment of the blocks
> and the blob itself, then all property values will *begin* 4 byte
> aligned.  However that leaves two relevant cases:
>
>  a) 64-bit property values may be 4-byte aligned but not 8-byte
>     aligned

I'd assume that while an arch may support only the above in terms of
misalignment, an arch that supports any alignment would always support
this as part of that. It would just be odd to support byte alignment
only up to 32-bit. I don't think we need to optimize the former case.

>  b) complex property values including both strings and integers
>     typically use a packed representation with no alignment gaps.
>     Such property structures are usually avoided in modern bindings,
>     but they definitely exist in a bunch of older bindings.  Obviously
>     that means that integer values sitting after arbitrary length
>     strings may not have any natural alignment

That's the user's problem IMO. Users of older bindings having this
aren't likely using a newish function like fdt32_ld either.

> So acccesses made by libfdt internally should be safe(*) assuming the
> blob itself is loaded 8-byte aligned, and the dtb is compliant.
> However the libfdt user may hit both problems (a) and (b) getting
> things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> intended to handle those cases, so that they're useful for the caller
> to pull things from properties as well as for libfdt internal
> accesses.
>
> (*) There are a number of other functions that looked like they might
>     be dangerous for case (a) because they are based on 64-bit
>     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
>     fdt_setprop_u64(), fdt_appendprop_u64() and
>     fdt_appendprop_addrrange().  However I think they're actually
>     ok, because the way they're built in terms of other functions
>     means there's implicitly a memcpy() from a byte buffer.
>
> > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > the MMU off as well.
> >
> > I'm making a mental note of this for the next time performance issues come up.
>
> Right, running early with MMU off is definitely a real use case for
> libfdt.  For similar reasons we can't assume we have an OS which will
> trap and handle unaligned accesses, which we might for a more
> conventional userspace library.
>
> This kind of underscores why I'm a bit hesitant to introduce "my
> platform handles unaligned acccesses" flag.  Not only does it require
> detailed knowledge of the target CPU, but it can also depend on
> exactly what mode that hardware is in.

I think there's a more simple solution with no flags. Given all
internal accesses are at least 4-byte aligned, libfdt should just do
32-bit accesses internally (as it used to). Maybe we need a check up
front that the dtb is 8-byte aligned though. fdt{32,64}_ld() can just
keep the same implementation. If users want it optimized, then they
can do their own accessors.

Rob

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
  2020-10-28 12:05                                             ` Tom Rini
@ 2020-10-29  2:55                                               ` David Gibson
       [not found]                                                 ` <20201029025503.GI5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2020-10-29  2:55 UTC (permalink / raw)
  To: Tom Rini
  Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 9181 bytes --]

On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote:
> On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote:
> > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote:
> > > >
> > > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > >>>>>> 11738cf01f15 reduces it just a little.
> > > > >>>>>
> > > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > >>>>> instead get intermittent bug reports where it just crashes.
> > > > >>>>
> > > > >>>> We really need to talk about that then.  There was a problem of people
> > > > >>>> turning off the sanity check for making sure the entire device tree was
> > > > >>>> aligned and then having everything crash.
> > > > >>>
> > > > >>> Ok... I'm not really sure where you're going with that thought.
> > > > >>
> > > > >> In my reading of the mailing list history of how this issue came up,
> > > > >> it was someone was booting a dragonboard or something, and they (or
> > > > >> rather, the board maintainer set by default) the flag to use the device
> > > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > > >> was set, violated the documented requirements for device trees need to
> > > > >> reside in memory and everything blew up.
> > > > >>
> > > > >> After that it was noticed that there could be some internal
> > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > > >> support doing those reads easily there could be problems, but that's not
> > > > >> a common at all case (as noted by it not having been seen in practice).
> > > > >
> > > > > Nor a problem on many environments to begin with. More below...
> > > > >
> > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > > >>>>> flag if you're not sure you don't need it).
> > > > >>>>
> > > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > > >>>> everything again, yes, that is the right way forward I think.
> > > > >>>
> > > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > > >>> normal optimization levels enabled in the compiler), so testing for
> > > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > > >>
> > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > > >> done in such a way the compiler really will be smart about it.  So
> > > > >> something like making a new function that does fdt64_ld() if we aren't
> > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > > >> ASSUME_ALIGNED_ACCESS.
> > > > >
> > > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > > size suffer with not doing unaligned accesses.
> > > > >
> > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > > do unaligned accesses? That would be more aligned with what the system
> > > > > can support rather than sanity checking associated with ASSUME_*.
> > 
> > So, there are kind of two things here, (1) is "my platform can handle
> > unaligned accesses" and (2) is "assume I don't need unaligned
> > accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> > either is true.  However we need to consider those separately, because
> > they can be independently true (or not) for different reasons.  (1)
> > depends on the hardware, whereas (2) depends on how you're using dtc,
> > and, see below, you may need at least unaligned-handling fdt64_ld() in
> > more cases than you think.
> > 
> > > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > > accesses if enabled.
> > > >
> > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > > ARM correctly, unaligned accesses always trap on device memory,
> > > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > > cope with that, and that most likely affects libfdt as well?
> > > 
> > > Ah yes, I think you are right.
> > > 
> > > In that case, seems like we should figure out whether (internal)
> > > unaligned accesses are possible with dtc generated dtbs at least
> > > rather than just "not a common at all case (as noted by it not having
> > > been seen in practice)." I'm sure David will point out that not all
> > > dtbs come from dtc, but all the ones u-boot deals with do in
> > > reality.
> > 
> > Assuming the blob itself is 8-byte aligned in memory, then all
> > structural elements (i.e. the tree metadata) of a compliant dtb will
> > be naturally aligned.  The spec requires 8-byte alignment of the mem
> > reserve block w.r.t. the base of the blob and 4 byte aligned structure
> > block w.r.t. the base of the blob.  Likewise the layout of the mem
> > reserve block will preserve 8-byte alignment of all the 64-bit values
> > it contains, assuming the block itself starts 8-byte aligned.
> > Similarly the structure blob will preserve 4-byte alignment of all its
> > tags and other structural data (this amounts to requiring an alignment
> > gap after node names and property values).
> > 
> > However, "all structural elements" does not include values within
> > property values themselves.  Assuming propery alignment of the blocks
> > and the blob itself, then all property values will *begin* 4 byte
> > aligned.  However that leaves two relevant cases:
> > 
> >  a) 64-bit property values may be 4-byte aligned but not 8-byte
> >     aligned
> >  b) complex property values including both strings and integers
> >     typically use a packed representation with no alignment gaps.
> >     Such property structures are usually avoided in modern bindings,
> >     but they definitely exist in a bunch of older bindings.  Obviously
> >     that means that integer values sitting after arbitrary length
> >     strings may not have any natural alignment
> > 
> > So acccesses made by libfdt internally should be safe(*) assuming the
> > blob itself is loaded 8-byte aligned, and the dtb is compliant.
> > However the libfdt user may hit both problems (a) and (b) getting
> > things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> > intended to handle those cases, so that they're useful for the caller
> > to pull things from properties as well as for libfdt internal
> > accesses.
> > 
> > (*) There are a number of other functions that looked like they might
> >     be dangerous for case (a) because they are based on 64-bit
> >     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
> >     fdt_setprop_u64(), fdt_appendprop_u64() and
> >     fdt_appendprop_addrrange().  However I think they're actually
> >     ok, because the way they're built in terms of other functions
> >     means there's implicitly a memcpy() from a byte buffer.
> > 
> > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > > the MMU off as well.
> > > 
> > > I'm making a mental note of this for the next time performance issues come up.
> > 
> > Right, running early with MMU off is definitely a real use case for
> > libfdt.  For similar reasons we can't assume we have an OS which will
> > trap and handle unaligned accesses, which we might for a more
> > conventional userspace library.
> > 
> > This kind of underscores why I'm a bit hesitant to introduce "my
> > platform handles unaligned acccesses" flag.  Not only does it require
> > detailed knowledge of the target CPU, but it can also depend on
> > exactly what mode that hardware is in.
> 
> Can you please note the existing user(s) where we have just the right
> combination of factors and so everything fails?

Sorry, I don't understand the question.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                                               ` <CAL_JsqJPrnjKjdmvyY2NOay0YrYc20Tr3OSr0yjq+9HjCN+anA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-10-29  3:02                                                 ` David Gibson
       [not found]                                                   ` <20201029030247.GJ5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2020-10-29  3:02 UTC (permalink / raw)
  To: Rob Herring
  Cc: André Przywara, Tom Rini, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 10768 bytes --]

On Wed, Oct 28, 2020 at 12:49:08PM -0500, Rob Herring wrote:
> On Tue, Oct 27, 2020 at 11:26 PM David Gibson
> <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> wrote:
> >
> > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote:
> > > >
> > > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > >>>>>> 11738cf01f15 reduces it just a little.
> > > > >>>>>
> > > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > >>>>> instead get intermittent bug reports where it just crashes.
> > > > >>>>
> > > > >>>> We really need to talk about that then.  There was a problem of people
> > > > >>>> turning off the sanity check for making sure the entire device tree was
> > > > >>>> aligned and then having everything crash.
> > > > >>>
> > > > >>> Ok... I'm not really sure where you're going with that thought.
> > > > >>
> > > > >> In my reading of the mailing list history of how this issue came up,
> > > > >> it was someone was booting a dragonboard or something, and they (or
> > > > >> rather, the board maintainer set by default) the flag to use the device
> > > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > > >> was set, violated the documented requirements for device trees need to
> > > > >> reside in memory and everything blew up.
> > > > >>
> > > > >> After that it was noticed that there could be some internal
> > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > > >> support doing those reads easily there could be problems, but that's not
> > > > >> a common at all case (as noted by it not having been seen in practice).
> > > > >
> > > > > Nor a problem on many environments to begin with. More below...
> > > > >
> > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > > >>>>> flag if you're not sure you don't need it).
> > > > >>>>
> > > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > > >>>> everything again, yes, that is the right way forward I think.
> > > > >>>
> > > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > > >>> normal optimization levels enabled in the compiler), so testing for
> > > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > > >>
> > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > > >> done in such a way the compiler really will be smart about it.  So
> > > > >> something like making a new function that does fdt64_ld() if we aren't
> > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > > >> ASSUME_ALIGNED_ACCESS.
> > > > >
> > > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > > size suffer with not doing unaligned accesses.
> > > > >
> > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > > do unaligned accesses? That would be more aligned with what the system
> > > > > can support rather than sanity checking associated with ASSUME_*.
> >
> > So, there are kind of two things here, (1) is "my platform can handle
> > unaligned accesses" and (2) is "assume I don't need unaligned
> > accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> > either is true.  However we need to consider those separately, because
> > they can be independently true (or not) for different reasons.  (1)
> > depends on the hardware, whereas (2) depends on how you're using dtc,
> > and, see below, you may need at least unaligned-handling fdt64_ld() in
> > more cases than you think.
> 
> Okay, I guess you were thinking of (2) for ASSUME_ALIGNED_ACCESS, but
> I read it as (1).

Yes.

> > > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > > accesses if enabled.
> > > >
> > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > > ARM correctly, unaligned accesses always trap on device memory,
> > > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > > cope with that, and that most likely affects libfdt as well?
> > >
> > > Ah yes, I think you are right.
> > >
> > > In that case, seems like we should figure out whether (internal)
> > > unaligned accesses are possible with dtc generated dtbs at least
> > > rather than just "not a common at all case (as noted by it not having
> > > been seen in practice)." I'm sure David will point out that not all
> > > dtbs come from dtc, but all the ones u-boot deals with do in
> > > reality.
> >
> > Assuming the blob itself is 8-byte aligned in memory, then all
> > structural elements (i.e. the tree metadata) of a compliant dtb will
> > be naturally aligned.  The spec requires 8-byte alignment of the mem
> > reserve block w.r.t. the base of the blob and 4 byte aligned structure
> > block w.r.t. the base of the blob.  Likewise the layout of the mem
> > reserve block will preserve 8-byte alignment of all the 64-bit values
> > it contains, assuming the block itself starts 8-byte aligned.
> > Similarly the structure blob will preserve 4-byte alignment of all its
> > tags and other structural data (this amounts to requiring an alignment
> > gap after node names and property values).
> >
> > However, "all structural elements" does not include values within
> > property values themselves.  Assuming propery alignment of the blocks
> > and the blob itself, then all property values will *begin* 4 byte
> > aligned.  However that leaves two relevant cases:
> >
> >  a) 64-bit property values may be 4-byte aligned but not 8-byte
> >     aligned
> 
> I'd assume that while an arch may support only the above in terms of
> misalignment, an arch that supports any alignment would always support
> this as part of that. It would just be odd to support byte alignment
> only up to 32-bit.

Yes, I'd expect so.

> I don't think we need to optimize the former case.

I don't see how we would, in any case.

> >  b) complex property values including both strings and integers
> >     typically use a packed representation with no alignment gaps.
> >     Such property structures are usually avoided in modern bindings,
> >     but they definitely exist in a bunch of older bindings.  Obviously
> >     that means that integer values sitting after arbitrary length
> >     strings may not have any natural alignment
> 
> That's the user's problem IMO. Users of older bindings having this
> aren't likely using a newish function like fdt32_ld either.

That doesn't follow.  The bindings still exist and are in use, e.g. on
IBM PAPR systems, that's not correlated to how recent teh libfdt is.

> > So acccesses made by libfdt internally should be safe(*) assuming the
> > blob itself is loaded 8-byte aligned, and the dtb is compliant.
> > However the libfdt user may hit both problems (a) and (b) getting
> > things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> > intended to handle those cases, so that they're useful for the caller
> > to pull things from properties as well as for libfdt internal
> > accesses.
> >
> > (*) There are a number of other functions that looked like they might
> >     be dangerous for case (a) because they are based on 64-bit
> >     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
> >     fdt_setprop_u64(), fdt_appendprop_u64() and
> >     fdt_appendprop_addrrange().  However I think they're actually
> >     ok, because the way they're built in terms of other functions
> >     means there's implicitly a memcpy() from a byte buffer.
> >
> > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > > the MMU off as well.
> > >
> > > I'm making a mental note of this for the next time performance issues come up.
> >
> > Right, running early with MMU off is definitely a real use case for
> > libfdt.  For similar reasons we can't assume we have an OS which will
> > trap and handle unaligned accesses, which we might for a more
> > conventional userspace library.
> >
> > This kind of underscores why I'm a bit hesitant to introduce "my
> > platform handles unaligned acccesses" flag.  Not only does it require
> > detailed knowledge of the target CPU, but it can also depend on
> > exactly what mode that hardware is in.
> 
> I think there's a more simple solution with no flags. Given all
> internal accesses are at least 4-byte aligned, libfdt should just do
> 32-bit accesses internally (as it used to). Maybe we need a check up
> front that the dtb is 8-byte aligned though.

That's not a bad idea.  We could do it in fdt_ro_probe_().

Although, one extra case occurs to me.  Someone (is it uboot?) has a
wacky format where dtbs for several platforms, along with kernels and
other information are bundled together in a big dtb (that is, using
the dtb encoding, even though it's not actually a device tree).  The
"sub-dtbs" in that will be 4-byte aligned, but maybe not 8-byte
aligned.

> fdt{32,64}_ld() can just
> keep the same implementation.

Meaning the misalignment-safe ones?  Means we need different internal
accessors to the exported ones, but I guess that's ok.

> If users want it optimized, then they
> can do their own accessors.
> 
> Rob
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                                                   ` <20201029030247.GJ5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
@ 2020-10-29 15:04                                                     ` Tom Rini
  2020-10-29 19:56                                                       ` David Gibson
  0 siblings, 1 reply; 28+ messages in thread
From: Tom Rini @ 2020-10-29 15:04 UTC (permalink / raw)
  To: David Gibson
  Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 10923 bytes --]

On Thu, Oct 29, 2020 at 02:02:47PM +1100, David Gibson wrote:
> On Wed, Oct 28, 2020 at 12:49:08PM -0500, Rob Herring wrote:
> > On Tue, Oct 27, 2020 at 11:26 PM David Gibson
> > <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> wrote:
> > >
> > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote:
> > > > >
> > > > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > > >>>>>> 11738cf01f15 reduces it just a little.
> > > > > >>>>>
> > > > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > > >>>>> instead get intermittent bug reports where it just crashes.
> > > > > >>>>
> > > > > >>>> We really need to talk about that then.  There was a problem of people
> > > > > >>>> turning off the sanity check for making sure the entire device tree was
> > > > > >>>> aligned and then having everything crash.
> > > > > >>>
> > > > > >>> Ok... I'm not really sure where you're going with that thought.
> > > > > >>
> > > > > >> In my reading of the mailing list history of how this issue came up,
> > > > > >> it was someone was booting a dragonboard or something, and they (or
> > > > > >> rather, the board maintainer set by default) the flag to use the device
> > > > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > > > >> was set, violated the documented requirements for device trees need to
> > > > > >> reside in memory and everything blew up.
> > > > > >>
> > > > > >> After that it was noticed that there could be some internal
> > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > > > >> support doing those reads easily there could be problems, but that's not
> > > > > >> a common at all case (as noted by it not having been seen in practice).
> > > > > >
> > > > > > Nor a problem on many environments to begin with. More below...
> > > > > >
> > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > > > >>>>> flag if you're not sure you don't need it).
> > > > > >>>>
> > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > > > >>>> everything again, yes, that is the right way forward I think.
> > > > > >>>
> > > > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > > > >>> normal optimization levels enabled in the compiler), so testing for
> > > > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > > > >>
> > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > > > >> done in such a way the compiler really will be smart about it.  So
> > > > > >> something like making a new function that does fdt64_ld() if we aren't
> > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > > > >> ASSUME_ALIGNED_ACCESS.
> > > > > >
> > > > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > > > size suffer with not doing unaligned accesses.
> > > > > >
> > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > > > do unaligned accesses? That would be more aligned with what the system
> > > > > > can support rather than sanity checking associated with ASSUME_*.
> > >
> > > So, there are kind of two things here, (1) is "my platform can handle
> > > unaligned accesses" and (2) is "assume I don't need unaligned
> > > accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> > > either is true.  However we need to consider those separately, because
> > > they can be independently true (or not) for different reasons.  (1)
> > > depends on the hardware, whereas (2) depends on how you're using dtc,
> > > and, see below, you may need at least unaligned-handling fdt64_ld() in
> > > more cases than you think.
> > 
> > Okay, I guess you were thinking of (2) for ASSUME_ALIGNED_ACCESS, but
> > I read it as (1).
> 
> Yes.
> 
> > > > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > > > accesses if enabled.
> > > > >
> > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > > > ARM correctly, unaligned accesses always trap on device memory,
> > > > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > > > cope with that, and that most likely affects libfdt as well?
> > > >
> > > > Ah yes, I think you are right.
> > > >
> > > > In that case, seems like we should figure out whether (internal)
> > > > unaligned accesses are possible with dtc generated dtbs at least
> > > > rather than just "not a common at all case (as noted by it not having
> > > > been seen in practice)." I'm sure David will point out that not all
> > > > dtbs come from dtc, but all the ones u-boot deals with do in
> > > > reality.
> > >
> > > Assuming the blob itself is 8-byte aligned in memory, then all
> > > structural elements (i.e. the tree metadata) of a compliant dtb will
> > > be naturally aligned.  The spec requires 8-byte alignment of the mem
> > > reserve block w.r.t. the base of the blob and 4 byte aligned structure
> > > block w.r.t. the base of the blob.  Likewise the layout of the mem
> > > reserve block will preserve 8-byte alignment of all the 64-bit values
> > > it contains, assuming the block itself starts 8-byte aligned.
> > > Similarly the structure blob will preserve 4-byte alignment of all its
> > > tags and other structural data (this amounts to requiring an alignment
> > > gap after node names and property values).
> > >
> > > However, "all structural elements" does not include values within
> > > property values themselves.  Assuming propery alignment of the blocks
> > > and the blob itself, then all property values will *begin* 4 byte
> > > aligned.  However that leaves two relevant cases:
> > >
> > >  a) 64-bit property values may be 4-byte aligned but not 8-byte
> > >     aligned
> > 
> > I'd assume that while an arch may support only the above in terms of
> > misalignment, an arch that supports any alignment would always support
> > this as part of that. It would just be odd to support byte alignment
> > only up to 32-bit.
> 
> Yes, I'd expect so.
> 
> > I don't think we need to optimize the former case.
> 
> I don't see how we would, in any case.
> 
> > >  b) complex property values including both strings and integers
> > >     typically use a packed representation with no alignment gaps.
> > >     Such property structures are usually avoided in modern bindings,
> > >     but they definitely exist in a bunch of older bindings.  Obviously
> > >     that means that integer values sitting after arbitrary length
> > >     strings may not have any natural alignment
> > 
> > That's the user's problem IMO. Users of older bindings having this
> > aren't likely using a newish function like fdt32_ld either.
> 
> That doesn't follow.  The bindings still exist and are in use, e.g. on
> IBM PAPR systems, that's not correlated to how recent teh libfdt is.
> 
> > > So acccesses made by libfdt internally should be safe(*) assuming the
> > > blob itself is loaded 8-byte aligned, and the dtb is compliant.
> > > However the libfdt user may hit both problems (a) and (b) getting
> > > things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> > > intended to handle those cases, so that they're useful for the caller
> > > to pull things from properties as well as for libfdt internal
> > > accesses.
> > >
> > > (*) There are a number of other functions that looked like they might
> > >     be dangerous for case (a) because they are based on 64-bit
> > >     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
> > >     fdt_setprop_u64(), fdt_appendprop_u64() and
> > >     fdt_appendprop_addrrange().  However I think they're actually
> > >     ok, because the way they're built in terms of other functions
> > >     means there's implicitly a memcpy() from a byte buffer.
> > >
> > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > > > the MMU off as well.
> > > >
> > > > I'm making a mental note of this for the next time performance issues come up.
> > >
> > > Right, running early with MMU off is definitely a real use case for
> > > libfdt.  For similar reasons we can't assume we have an OS which will
> > > trap and handle unaligned accesses, which we might for a more
> > > conventional userspace library.
> > >
> > > This kind of underscores why I'm a bit hesitant to introduce "my
> > > platform handles unaligned acccesses" flag.  Not only does it require
> > > detailed knowledge of the target CPU, but it can also depend on
> > > exactly what mode that hardware is in.
> > 
> > I think there's a more simple solution with no flags. Given all
> > internal accesses are at least 4-byte aligned, libfdt should just do
> > 32-bit accesses internally (as it used to). Maybe we need a check up
> > front that the dtb is 8-byte aligned though.
> 
> That's not a bad idea.  We could do it in fdt_ro_probe_().
> 
> Although, one extra case occurs to me.  Someone (is it uboot?) has a
> wacky format where dtbs for several platforms, along with kernels and
> other information are bundled together in a big dtb (that is, using
> the dtb encoding, even though it's not actually a device tree).  The
> "sub-dtbs" in that will be 4-byte aligned, but maybe not 8-byte
> aligned.

Yes, about 12 years ago now U-Boot introduced (but it's useful anywhere,
really...) FIT images which are what you're thinking of.  That's
unrelated to all of this however.

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                                                 ` <20201029025503.GI5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
@ 2020-10-29 15:06                                                   ` Tom Rini
  2020-10-29 15:48                                                     ` Rob Herring
  2020-11-02  2:06                                                     ` David Gibson
  0 siblings, 2 replies; 28+ messages in thread
From: Tom Rini @ 2020-10-29 15:06 UTC (permalink / raw)
  To: David Gibson
  Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 9662 bytes --]

On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote:
> On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote:
> > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote:
> > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote:
> > > > >
> > > > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > > >>>>>> 11738cf01f15 reduces it just a little.
> > > > > >>>>>
> > > > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > > >>>>> instead get intermittent bug reports where it just crashes.
> > > > > >>>>
> > > > > >>>> We really need to talk about that then.  There was a problem of people
> > > > > >>>> turning off the sanity check for making sure the entire device tree was
> > > > > >>>> aligned and then having everything crash.
> > > > > >>>
> > > > > >>> Ok... I'm not really sure where you're going with that thought.
> > > > > >>
> > > > > >> In my reading of the mailing list history of how this issue came up,
> > > > > >> it was someone was booting a dragonboard or something, and they (or
> > > > > >> rather, the board maintainer set by default) the flag to use the device
> > > > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > > > >> was set, violated the documented requirements for device trees need to
> > > > > >> reside in memory and everything blew up.
> > > > > >>
> > > > > >> After that it was noticed that there could be some internal
> > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > > > >> support doing those reads easily there could be problems, but that's not
> > > > > >> a common at all case (as noted by it not having been seen in practice).
> > > > > >
> > > > > > Nor a problem on many environments to begin with. More below...
> > > > > >
> > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > > > >>>>> flag if you're not sure you don't need it).
> > > > > >>>>
> > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > > > >>>> everything again, yes, that is the right way forward I think.
> > > > > >>>
> > > > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > > > >>> normal optimization levels enabled in the compiler), so testing for
> > > > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > > > >>
> > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > > > >> done in such a way the compiler really will be smart about it.  So
> > > > > >> something like making a new function that does fdt64_ld() if we aren't
> > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > > > >> ASSUME_ALIGNED_ACCESS.
> > > > > >
> > > > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > > > size suffer with not doing unaligned accesses.
> > > > > >
> > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > > > do unaligned accesses? That would be more aligned with what the system
> > > > > > can support rather than sanity checking associated with ASSUME_*.
> > > 
> > > So, there are kind of two things here, (1) is "my platform can handle
> > > unaligned accesses" and (2) is "assume I don't need unaligned
> > > accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> > > either is true.  However we need to consider those separately, because
> > > they can be independently true (or not) for different reasons.  (1)
> > > depends on the hardware, whereas (2) depends on how you're using dtc,
> > > and, see below, you may need at least unaligned-handling fdt64_ld() in
> > > more cases than you think.
> > > 
> > > > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > > > accesses if enabled.
> > > > >
> > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > > > ARM correctly, unaligned accesses always trap on device memory,
> > > > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > > > cope with that, and that most likely affects libfdt as well?
> > > > 
> > > > Ah yes, I think you are right.
> > > > 
> > > > In that case, seems like we should figure out whether (internal)
> > > > unaligned accesses are possible with dtc generated dtbs at least
> > > > rather than just "not a common at all case (as noted by it not having
> > > > been seen in practice)." I'm sure David will point out that not all
> > > > dtbs come from dtc, but all the ones u-boot deals with do in
> > > > reality.
> > > 
> > > Assuming the blob itself is 8-byte aligned in memory, then all
> > > structural elements (i.e. the tree metadata) of a compliant dtb will
> > > be naturally aligned.  The spec requires 8-byte alignment of the mem
> > > reserve block w.r.t. the base of the blob and 4 byte aligned structure
> > > block w.r.t. the base of the blob.  Likewise the layout of the mem
> > > reserve block will preserve 8-byte alignment of all the 64-bit values
> > > it contains, assuming the block itself starts 8-byte aligned.
> > > Similarly the structure blob will preserve 4-byte alignment of all its
> > > tags and other structural data (this amounts to requiring an alignment
> > > gap after node names and property values).
> > > 
> > > However, "all structural elements" does not include values within
> > > property values themselves.  Assuming propery alignment of the blocks
> > > and the blob itself, then all property values will *begin* 4 byte
> > > aligned.  However that leaves two relevant cases:
> > > 
> > >  a) 64-bit property values may be 4-byte aligned but not 8-byte
> > >     aligned
> > >  b) complex property values including both strings and integers
> > >     typically use a packed representation with no alignment gaps.
> > >     Such property structures are usually avoided in modern bindings,
> > >     but they definitely exist in a bunch of older bindings.  Obviously
> > >     that means that integer values sitting after arbitrary length
> > >     strings may not have any natural alignment
> > > 
> > > So acccesses made by libfdt internally should be safe(*) assuming the
> > > blob itself is loaded 8-byte aligned, and the dtb is compliant.
> > > However the libfdt user may hit both problems (a) and (b) getting
> > > things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> > > intended to handle those cases, so that they're useful for the caller
> > > to pull things from properties as well as for libfdt internal
> > > accesses.
> > > 
> > > (*) There are a number of other functions that looked like they might
> > >     be dangerous for case (a) because they are based on 64-bit
> > >     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
> > >     fdt_setprop_u64(), fdt_appendprop_u64() and
> > >     fdt_appendprop_addrrange().  However I think they're actually
> > >     ok, because the way they're built in terms of other functions
> > >     means there's implicitly a memcpy() from a byte buffer.
> > > 
> > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > > > the MMU off as well.
> > > > 
> > > > I'm making a mental note of this for the next time performance issues come up.
> > > 
> > > Right, running early with MMU off is definitely a real use case for
> > > libfdt.  For similar reasons we can't assume we have an OS which will
> > > trap and handle unaligned accesses, which we might for a more
> > > conventional userspace library.
> > > 
> > > This kind of underscores why I'm a bit hesitant to introduce "my
> > > platform handles unaligned acccesses" flag.  Not only does it require
> > > detailed knowledge of the target CPU, but it can also depend on
> > > exactly what mode that hardware is in.
> > 
> > Can you please note the existing user(s) where we have just the right
> > combination of factors and so everything fails?
> 
> Sorry, I don't understand the question.

I'm asking what the platform(s) are that have the very specific "and
here be failure" problem you're concerned with.  I'm concerned that
right now we're going to end up with larger pile of reverts to dtc in
U-Boot rather than being able to just sync with the project properly
again.

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
  2020-10-29 15:06                                                   ` Tom Rini
@ 2020-10-29 15:48                                                     ` Rob Herring
       [not found]                                                       ` <CAL_JsqJUixnyZx-tu9EV8YZ-gSDE7i1jvMddnNZZWFzezaHftw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2020-11-02  2:06                                                     ` David Gibson
  1 sibling, 1 reply; 28+ messages in thread
From: Rob Herring @ 2020-10-29 15:48 UTC (permalink / raw)
  To: Tom Rini
  Cc: David Gibson, André Przywara, Simon Glass, Devicetree Compiler

On Thu, Oct 29, 2020 at 10:06 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
>
> On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote:
> > On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote:
> > > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote:
> > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote:
> > > > > >
> > > > > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > > > >>>>>> 11738cf01f15 reduces it just a little.
> > > > > > >>>>>
> > > > > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > > > >>>>> instead get intermittent bug reports where it just crashes.
> > > > > > >>>>
> > > > > > >>>> We really need to talk about that then.  There was a problem of people
> > > > > > >>>> turning off the sanity check for making sure the entire device tree was
> > > > > > >>>> aligned and then having everything crash.
> > > > > > >>>
> > > > > > >>> Ok... I'm not really sure where you're going with that thought.
> > > > > > >>
> > > > > > >> In my reading of the mailing list history of how this issue came up,
> > > > > > >> it was someone was booting a dragonboard or something, and they (or
> > > > > > >> rather, the board maintainer set by default) the flag to use the device
> > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > > > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > > > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > > > > >> was set, violated the documented requirements for device trees need to
> > > > > > >> reside in memory and everything blew up.
> > > > > > >>
> > > > > > >> After that it was noticed that there could be some internal
> > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > > > > >> support doing those reads easily there could be problems, but that's not
> > > > > > >> a common at all case (as noted by it not having been seen in practice).
> > > > > > >
> > > > > > > Nor a problem on many environments to begin with. More below...
> > > > > > >
> > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > > > > >>>>> flag if you're not sure you don't need it).
> > > > > > >>>>
> > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > > > > >>>> everything again, yes, that is the right way forward I think.
> > > > > > >>>
> > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > > > > >>> normal optimization levels enabled in the compiler), so testing for
> > > > > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > > > > >>
> > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > > > > >> done in such a way the compiler really will be smart about it.  So
> > > > > > >> something like making a new function that does fdt64_ld() if we aren't
> > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > > > > >> ASSUME_ALIGNED_ACCESS.
> > > > > > >
> > > > > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > > > > size suffer with not doing unaligned accesses.
> > > > > > >
> > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > > > > do unaligned accesses? That would be more aligned with what the system
> > > > > > > can support rather than sanity checking associated with ASSUME_*.
> > > >
> > > > So, there are kind of two things here, (1) is "my platform can handle
> > > > unaligned accesses" and (2) is "assume I don't need unaligned
> > > > accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> > > > either is true.  However we need to consider those separately, because
> > > > they can be independently true (or not) for different reasons.  (1)
> > > > depends on the hardware, whereas (2) depends on how you're using dtc,
> > > > and, see below, you may need at least unaligned-handling fdt64_ld() in
> > > > more cases than you think.
> > > >
> > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > > > > accesses if enabled.
> > > > > >
> > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > > > > ARM correctly, unaligned accesses always trap on device memory,
> > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > > > > cope with that, and that most likely affects libfdt as well?
> > > > >
> > > > > Ah yes, I think you are right.
> > > > >
> > > > > In that case, seems like we should figure out whether (internal)
> > > > > unaligned accesses are possible with dtc generated dtbs at least
> > > > > rather than just "not a common at all case (as noted by it not having
> > > > > been seen in practice)." I'm sure David will point out that not all
> > > > > dtbs come from dtc, but all the ones u-boot deals with do in
> > > > > reality.
> > > >
> > > > Assuming the blob itself is 8-byte aligned in memory, then all
> > > > structural elements (i.e. the tree metadata) of a compliant dtb will
> > > > be naturally aligned.  The spec requires 8-byte alignment of the mem
> > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure
> > > > block w.r.t. the base of the blob.  Likewise the layout of the mem
> > > > reserve block will preserve 8-byte alignment of all the 64-bit values
> > > > it contains, assuming the block itself starts 8-byte aligned.
> > > > Similarly the structure blob will preserve 4-byte alignment of all its
> > > > tags and other structural data (this amounts to requiring an alignment
> > > > gap after node names and property values).
> > > >
> > > > However, "all structural elements" does not include values within
> > > > property values themselves.  Assuming propery alignment of the blocks
> > > > and the blob itself, then all property values will *begin* 4 byte
> > > > aligned.  However that leaves two relevant cases:
> > > >
> > > >  a) 64-bit property values may be 4-byte aligned but not 8-byte
> > > >     aligned
> > > >  b) complex property values including both strings and integers
> > > >     typically use a packed representation with no alignment gaps.
> > > >     Such property structures are usually avoided in modern bindings,
> > > >     but they definitely exist in a bunch of older bindings.  Obviously
> > > >     that means that integer values sitting after arbitrary length
> > > >     strings may not have any natural alignment
> > > >
> > > > So acccesses made by libfdt internally should be safe(*) assuming the
> > > > blob itself is loaded 8-byte aligned, and the dtb is compliant.
> > > > However the libfdt user may hit both problems (a) and (b) getting
> > > > things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> > > > intended to handle those cases, so that they're useful for the caller
> > > > to pull things from properties as well as for libfdt internal
> > > > accesses.
> > > >
> > > > (*) There are a number of other functions that looked like they might
> > > >     be dangerous for case (a) because they are based on 64-bit
> > > >     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
> > > >     fdt_setprop_u64(), fdt_appendprop_u64() and
> > > >     fdt_appendprop_addrrange().  However I think they're actually
> > > >     ok, because the way they're built in terms of other functions
> > > >     means there's implicitly a memcpy() from a byte buffer.
> > > >
> > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > > > > the MMU off as well.
> > > > >
> > > > > I'm making a mental note of this for the next time performance issues come up.
> > > >
> > > > Right, running early with MMU off is definitely a real use case for
> > > > libfdt.  For similar reasons we can't assume we have an OS which will
> > > > trap and handle unaligned accesses, which we might for a more
> > > > conventional userspace library.
> > > >
> > > > This kind of underscores why I'm a bit hesitant to introduce "my
> > > > platform handles unaligned acccesses" flag.  Not only does it require
> > > > detailed knowledge of the target CPU, but it can also depend on
> > > > exactly what mode that hardware is in.
> > >
> > > Can you please note the existing user(s) where we have just the right
> > > combination of factors and so everything fails?
> >
> > Sorry, I don't understand the question.
>
> I'm asking what the platform(s) are that have the very specific "and
> here be failure" problem you're concerned with.

I'm also and still confused with your question.

>  I'm concerned that
> right now we're going to end up with larger pile of reverts to dtc in
> U-Boot rather than being able to just sync with the project properly
> again.

I think we have some agreement which I believe would end being the
revert you originally submitted, but just keep the fdt*_ld() accessors
which always do safe accesses.

Rob

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                                                       ` <CAL_JsqJUixnyZx-tu9EV8YZ-gSDE7i1jvMddnNZZWFzezaHftw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-10-29 16:04                                                         ` Tom Rini
  2020-10-29 18:08                                                           ` Rob Herring
  0 siblings, 1 reply; 28+ messages in thread
From: Tom Rini @ 2020-10-29 16:04 UTC (permalink / raw)
  To: Rob Herring
  Cc: David Gibson, André Przywara, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 11198 bytes --]

On Thu, Oct 29, 2020 at 10:48:28AM -0500, Rob Herring wrote:
> On Thu, Oct 29, 2020 at 10:06 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> >
> > On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote:
> > > On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote:
> > > > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote:
> > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote:
> > > > > > >
> > > > > > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini@konsulko.com> wrote:
> > > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > > > > >
> > > > > > > > [...]
> > > > > > > >
> > > > > > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > > > > >>>>>> 11738cf01f15 reduces it just a little.
> > > > > > > >>>>>
> > > > > > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > > > > >>>>> instead get intermittent bug reports where it just crashes.
> > > > > > > >>>>
> > > > > > > >>>> We really need to talk about that then.  There was a problem of people
> > > > > > > >>>> turning off the sanity check for making sure the entire device tree was
> > > > > > > >>>> aligned and then having everything crash.
> > > > > > > >>>
> > > > > > > >>> Ok... I'm not really sure where you're going with that thought.
> > > > > > > >>
> > > > > > > >> In my reading of the mailing list history of how this issue came up,
> > > > > > > >> it was someone was booting a dragonboard or something, and they (or
> > > > > > > >> rather, the board maintainer set by default) the flag to use the device
> > > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > > > > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > > > > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > > > > > >> was set, violated the documented requirements for device trees need to
> > > > > > > >> reside in memory and everything blew up.
> > > > > > > >>
> > > > > > > >> After that it was noticed that there could be some internal
> > > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > > > > > >> support doing those reads easily there could be problems, but that's not
> > > > > > > >> a common at all case (as noted by it not having been seen in practice).
> > > > > > > >
> > > > > > > > Nor a problem on many environments to begin with. More below...
> > > > > > > >
> > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > > > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > > > > > >>>>> flag if you're not sure you don't need it).
> > > > > > > >>>>
> > > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > > > > > >>>> everything again, yes, that is the right way forward I think.
> > > > > > > >>>
> > > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > > > > > >>> normal optimization levels enabled in the compiler), so testing for
> > > > > > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > > > > > >>
> > > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > > > > > >> done in such a way the compiler really will be smart about it.  So
> > > > > > > >> something like making a new function that does fdt64_ld() if we aren't
> > > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > > > > > >> ASSUME_ALIGNED_ACCESS.
> > > > > > > >
> > > > > > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > > > > > size suffer with not doing unaligned accesses.
> > > > > > > >
> > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > > > > > do unaligned accesses? That would be more aligned with what the system
> > > > > > > > can support rather than sanity checking associated with ASSUME_*.
> > > > >
> > > > > So, there are kind of two things here, (1) is "my platform can handle
> > > > > unaligned accesses" and (2) is "assume I don't need unaligned
> > > > > accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> > > > > either is true.  However we need to consider those separately, because
> > > > > they can be independently true (or not) for different reasons.  (1)
> > > > > depends on the hardware, whereas (2) depends on how you're using dtc,
> > > > > and, see below, you may need at least unaligned-handling fdt64_ld() in
> > > > > more cases than you think.
> > > > >
> > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > > > > > accesses if enabled.
> > > > > > >
> > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > > > > > ARM correctly, unaligned accesses always trap on device memory,
> > > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > > > > > cope with that, and that most likely affects libfdt as well?
> > > > > >
> > > > > > Ah yes, I think you are right.
> > > > > >
> > > > > > In that case, seems like we should figure out whether (internal)
> > > > > > unaligned accesses are possible with dtc generated dtbs at least
> > > > > > rather than just "not a common at all case (as noted by it not having
> > > > > > been seen in practice)." I'm sure David will point out that not all
> > > > > > dtbs come from dtc, but all the ones u-boot deals with do in
> > > > > > reality.
> > > > >
> > > > > Assuming the blob itself is 8-byte aligned in memory, then all
> > > > > structural elements (i.e. the tree metadata) of a compliant dtb will
> > > > > be naturally aligned.  The spec requires 8-byte alignment of the mem
> > > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure
> > > > > block w.r.t. the base of the blob.  Likewise the layout of the mem
> > > > > reserve block will preserve 8-byte alignment of all the 64-bit values
> > > > > it contains, assuming the block itself starts 8-byte aligned.
> > > > > Similarly the structure blob will preserve 4-byte alignment of all its
> > > > > tags and other structural data (this amounts to requiring an alignment
> > > > > gap after node names and property values).
> > > > >
> > > > > However, "all structural elements" does not include values within
> > > > > property values themselves.  Assuming propery alignment of the blocks
> > > > > and the blob itself, then all property values will *begin* 4 byte
> > > > > aligned.  However that leaves two relevant cases:
> > > > >
> > > > >  a) 64-bit property values may be 4-byte aligned but not 8-byte
> > > > >     aligned
> > > > >  b) complex property values including both strings and integers
> > > > >     typically use a packed representation with no alignment gaps.
> > > > >     Such property structures are usually avoided in modern bindings,
> > > > >     but they definitely exist in a bunch of older bindings.  Obviously
> > > > >     that means that integer values sitting after arbitrary length
> > > > >     strings may not have any natural alignment
> > > > >
> > > > > So acccesses made by libfdt internally should be safe(*) assuming the
> > > > > blob itself is loaded 8-byte aligned, and the dtb is compliant.
> > > > > However the libfdt user may hit both problems (a) and (b) getting
> > > > > things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> > > > > intended to handle those cases, so that they're useful for the caller
> > > > > to pull things from properties as well as for libfdt internal
> > > > > accesses.
> > > > >
> > > > > (*) There are a number of other functions that looked like they might
> > > > >     be dangerous for case (a) because they are based on 64-bit
> > > > >     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
> > > > >     fdt_setprop_u64(), fdt_appendprop_u64() and
> > > > >     fdt_appendprop_addrrange().  However I think they're actually
> > > > >     ok, because the way they're built in terms of other functions
> > > > >     means there's implicitly a memcpy() from a byte buffer.
> > > > >
> > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > > > > > the MMU off as well.
> > > > > >
> > > > > > I'm making a mental note of this for the next time performance issues come up.
> > > > >
> > > > > Right, running early with MMU off is definitely a real use case for
> > > > > libfdt.  For similar reasons we can't assume we have an OS which will
> > > > > trap and handle unaligned accesses, which we might for a more
> > > > > conventional userspace library.
> > > > >
> > > > > This kind of underscores why I'm a bit hesitant to introduce "my
> > > > > platform handles unaligned acccesses" flag.  Not only does it require
> > > > > detailed knowledge of the target CPU, but it can also depend on
> > > > > exactly what mode that hardware is in.
> > > >
> > > > Can you please note the existing user(s) where we have just the right
> > > > combination of factors and so everything fails?
> > >
> > > Sorry, I don't understand the question.
> >
> > I'm asking what the platform(s) are that have the very specific "and
> > here be failure" problem you're concerned with.
> 
> I'm also and still confused with your question.

Maybe I'm just confused then.  I'm asking what hardware ends up causing
a fault when we read a property that starts at 0x....3 and we would not
have already enabled the MMU and done whatever else is required to have
the fault fixed up and handled.  That's not the case in U-Boot.  But
that being a possible case seems to be where the underlying concern is.

> >  I'm concerned that
> > right now we're going to end up with larger pile of reverts to dtc in
> > U-Boot rather than being able to just sync with the project properly
> > again.
> 
> I think we have some agreement which I believe would end being the
> revert you originally submitted, but just keep the fdt*_ld() accessors
> which always do safe accesses.

So keep fdt{32,64}_ld() and switch all of the callers back to
fdt{32,64}_to_cpu()?  OK, should I spin up a patch or ?  Thanks!

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
  2020-10-29 16:04                                                         ` Tom Rini
@ 2020-10-29 18:08                                                           ` Rob Herring
       [not found]                                                             ` <CAL_JsqJTTJAoTYwxDn3i0KETMXLyGg3WXzxN3-OdRLx=R96a-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Rob Herring @ 2020-10-29 18:08 UTC (permalink / raw)
  To: Tom Rini
  Cc: David Gibson, André Przywara, Simon Glass, Devicetree Compiler

On Thu, Oct 29, 2020 at 11:04 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
>
> On Thu, Oct 29, 2020 at 10:48:28AM -0500, Rob Herring wrote:
> > On Thu, Oct 29, 2020 at 10:06 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > >
> > > On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote:
> > > > On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote:
> > > > > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote:
> > > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > > > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote:
> > > > > > > >
> > > > > > > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini@konsulko.com> wrote:
> > > > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > > > > > >
> > > > > > > > > [...]
> > > > > > > > >
> > > > > > > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > > > > > >>>>>> 11738cf01f15 reduces it just a little.
> > > > > > > > >>>>>
> > > > > > > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > > > > > >>>>> instead get intermittent bug reports where it just crashes.
> > > > > > > > >>>>
> > > > > > > > >>>> We really need to talk about that then.  There was a problem of people
> > > > > > > > >>>> turning off the sanity check for making sure the entire device tree was
> > > > > > > > >>>> aligned and then having everything crash.
> > > > > > > > >>>
> > > > > > > > >>> Ok... I'm not really sure where you're going with that thought.
> > > > > > > > >>
> > > > > > > > >> In my reading of the mailing list history of how this issue came up,
> > > > > > > > >> it was someone was booting a dragonboard or something, and they (or
> > > > > > > > >> rather, the board maintainer set by default) the flag to use the device
> > > > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > > > > > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > > > > > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > > > > > > >> was set, violated the documented requirements for device trees need to
> > > > > > > > >> reside in memory and everything blew up.
> > > > > > > > >>
> > > > > > > > >> After that it was noticed that there could be some internal
> > > > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > > > > > > >> support doing those reads easily there could be problems, but that's not
> > > > > > > > >> a common at all case (as noted by it not having been seen in practice).
> > > > > > > > >
> > > > > > > > > Nor a problem on many environments to begin with. More below...
> > > > > > > > >
> > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > > > > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > > > > > > >>>>> flag if you're not sure you don't need it).
> > > > > > > > >>>>
> > > > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > > > > > > >>>> everything again, yes, that is the right way forward I think.
> > > > > > > > >>>
> > > > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > > > > > > >>> normal optimization levels enabled in the compiler), so testing for
> > > > > > > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > > > > > > >>
> > > > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > > > > > > >> done in such a way the compiler really will be smart about it.  So
> > > > > > > > >> something like making a new function that does fdt64_ld() if we aren't
> > > > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > > > > > > >> ASSUME_ALIGNED_ACCESS.
> > > > > > > > >
> > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > > > > > > size suffer with not doing unaligned accesses.
> > > > > > > > >
> > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > > > > > > do unaligned accesses? That would be more aligned with what the system
> > > > > > > > > can support rather than sanity checking associated with ASSUME_*.
> > > > > >
> > > > > > So, there are kind of two things here, (1) is "my platform can handle
> > > > > > unaligned accesses" and (2) is "assume I don't need unaligned
> > > > > > accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> > > > > > either is true.  However we need to consider those separately, because
> > > > > > they can be independently true (or not) for different reasons.  (1)
> > > > > > depends on the hardware, whereas (2) depends on how you're using dtc,
> > > > > > and, see below, you may need at least unaligned-handling fdt64_ld() in
> > > > > > more cases than you think.
> > > > > >
> > > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > > > > > > accesses if enabled.
> > > > > > > >
> > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > > > > > > ARM correctly, unaligned accesses always trap on device memory,
> > > > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > > > > > > cope with that, and that most likely affects libfdt as well?
> > > > > > >
> > > > > > > Ah yes, I think you are right.
> > > > > > >
> > > > > > > In that case, seems like we should figure out whether (internal)
> > > > > > > unaligned accesses are possible with dtc generated dtbs at least
> > > > > > > rather than just "not a common at all case (as noted by it not having
> > > > > > > been seen in practice)." I'm sure David will point out that not all
> > > > > > > dtbs come from dtc, but all the ones u-boot deals with do in
> > > > > > > reality.
> > > > > >
> > > > > > Assuming the blob itself is 8-byte aligned in memory, then all
> > > > > > structural elements (i.e. the tree metadata) of a compliant dtb will
> > > > > > be naturally aligned.  The spec requires 8-byte alignment of the mem
> > > > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure
> > > > > > block w.r.t. the base of the blob.  Likewise the layout of the mem
> > > > > > reserve block will preserve 8-byte alignment of all the 64-bit values
> > > > > > it contains, assuming the block itself starts 8-byte aligned.
> > > > > > Similarly the structure blob will preserve 4-byte alignment of all its
> > > > > > tags and other structural data (this amounts to requiring an alignment
> > > > > > gap after node names and property values).
> > > > > >
> > > > > > However, "all structural elements" does not include values within
> > > > > > property values themselves.  Assuming propery alignment of the blocks
> > > > > > and the blob itself, then all property values will *begin* 4 byte
> > > > > > aligned.  However that leaves two relevant cases:
> > > > > >
> > > > > >  a) 64-bit property values may be 4-byte aligned but not 8-byte
> > > > > >     aligned
> > > > > >  b) complex property values including both strings and integers
> > > > > >     typically use a packed representation with no alignment gaps.
> > > > > >     Such property structures are usually avoided in modern bindings,
> > > > > >     but they definitely exist in a bunch of older bindings.  Obviously
> > > > > >     that means that integer values sitting after arbitrary length
> > > > > >     strings may not have any natural alignment
> > > > > >
> > > > > > So acccesses made by libfdt internally should be safe(*) assuming the
> > > > > > blob itself is loaded 8-byte aligned, and the dtb is compliant.
> > > > > > However the libfdt user may hit both problems (a) and (b) getting
> > > > > > things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> > > > > > intended to handle those cases, so that they're useful for the caller
> > > > > > to pull things from properties as well as for libfdt internal
> > > > > > accesses.
> > > > > >
> > > > > > (*) There are a number of other functions that looked like they might
> > > > > >     be dangerous for case (a) because they are based on 64-bit
> > > > > >     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
> > > > > >     fdt_setprop_u64(), fdt_appendprop_u64() and
> > > > > >     fdt_appendprop_addrrange().  However I think they're actually
> > > > > >     ok, because the way they're built in terms of other functions
> > > > > >     means there's implicitly a memcpy() from a byte buffer.
> > > > > >
> > > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > > > > > > the MMU off as well.
> > > > > > >
> > > > > > > I'm making a mental note of this for the next time performance issues come up.
> > > > > >
> > > > > > Right, running early with MMU off is definitely a real use case for
> > > > > > libfdt.  For similar reasons we can't assume we have an OS which will
> > > > > > trap and handle unaligned accesses, which we might for a more
> > > > > > conventional userspace library.
> > > > > >
> > > > > > This kind of underscores why I'm a bit hesitant to introduce "my
> > > > > > platform handles unaligned acccesses" flag.  Not only does it require
> > > > > > detailed knowledge of the target CPU, but it can also depend on
> > > > > > exactly what mode that hardware is in.
> > > > >
> > > > > Can you please note the existing user(s) where we have just the right
> > > > > combination of factors and so everything fails?
> > > >
> > > > Sorry, I don't understand the question.
> > >
> > > I'm asking what the platform(s) are that have the very specific "and
> > > here be failure" problem you're concerned with.
> >
> > I'm also and still confused with your question.
>
> Maybe I'm just confused then.  I'm asking what hardware ends up causing
> a fault when we read a property that starts at 0x....3 and we would not
> have already enabled the MMU and done whatever else is required to have
> the fault fixed up and handled.

Ah, well ARMv4 and v5 systems. Outside of that, I don't know.

But as David said, properties will start with 4 byte alignment. It's
only accesses within a property value that could be a problem, but
those would be done by the user of libfdt and probably already made
them alignment safe if needed (or they use fdt{32,64}_ld() if
relatively recently written).

> That's not the case in U-Boot.

But it is for u-boot as Andre pointed out if the MMU is off.

> But
> that being a possible case seems to be where the underlying concern is.
>
> > >  I'm concerned that
> > > right now we're going to end up with larger pile of reverts to dtc in
> > > U-Boot rather than being able to just sync with the project properly
> > > again.
> >
> > I think we have some agreement which I believe would end being the
> > revert you originally submitted, but just keep the fdt*_ld() accessors
> > which always do safe accesses.
>
> So keep fdt{32,64}_ld() and switch all of the callers back to
> fdt{32,64}_to_cpu()?

Right.

OK, should I spin up a patch or ?

Yes.

Rob

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
  2020-10-29 15:04                                                     ` Tom Rini
@ 2020-10-29 19:56                                                       ` David Gibson
       [not found]                                                         ` <20201029195658.GK5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2020-10-29 19:56 UTC (permalink / raw)
  To: Tom Rini
  Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 11714 bytes --]

On Thu, Oct 29, 2020 at 11:04:01AM -0400, Tom Rini wrote:
> On Thu, Oct 29, 2020 at 02:02:47PM +1100, David Gibson wrote:
> > On Wed, Oct 28, 2020 at 12:49:08PM -0500, Rob Herring wrote:
> > > On Tue, Oct 27, 2020 at 11:26 PM David Gibson
> > > <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> wrote:
> > > >
> > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote:
> > > > > >
> > > > > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > > > >>>>>> 11738cf01f15 reduces it just a little.
> > > > > > >>>>>
> > > > > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > > > >>>>> instead get intermittent bug reports where it just crashes.
> > > > > > >>>>
> > > > > > >>>> We really need to talk about that then.  There was a problem of people
> > > > > > >>>> turning off the sanity check for making sure the entire device tree was
> > > > > > >>>> aligned and then having everything crash.
> > > > > > >>>
> > > > > > >>> Ok... I'm not really sure where you're going with that thought.
> > > > > > >>
> > > > > > >> In my reading of the mailing list history of how this issue came up,
> > > > > > >> it was someone was booting a dragonboard or something, and they (or
> > > > > > >> rather, the board maintainer set by default) the flag to use the device
> > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > > > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > > > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > > > > >> was set, violated the documented requirements for device trees need to
> > > > > > >> reside in memory and everything blew up.
> > > > > > >>
> > > > > > >> After that it was noticed that there could be some internal
> > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > > > > >> support doing those reads easily there could be problems, but that's not
> > > > > > >> a common at all case (as noted by it not having been seen in practice).
> > > > > > >
> > > > > > > Nor a problem on many environments to begin with. More below...
> > > > > > >
> > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > > > > >>>>> flag if you're not sure you don't need it).
> > > > > > >>>>
> > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > > > > >>>> everything again, yes, that is the right way forward I think.
> > > > > > >>>
> > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > > > > >>> normal optimization levels enabled in the compiler), so testing for
> > > > > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > > > > >>
> > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > > > > >> done in such a way the compiler really will be smart about it.  So
> > > > > > >> something like making a new function that does fdt64_ld() if we aren't
> > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > > > > >> ASSUME_ALIGNED_ACCESS.
> > > > > > >
> > > > > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > > > > size suffer with not doing unaligned accesses.
> > > > > > >
> > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > > > > do unaligned accesses? That would be more aligned with what the system
> > > > > > > can support rather than sanity checking associated with ASSUME_*.
> > > >
> > > > So, there are kind of two things here, (1) is "my platform can handle
> > > > unaligned accesses" and (2) is "assume I don't need unaligned
> > > > accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> > > > either is true.  However we need to consider those separately, because
> > > > they can be independently true (or not) for different reasons.  (1)
> > > > depends on the hardware, whereas (2) depends on how you're using dtc,
> > > > and, see below, you may need at least unaligned-handling fdt64_ld() in
> > > > more cases than you think.
> > > 
> > > Okay, I guess you were thinking of (2) for ASSUME_ALIGNED_ACCESS, but
> > > I read it as (1).
> > 
> > Yes.
> > 
> > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > > > > accesses if enabled.
> > > > > >
> > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > > > > ARM correctly, unaligned accesses always trap on device memory,
> > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > > > > cope with that, and that most likely affects libfdt as well?
> > > > >
> > > > > Ah yes, I think you are right.
> > > > >
> > > > > In that case, seems like we should figure out whether (internal)
> > > > > unaligned accesses are possible with dtc generated dtbs at least
> > > > > rather than just "not a common at all case (as noted by it not having
> > > > > been seen in practice)." I'm sure David will point out that not all
> > > > > dtbs come from dtc, but all the ones u-boot deals with do in
> > > > > reality.
> > > >
> > > > Assuming the blob itself is 8-byte aligned in memory, then all
> > > > structural elements (i.e. the tree metadata) of a compliant dtb will
> > > > be naturally aligned.  The spec requires 8-byte alignment of the mem
> > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure
> > > > block w.r.t. the base of the blob.  Likewise the layout of the mem
> > > > reserve block will preserve 8-byte alignment of all the 64-bit values
> > > > it contains, assuming the block itself starts 8-byte aligned.
> > > > Similarly the structure blob will preserve 4-byte alignment of all its
> > > > tags and other structural data (this amounts to requiring an alignment
> > > > gap after node names and property values).
> > > >
> > > > However, "all structural elements" does not include values within
> > > > property values themselves.  Assuming propery alignment of the blocks
> > > > and the blob itself, then all property values will *begin* 4 byte
> > > > aligned.  However that leaves two relevant cases:
> > > >
> > > >  a) 64-bit property values may be 4-byte aligned but not 8-byte
> > > >     aligned
> > > 
> > > I'd assume that while an arch may support only the above in terms of
> > > misalignment, an arch that supports any alignment would always support
> > > this as part of that. It would just be odd to support byte alignment
> > > only up to 32-bit.
> > 
> > Yes, I'd expect so.
> > 
> > > I don't think we need to optimize the former case.
> > 
> > I don't see how we would, in any case.
> > 
> > > >  b) complex property values including both strings and integers
> > > >     typically use a packed representation with no alignment gaps.
> > > >     Such property structures are usually avoided in modern bindings,
> > > >     but they definitely exist in a bunch of older bindings.  Obviously
> > > >     that means that integer values sitting after arbitrary length
> > > >     strings may not have any natural alignment
> > > 
> > > That's the user's problem IMO. Users of older bindings having this
> > > aren't likely using a newish function like fdt32_ld either.
> > 
> > That doesn't follow.  The bindings still exist and are in use, e.g. on
> > IBM PAPR systems, that's not correlated to how recent teh libfdt is.
> > 
> > > > So acccesses made by libfdt internally should be safe(*) assuming the
> > > > blob itself is loaded 8-byte aligned, and the dtb is compliant.
> > > > However the libfdt user may hit both problems (a) and (b) getting
> > > > things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> > > > intended to handle those cases, so that they're useful for the caller
> > > > to pull things from properties as well as for libfdt internal
> > > > accesses.
> > > >
> > > > (*) There are a number of other functions that looked like they might
> > > >     be dangerous for case (a) because they are based on 64-bit
> > > >     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
> > > >     fdt_setprop_u64(), fdt_appendprop_u64() and
> > > >     fdt_appendprop_addrrange().  However I think they're actually
> > > >     ok, because the way they're built in terms of other functions
> > > >     means there's implicitly a memcpy() from a byte buffer.
> > > >
> > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > > > > the MMU off as well.
> > > > >
> > > > > I'm making a mental note of this for the next time performance issues come up.
> > > >
> > > > Right, running early with MMU off is definitely a real use case for
> > > > libfdt.  For similar reasons we can't assume we have an OS which will
> > > > trap and handle unaligned accesses, which we might for a more
> > > > conventional userspace library.
> > > >
> > > > This kind of underscores why I'm a bit hesitant to introduce "my
> > > > platform handles unaligned acccesses" flag.  Not only does it require
> > > > detailed knowledge of the target CPU, but it can also depend on
> > > > exactly what mode that hardware is in.
> > > 
> > > I think there's a more simple solution with no flags. Given all
> > > internal accesses are at least 4-byte aligned, libfdt should just do
> > > 32-bit accesses internally (as it used to). Maybe we need a check up
> > > front that the dtb is 8-byte aligned though.
> > 
> > That's not a bad idea.  We could do it in fdt_ro_probe_().
> > 
> > Although, one extra case occurs to me.  Someone (is it uboot?) has a
> > wacky format where dtbs for several platforms, along with kernels and
> > other information are bundled together in a big dtb (that is, using
> > the dtb encoding, even though it's not actually a device tree).  The
> > "sub-dtbs" in that will be 4-byte aligned, but maybe not 8-byte
> > aligned.
> 
> Yes, about 12 years ago now U-Boot introduced (but it's useful anywhere,
> really...) FIT images which are what you're thinking of.  That's
> unrelated to all of this however.

Well, not entirely, because it's a plausible reason someone would have
a dtb loaded at a non-8-byte aligned address (though it would be
4-byte aligned).

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                                                             ` <CAL_JsqJTTJAoTYwxDn3i0KETMXLyGg3WXzxN3-OdRLx=R96a-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-10-29 20:21                                                               ` Tom Rini
  0 siblings, 0 replies; 28+ messages in thread
From: Tom Rini @ 2020-10-29 20:21 UTC (permalink / raw)
  To: Rob Herring
  Cc: David Gibson, André Przywara, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 13048 bytes --]

On Thu, Oct 29, 2020 at 01:08:37PM -0500, Rob Herring wrote:
> On Thu, Oct 29, 2020 at 11:04 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> >
> > On Thu, Oct 29, 2020 at 10:48:28AM -0500, Rob Herring wrote:
> > > On Thu, Oct 29, 2020 at 10:06 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > >
> > > > On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote:
> > > > > On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote:
> > > > > > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote:
> > > > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > > > > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote:
> > > > > > > > >
> > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini@konsulko.com> wrote:
> > > > > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > > > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > > > > > > >
> > > > > > > > > > [...]
> > > > > > > > > >
> > > > > > > > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > > > > > > >>>>>> 11738cf01f15 reduces it just a little.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > > > > > > >>>>> instead get intermittent bug reports where it just crashes.
> > > > > > > > > >>>>
> > > > > > > > > >>>> We really need to talk about that then.  There was a problem of people
> > > > > > > > > >>>> turning off the sanity check for making sure the entire device tree was
> > > > > > > > > >>>> aligned and then having everything crash.
> > > > > > > > > >>>
> > > > > > > > > >>> Ok... I'm not really sure where you're going with that thought.
> > > > > > > > > >>
> > > > > > > > > >> In my reading of the mailing list history of how this issue came up,
> > > > > > > > > >> it was someone was booting a dragonboard or something, and they (or
> > > > > > > > > >> rather, the board maintainer set by default) the flag to use the device
> > > > > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > > > > > > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > > > > > > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > > > > > > > >> was set, violated the documented requirements for device trees need to
> > > > > > > > > >> reside in memory and everything blew up.
> > > > > > > > > >>
> > > > > > > > > >> After that it was noticed that there could be some internal
> > > > > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > > > > > > > >> support doing those reads easily there could be problems, but that's not
> > > > > > > > > >> a common at all case (as noted by it not having been seen in practice).
> > > > > > > > > >
> > > > > > > > > > Nor a problem on many environments to begin with. More below...
> > > > > > > > > >
> > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > > > > > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > > > > > > > >>>>> flag if you're not sure you don't need it).
> > > > > > > > > >>>>
> > > > > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > > > > > > > >>>> everything again, yes, that is the right way forward I think.
> > > > > > > > > >>>
> > > > > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > > > > > > > >>> normal optimization levels enabled in the compiler), so testing for
> > > > > > > > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > > > > > > > >>
> > > > > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > > > > > > > >> done in such a way the compiler really will be smart about it.  So
> > > > > > > > > >> something like making a new function that does fdt64_ld() if we aren't
> > > > > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > > > > > > > >> ASSUME_ALIGNED_ACCESS.
> > > > > > > > > >
> > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > > > > > > > size suffer with not doing unaligned accesses.
> > > > > > > > > >
> > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > > > > > > > do unaligned accesses? That would be more aligned with what the system
> > > > > > > > > > can support rather than sanity checking associated with ASSUME_*.
> > > > > > >
> > > > > > > So, there are kind of two things here, (1) is "my platform can handle
> > > > > > > unaligned accesses" and (2) is "assume I don't need unaligned
> > > > > > > accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> > > > > > > either is true.  However we need to consider those separately, because
> > > > > > > they can be independently true (or not) for different reasons.  (1)
> > > > > > > depends on the hardware, whereas (2) depends on how you're using dtc,
> > > > > > > and, see below, you may need at least unaligned-handling fdt64_ld() in
> > > > > > > more cases than you think.
> > > > > > >
> > > > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > > > > > > > accesses if enabled.
> > > > > > > > >
> > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > > > > > > > ARM correctly, unaligned accesses always trap on device memory,
> > > > > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > > > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > > > > > > > cope with that, and that most likely affects libfdt as well?
> > > > > > > >
> > > > > > > > Ah yes, I think you are right.
> > > > > > > >
> > > > > > > > In that case, seems like we should figure out whether (internal)
> > > > > > > > unaligned accesses are possible with dtc generated dtbs at least
> > > > > > > > rather than just "not a common at all case (as noted by it not having
> > > > > > > > been seen in practice)." I'm sure David will point out that not all
> > > > > > > > dtbs come from dtc, but all the ones u-boot deals with do in
> > > > > > > > reality.
> > > > > > >
> > > > > > > Assuming the blob itself is 8-byte aligned in memory, then all
> > > > > > > structural elements (i.e. the tree metadata) of a compliant dtb will
> > > > > > > be naturally aligned.  The spec requires 8-byte alignment of the mem
> > > > > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure
> > > > > > > block w.r.t. the base of the blob.  Likewise the layout of the mem
> > > > > > > reserve block will preserve 8-byte alignment of all the 64-bit values
> > > > > > > it contains, assuming the block itself starts 8-byte aligned.
> > > > > > > Similarly the structure blob will preserve 4-byte alignment of all its
> > > > > > > tags and other structural data (this amounts to requiring an alignment
> > > > > > > gap after node names and property values).
> > > > > > >
> > > > > > > However, "all structural elements" does not include values within
> > > > > > > property values themselves.  Assuming propery alignment of the blocks
> > > > > > > and the blob itself, then all property values will *begin* 4 byte
> > > > > > > aligned.  However that leaves two relevant cases:
> > > > > > >
> > > > > > >  a) 64-bit property values may be 4-byte aligned but not 8-byte
> > > > > > >     aligned
> > > > > > >  b) complex property values including both strings and integers
> > > > > > >     typically use a packed representation with no alignment gaps.
> > > > > > >     Such property structures are usually avoided in modern bindings,
> > > > > > >     but they definitely exist in a bunch of older bindings.  Obviously
> > > > > > >     that means that integer values sitting after arbitrary length
> > > > > > >     strings may not have any natural alignment
> > > > > > >
> > > > > > > So acccesses made by libfdt internally should be safe(*) assuming the
> > > > > > > blob itself is loaded 8-byte aligned, and the dtb is compliant.
> > > > > > > However the libfdt user may hit both problems (a) and (b) getting
> > > > > > > things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> > > > > > > intended to handle those cases, so that they're useful for the caller
> > > > > > > to pull things from properties as well as for libfdt internal
> > > > > > > accesses.
> > > > > > >
> > > > > > > (*) There are a number of other functions that looked like they might
> > > > > > >     be dangerous for case (a) because they are based on 64-bit
> > > > > > >     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
> > > > > > >     fdt_setprop_u64(), fdt_appendprop_u64() and
> > > > > > >     fdt_appendprop_addrrange().  However I think they're actually
> > > > > > >     ok, because the way they're built in terms of other functions
> > > > > > >     means there's implicitly a memcpy() from a byte buffer.
> > > > > > >
> > > > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > > > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > > > > > > > the MMU off as well.
> > > > > > > >
> > > > > > > > I'm making a mental note of this for the next time performance issues come up.
> > > > > > >
> > > > > > > Right, running early with MMU off is definitely a real use case for
> > > > > > > libfdt.  For similar reasons we can't assume we have an OS which will
> > > > > > > trap and handle unaligned accesses, which we might for a more
> > > > > > > conventional userspace library.
> > > > > > >
> > > > > > > This kind of underscores why I'm a bit hesitant to introduce "my
> > > > > > > platform handles unaligned acccesses" flag.  Not only does it require
> > > > > > > detailed knowledge of the target CPU, but it can also depend on
> > > > > > > exactly what mode that hardware is in.
> > > > > >
> > > > > > Can you please note the existing user(s) where we have just the right
> > > > > > combination of factors and so everything fails?
> > > > >
> > > > > Sorry, I don't understand the question.
> > > >
> > > > I'm asking what the platform(s) are that have the very specific "and
> > > > here be failure" problem you're concerned with.
> > >
> > > I'm also and still confused with your question.
> >
> > Maybe I'm just confused then.  I'm asking what hardware ends up causing
> > a fault when we read a property that starts at 0x....3 and we would not
> > have already enabled the MMU and done whatever else is required to have
> > the fault fixed up and handled.
> 
> Ah, well ARMv4 and v5 systems. Outside of that, I don't know.

Right, I don't think anything other than those cases has this specific
potential problem.  Which isn't a common case, but a potential case.

> But as David said, properties will start with 4 byte alignment. It's
> only accesses within a property value that could be a problem, but
> those would be done by the user of libfdt and probably already made
> them alignment safe if needed (or they use fdt{32,64}_ld() if
> relatively recently written).
> 
> > That's not the case in U-Boot.
> 
> But it is for u-boot as Andre pointed out if the MMU is off.

But that's just it, the MMU isn't, I would swear, off, at this point for
anything semi modern.  "Everyone" wants dcache, so we have to do some
simple MMU setup.  We care about that before we care about the device
tree.  Unless I'm way off in the weeds in my thinking right now.

> > But
> > that being a possible case seems to be where the underlying concern is.
> >
> > > >  I'm concerned that
> > > > right now we're going to end up with larger pile of reverts to dtc in
> > > > U-Boot rather than being able to just sync with the project properly
> > > > again.
> > >
> > > I think we have some agreement which I believe would end being the
> > > revert you originally submitted, but just keep the fdt*_ld() accessors
> > > which always do safe accesses.
> >
> > So keep fdt{32,64}_ld() and switch all of the callers back to
> > fdt{32,64}_to_cpu()?
> 
> Right.
> 
> OK, should I spin up a patch or ?
> 
> Yes.

I'll put it on the TODO list, thanks.

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
       [not found]                                                         ` <20201029195658.GK5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
@ 2020-10-29 20:26                                                           ` Tom Rini
  0 siblings, 0 replies; 28+ messages in thread
From: Tom Rini @ 2020-10-29 20:26 UTC (permalink / raw)
  To: David Gibson
  Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 12491 bytes --]

On Fri, Oct 30, 2020 at 06:56:58AM +1100, David Gibson wrote:
> On Thu, Oct 29, 2020 at 11:04:01AM -0400, Tom Rini wrote:
> > On Thu, Oct 29, 2020 at 02:02:47PM +1100, David Gibson wrote:
> > > On Wed, Oct 28, 2020 at 12:49:08PM -0500, Rob Herring wrote:
> > > > On Tue, Oct 27, 2020 at 11:26 PM David Gibson
> > > > <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> wrote:
> > > > >
> > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote:
> > > > > > >
> > > > > > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini@konsulko.com> wrote:
> > > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > > > > >
> > > > > > > > [...]
> > > > > > > >
> > > > > > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > > > > >>>>>> 11738cf01f15 reduces it just a little.
> > > > > > > >>>>>
> > > > > > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > > > > >>>>> instead get intermittent bug reports where it just crashes.
> > > > > > > >>>>
> > > > > > > >>>> We really need to talk about that then.  There was a problem of people
> > > > > > > >>>> turning off the sanity check for making sure the entire device tree was
> > > > > > > >>>> aligned and then having everything crash.
> > > > > > > >>>
> > > > > > > >>> Ok... I'm not really sure where you're going with that thought.
> > > > > > > >>
> > > > > > > >> In my reading of the mailing list history of how this issue came up,
> > > > > > > >> it was someone was booting a dragonboard or something, and they (or
> > > > > > > >> rather, the board maintainer set by default) the flag to use the device
> > > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > > > > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > > > > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > > > > > >> was set, violated the documented requirements for device trees need to
> > > > > > > >> reside in memory and everything blew up.
> > > > > > > >>
> > > > > > > >> After that it was noticed that there could be some internal
> > > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > > > > > >> support doing those reads easily there could be problems, but that's not
> > > > > > > >> a common at all case (as noted by it not having been seen in practice).
> > > > > > > >
> > > > > > > > Nor a problem on many environments to begin with. More below...
> > > > > > > >
> > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > > > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > > > > > >>>>> flag if you're not sure you don't need it).
> > > > > > > >>>>
> > > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > > > > > >>>> everything again, yes, that is the right way forward I think.
> > > > > > > >>>
> > > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > > > > > >>> normal optimization levels enabled in the compiler), so testing for
> > > > > > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > > > > > >>
> > > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > > > > > >> done in such a way the compiler really will be smart about it.  So
> > > > > > > >> something like making a new function that does fdt64_ld() if we aren't
> > > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > > > > > >> ASSUME_ALIGNED_ACCESS.
> > > > > > > >
> > > > > > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > > > > > size suffer with not doing unaligned accesses.
> > > > > > > >
> > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > > > > > do unaligned accesses? That would be more aligned with what the system
> > > > > > > > can support rather than sanity checking associated with ASSUME_*.
> > > > >
> > > > > So, there are kind of two things here, (1) is "my platform can handle
> > > > > unaligned accesses" and (2) is "assume I don't need unaligned
> > > > > accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> > > > > either is true.  However we need to consider those separately, because
> > > > > they can be independently true (or not) for different reasons.  (1)
> > > > > depends on the hardware, whereas (2) depends on how you're using dtc,
> > > > > and, see below, you may need at least unaligned-handling fdt64_ld() in
> > > > > more cases than you think.
> > > > 
> > > > Okay, I guess you were thinking of (2) for ASSUME_ALIGNED_ACCESS, but
> > > > I read it as (1).
> > > 
> > > Yes.
> > > 
> > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > > > > > accesses if enabled.
> > > > > > >
> > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > > > > > ARM correctly, unaligned accesses always trap on device memory,
> > > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > > > > > cope with that, and that most likely affects libfdt as well?
> > > > > >
> > > > > > Ah yes, I think you are right.
> > > > > >
> > > > > > In that case, seems like we should figure out whether (internal)
> > > > > > unaligned accesses are possible with dtc generated dtbs at least
> > > > > > rather than just "not a common at all case (as noted by it not having
> > > > > > been seen in practice)." I'm sure David will point out that not all
> > > > > > dtbs come from dtc, but all the ones u-boot deals with do in
> > > > > > reality.
> > > > >
> > > > > Assuming the blob itself is 8-byte aligned in memory, then all
> > > > > structural elements (i.e. the tree metadata) of a compliant dtb will
> > > > > be naturally aligned.  The spec requires 8-byte alignment of the mem
> > > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure
> > > > > block w.r.t. the base of the blob.  Likewise the layout of the mem
> > > > > reserve block will preserve 8-byte alignment of all the 64-bit values
> > > > > it contains, assuming the block itself starts 8-byte aligned.
> > > > > Similarly the structure blob will preserve 4-byte alignment of all its
> > > > > tags and other structural data (this amounts to requiring an alignment
> > > > > gap after node names and property values).
> > > > >
> > > > > However, "all structural elements" does not include values within
> > > > > property values themselves.  Assuming propery alignment of the blocks
> > > > > and the blob itself, then all property values will *begin* 4 byte
> > > > > aligned.  However that leaves two relevant cases:
> > > > >
> > > > >  a) 64-bit property values may be 4-byte aligned but not 8-byte
> > > > >     aligned
> > > > 
> > > > I'd assume that while an arch may support only the above in terms of
> > > > misalignment, an arch that supports any alignment would always support
> > > > this as part of that. It would just be odd to support byte alignment
> > > > only up to 32-bit.
> > > 
> > > Yes, I'd expect so.
> > > 
> > > > I don't think we need to optimize the former case.
> > > 
> > > I don't see how we would, in any case.
> > > 
> > > > >  b) complex property values including both strings and integers
> > > > >     typically use a packed representation with no alignment gaps.
> > > > >     Such property structures are usually avoided in modern bindings,
> > > > >     but they definitely exist in a bunch of older bindings.  Obviously
> > > > >     that means that integer values sitting after arbitrary length
> > > > >     strings may not have any natural alignment
> > > > 
> > > > That's the user's problem IMO. Users of older bindings having this
> > > > aren't likely using a newish function like fdt32_ld either.
> > > 
> > > That doesn't follow.  The bindings still exist and are in use, e.g. on
> > > IBM PAPR systems, that's not correlated to how recent teh libfdt is.
> > > 
> > > > > So acccesses made by libfdt internally should be safe(*) assuming the
> > > > > blob itself is loaded 8-byte aligned, and the dtb is compliant.
> > > > > However the libfdt user may hit both problems (a) and (b) getting
> > > > > things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> > > > > intended to handle those cases, so that they're useful for the caller
> > > > > to pull things from properties as well as for libfdt internal
> > > > > accesses.
> > > > >
> > > > > (*) There are a number of other functions that looked like they might
> > > > >     be dangerous for case (a) because they are based on 64-bit
> > > > >     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
> > > > >     fdt_setprop_u64(), fdt_appendprop_u64() and
> > > > >     fdt_appendprop_addrrange().  However I think they're actually
> > > > >     ok, because the way they're built in terms of other functions
> > > > >     means there's implicitly a memcpy() from a byte buffer.
> > > > >
> > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > > > > > the MMU off as well.
> > > > > >
> > > > > > I'm making a mental note of this for the next time performance issues come up.
> > > > >
> > > > > Right, running early with MMU off is definitely a real use case for
> > > > > libfdt.  For similar reasons we can't assume we have an OS which will
> > > > > trap and handle unaligned accesses, which we might for a more
> > > > > conventional userspace library.
> > > > >
> > > > > This kind of underscores why I'm a bit hesitant to introduce "my
> > > > > platform handles unaligned acccesses" flag.  Not only does it require
> > > > > detailed knowledge of the target CPU, but it can also depend on
> > > > > exactly what mode that hardware is in.
> > > > 
> > > > I think there's a more simple solution with no flags. Given all
> > > > internal accesses are at least 4-byte aligned, libfdt should just do
> > > > 32-bit accesses internally (as it used to). Maybe we need a check up
> > > > front that the dtb is 8-byte aligned though.
> > > 
> > > That's not a bad idea.  We could do it in fdt_ro_probe_().
> > > 
> > > Although, one extra case occurs to me.  Someone (is it uboot?) has a
> > > wacky format where dtbs for several platforms, along with kernels and
> > > other information are bundled together in a big dtb (that is, using
> > > the dtb encoding, even though it's not actually a device tree).  The
> > > "sub-dtbs" in that will be 4-byte aligned, but maybe not 8-byte
> > > aligned.
> > 
> > Yes, about 12 years ago now U-Boot introduced (but it's useful anywhere,
> > really...) FIT images which are what you're thinking of.  That's
> > unrelated to all of this however.
> 
> Well, not entirely, because it's a plausible reason someone would have
> a dtb loaded at a non-8-byte aligned address (though it would be
> 4-byte aligned).

But that in turn gets us back to the original problem.  A board by
default was unfortunately setting the "do not relocate items to be
aligned before use" flag, and got the mess that resulted from that.

Another way to look at it I think is that since we (U-Boot) know what
the alignment requirements are when loading something to memory, it's on
us to make sure those requirements are met and not on a later access
library to deal with "our requirements were broken, but lets use it
anyways".

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Size growth?
  2020-10-29 15:06                                                   ` Tom Rini
  2020-10-29 15:48                                                     ` Rob Herring
@ 2020-11-02  2:06                                                     ` David Gibson
  1 sibling, 0 replies; 28+ messages in thread
From: David Gibson @ 2020-11-02  2:06 UTC (permalink / raw)
  To: Tom Rini
  Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler

[-- Attachment #1: Type: text/plain, Size: 10330 bytes --]

On Thu, Oct 29, 2020 at 11:06:03AM -0400, Tom Rini wrote:
> On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote:
> > On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote:
> > > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote:
> > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote:
> > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote:
> > > > > >
> > > > > > On 26/10/2020 21:51, Rob Herring wrote:
> > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote:
> > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > >>>>>> But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > > > >>>>>> 11738cf01f15 reduces it just a little.
> > > > > > >>>>>
> > > > > > >>>>> Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > > > > >>>>> instead get intermittent bug reports where it just crashes.
> > > > > > >>>>
> > > > > > >>>> We really need to talk about that then.  There was a problem of people
> > > > > > >>>> turning off the sanity check for making sure the entire device tree was
> > > > > > >>>> aligned and then having everything crash.
> > > > > > >>>
> > > > > > >>> Ok... I'm not really sure where you're going with that thought.
> > > > > > >>
> > > > > > >> In my reading of the mailing list history of how this issue came up,
> > > > > > >> it was someone was booting a dragonboard or something, and they (or
> > > > > > >> rather, the board maintainer set by default) the flag to use the device
> > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly
> > > > > > >> aligned address.  This in turn lead to the kernel getting an unaligned
> > > > > > >> device tree and everything crashing.  The "I know what I'm doing" flag
> > > > > > >> was set, violated the documented requirements for device trees need to
> > > > > > >> reside in memory and everything blew up.
> > > > > > >>
> > > > > > >> After that it was noticed that there could be some internal
> > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't
> > > > > > >> support doing those reads easily there could be problems, but that's not
> > > > > > >> a common at all case (as noted by it not having been seen in practice).
> > > > > > >
> > > > > > > Nor a problem on many environments to begin with. More below...
> > > > > > >
> > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load
> > > > > > >>>>> an unaligned value from a property (more likely, but don't add the
> > > > > > >>>>> flag if you're not sure you don't need it).
> > > > > > >>>>
> > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of
> > > > > > >>>> everything again, yes, that is the right way forward I think.
> > > > > > >>>
> > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with
> > > > > > >>> normal optimization levels enabled in the compiler), so testing for
> > > > > > >>> those shouldn't increase size at all.  If they do, something is wrong.
> > > > > > >>
> > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be
> > > > > > >> done in such a way the compiler really will be smart about it.  So
> > > > > > >> something like making a new function that does fdt64_ld() if we aren't
> > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
> > > > > > >> ASSUME_ALIGNED_ACCESS.
> > > > > > >
> > > > > > > Ah, unaligned accesses again... To summarize, both performance and
> > > > > > > size suffer with not doing unaligned accesses.
> > > > > > >
> > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will
> > > > > > > do unaligned accesses? That would be more aligned with what the system
> > > > > > > can support rather than sanity checking associated with ASSUME_*.
> > > > 
> > > > So, there are kind of two things here, (1) is "my platform can handle
> > > > unaligned accesses" and (2) is "assume I don't need unaligned
> > > > accesses".  We can use the fast & small versions of fdt32_ld() etc. if
> > > > either is true.  However we need to consider those separately, because
> > > > they can be independently true (or not) for different reasons.  (1)
> > > > depends on the hardware, whereas (2) depends on how you're using dtc,
> > > > and, see below, you may need at least unaligned-handling fdt64_ld() in
> > > > more cases than you think.
> > > > 
> > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned
> > > > > > > accesses if enabled.
> > > > > >
> > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM
> > > > > > ARM correctly, unaligned accesses always trap on device memory,
> > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device
> > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to
> > > > > > cope with that, and that most likely affects libfdt as well?
> > > > > 
> > > > > Ah yes, I think you are right.
> > > > > 
> > > > > In that case, seems like we should figure out whether (internal)
> > > > > unaligned accesses are possible with dtc generated dtbs at least
> > > > > rather than just "not a common at all case (as noted by it not having
> > > > > been seen in practice)." I'm sure David will point out that not all
> > > > > dtbs come from dtc, but all the ones u-boot deals with do in
> > > > > reality.
> > > > 
> > > > Assuming the blob itself is 8-byte aligned in memory, then all
> > > > structural elements (i.e. the tree metadata) of a compliant dtb will
> > > > be naturally aligned.  The spec requires 8-byte alignment of the mem
> > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure
> > > > block w.r.t. the base of the blob.  Likewise the layout of the mem
> > > > reserve block will preserve 8-byte alignment of all the 64-bit values
> > > > it contains, assuming the block itself starts 8-byte aligned.
> > > > Similarly the structure blob will preserve 4-byte alignment of all its
> > > > tags and other structural data (this amounts to requiring an alignment
> > > > gap after node names and property values).
> > > > 
> > > > However, "all structural elements" does not include values within
> > > > property values themselves.  Assuming propery alignment of the blocks
> > > > and the blob itself, then all property values will *begin* 4 byte
> > > > aligned.  However that leaves two relevant cases:
> > > > 
> > > >  a) 64-bit property values may be 4-byte aligned but not 8-byte
> > > >     aligned
> > > >  b) complex property values including both strings and integers
> > > >     typically use a packed representation with no alignment gaps.
> > > >     Such property structures are usually avoided in modern bindings,
> > > >     but they definitely exist in a bunch of older bindings.  Obviously
> > > >     that means that integer values sitting after arbitrary length
> > > >     strings may not have any natural alignment
> > > > 
> > > > So acccesses made by libfdt internally should be safe(*) assuming the
> > > > blob itself is loaded 8-byte aligned, and the dtb is compliant.
> > > > However the libfdt user may hit both problems (a) and (b) getting
> > > > things they actually want from the tree.  fdt{32,64}_{ld,st}() are
> > > > intended to handle those cases, so that they're useful for the caller
> > > > to pull things from properties as well as for libfdt internal
> > > > accesses.
> > > > 
> > > > (*) There are a number of other functions that looked like they might
> > > >     be dangerous for case (a) because they are based on 64-bit
> > > >     property values: fdt_setprop_inplace_u64(), fdt_property_u64(),
> > > >     fdt_setprop_u64(), fdt_appendprop_u64() and
> > > >     fdt_appendprop_addrrange().  However I think they're actually
> > > >     ok, because the way they're built in terms of other functions
> > > >     means there's implicitly a memcpy() from a byte buffer.
> > > > 
> > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled
> > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with
> > > > > > the MMU off as well.
> > > > > 
> > > > > I'm making a mental note of this for the next time performance issues come up.
> > > > 
> > > > Right, running early with MMU off is definitely a real use case for
> > > > libfdt.  For similar reasons we can't assume we have an OS which will
> > > > trap and handle unaligned accesses, which we might for a more
> > > > conventional userspace library.
> > > > 
> > > > This kind of underscores why I'm a bit hesitant to introduce "my
> > > > platform handles unaligned acccesses" flag.  Not only does it require
> > > > detailed knowledge of the target CPU, but it can also depend on
> > > > exactly what mode that hardware is in.
> > > 
> > > Can you please note the existing user(s) where we have just the right
> > > combination of factors and so everything fails?
> > 
> > Sorry, I don't understand the question.
> 
> I'm asking what the platform(s) are that have the very specific "and
> here be failure" problem you're concerned with.  I'm concerned that
> right now we're going to end up with larger pile of reverts to dtc in
> U-Boot rather than being able to just sync with the project properly
> again.

I'm afraid I don't know.  I just know that the problem has been
reported on several occasions.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-11-02  2:06 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-16 19:36 Size growth? Tom Rini
2020-10-16 21:46 ` Simon Glass
     [not found]   ` <CAPnjgZ3jPciWmoVpuoYb9KC2h3eCevZsq+1BzefCOCAFCDoseQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-10-19  1:42     ` David Gibson
     [not found]       ` <20201019014213.GA11625-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
2020-10-19 12:37         ` Tom Rini
2020-10-20  2:09           ` David Gibson
     [not found]             ` <20201020020907.GA64103-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
2020-10-21 22:49               ` Tom Rini
2020-10-22  4:00                 ` David Gibson
     [not found]                   ` <20201022040013.GB1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
2020-10-22 12:32                     ` Tom Rini
2020-10-22 14:58                       ` David Gibson
     [not found]                         ` <20201022145804.GI1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
2020-10-22 15:22                           ` Tom Rini
2020-10-26 21:51                             ` Rob Herring
     [not found]                               ` <CAL_JsqJiKETTMJX3MsEmECE+jtbwYydVSgt1a6poz_L+pPRFTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-10-26 22:17                                 ` Tom Rini
2020-10-27 15:57                                 ` André Przywara
     [not found]                                   ` <a14daf09-4d97-052f-5071-09e67ccb925e-5wv7dgnIgG8@public.gmane.org>
2020-10-27 19:55                                     ` Rob Herring
     [not found]                                       ` <CAL_JsqK_fC346UnCmXMJxKHCM6=eFBF_kmGt_fBdvwPXbPRkvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-10-28  4:26                                         ` David Gibson
     [not found]                                           ` <20201028042601.GA5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
2020-10-28 12:05                                             ` Tom Rini
2020-10-29  2:55                                               ` David Gibson
     [not found]                                                 ` <20201029025503.GI5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
2020-10-29 15:06                                                   ` Tom Rini
2020-10-29 15:48                                                     ` Rob Herring
     [not found]                                                       ` <CAL_JsqJUixnyZx-tu9EV8YZ-gSDE7i1jvMddnNZZWFzezaHftw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-10-29 16:04                                                         ` Tom Rini
2020-10-29 18:08                                                           ` Rob Herring
     [not found]                                                             ` <CAL_JsqJTTJAoTYwxDn3i0KETMXLyGg3WXzxN3-OdRLx=R96a-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-10-29 20:21                                                               ` Tom Rini
2020-11-02  2:06                                                     ` David Gibson
2020-10-28 17:49                                             ` Rob Herring
     [not found]                                               ` <CAL_JsqJPrnjKjdmvyY2NOay0YrYc20Tr3OSr0yjq+9HjCN+anA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-10-29  3:02                                                 ` David Gibson
     [not found]                                                   ` <20201029030247.GJ5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
2020-10-29 15:04                                                     ` Tom Rini
2020-10-29 19:56                                                       ` David Gibson
     [not found]                                                         ` <20201029195658.GK5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>
2020-10-29 20:26                                                           ` Tom Rini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.