From mboxrd@z Thu Jan 1 00:00:00 1970 From: linux@armlinux.org.uk (Russell King - ARM Linux) Date: Tue, 6 Nov 2018 16:06:25 +0000 Subject: [PATCH 3/4] ARM: merge -fdata-sections BSS data to .bss section In-Reply-To: References: <20181106133935.GB30658@n2100.armlinux.org.uk> <20181106141033.GC30658@n2100.armlinux.org.uk> <20181106142953.GE30658@n2100.armlinux.org.uk> <20181106144554.GF30658@n2100.armlinux.org.uk> Message-ID: <20181106160625.GG30658@n2100.armlinux.org.uk> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Tue, Nov 06, 2018 at 04:18:46PM +0100, Ard Biesheuvel wrote: > On 6 November 2018 at 15:45, Russell King - ARM Linux > wrote: > > On Tue, Nov 06, 2018 at 03:39:58PM +0100, Ard Biesheuvel wrote: > >> On 6 November 2018 at 15:29, Russell King - ARM Linux > >> wrote: > >> > On Tue, Nov 06, 2018 at 03:13:07PM +0100, Ard Biesheuvel wrote: > >> >> On 6 November 2018 at 15:10, Russell King - ARM Linux > >> >> wrote: > >> >> > On Tue, Nov 06, 2018 at 03:08:06PM +0100, Ard Biesheuvel wrote: > >> >> >> On 6 November 2018 at 14:40, Russell King wrote: > >> >> >> > When building with -fdata-sections, the BSS data is placed into separate > >> >> >> > sections, so we need to ask the linker to group these together into the > >> >> >> > .bss output section, so that they're correctly placed. > >> >> >> > > >> >> >> > Signed-off-by: Russell King > >> >> >> > --- > >> >> >> > arch/arm/boot/compressed/vmlinux.lds.S | 2 +- > >> >> >> > 1 file changed, 1 insertion(+), 1 deletion(-) > >> >> >> > > >> >> >> > diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S > >> >> >> > index 2b963d8e76dd..3b91bc3c606f 100644 > >> >> >> > --- a/arch/arm/boot/compressed/vmlinux.lds.S > >> >> >> > +++ b/arch/arm/boot/compressed/vmlinux.lds.S > >> >> >> > @@ -118,7 +118,7 @@ SECTIONS > >> >> >> > > >> >> >> > . = BSS_START; > >> >> >> > __bss_start = .; > >> >> >> > - .bss : { *(.bss) } > >> >> >> > + .bss : { *(.bss) *(.bss.*) } > >> >> >> > >> >> >> Would it make sense to sort these by alignment? Otherwise, I suspect > >> >> >> you may get a lot of padding holes due to the number of different > >> >> >> input sections, each with its own alignment. > >> >> > > >> >> > We don't bother elsewhere in the kernel linker script - do you have > >> >> > a case where we get lots of padding? > >> >> > > >> >> > >> >> With -fdata-sections? How else are we ensuring that each .bss item in > >> >> its own section is not placed such that its alignment results in a > >> >> padding hole? > >> > > >> > Quite simply, we don't care (at the moment). The alignment does not > >> > come from the size of each individual section, but the alignment > >> > requirements of data within the section. > >> > > >> > See include/asm-generic/vmlinux.lds.h and the definition of *_MAIN > >> > symbols from around line 60. > >> > > >> > >> This seems like an oversight to me: consider something like* > >> > >> unsigned char bar = 1; > >> unsigned long long foo = 4; > >> > >> build it with > >> > >> arm-linux-gnueabihf-gcc -fdata-sections -o /tmp/foo.o -c /tmp/foo.c > >> > >> and we end up with the following object file > >> > >> Section Headers: > >> [Nr] Name Type Addr Off Size ES Flg Lk Inf Al > >> [ 0] NULL 00000000 000000 000000 00 0 0 0 > >> [ 1] .text PROGBITS 00000000 000034 000000 00 AX 0 0 1 > >> [ 2] .data PROGBITS 00000000 000034 000000 00 WA 0 0 1 > >> [ 3] .bss NOBITS 00000000 000034 000000 00 WA 0 0 1 > >> [ 4] .data.bar PROGBITS 00000000 000034 000001 00 WA 0 0 1 > >> [ 5] .data.foo PROGBITS 00000000 000038 000008 00 WA 0 0 8 > >> ... > >> > >> Symbol table '.symtab' contains 13 entries: > >> Num: Value Size Type Bind Vis Ndx Name > >> 11: 00000000 8 OBJECT GLOBAL DEFAULT 5 foo > >> 12: 00000000 1 OBJECT GLOBAL DEFAULT 4 bar > >> > >> whereas if I remove the -fdata-sections argument, I get > >> > >> Section Headers: > >> [Nr] Name Type Addr Off Size ES Flg Lk Inf Al > >> [ 0] NULL 00000000 000000 000000 00 0 0 0 > >> [ 1] .text PROGBITS 00000000 000034 000000 00 AX 0 0 1 > >> [ 2] .data PROGBITS 00000000 000038 000009 00 WA 0 0 8 > >> [ 3] .bss NOBITS 00000000 000041 000000 00 WA 0 0 1 > >> ... > >> > >> and > >> > >> Symbol table '.symtab' contains 11 entries: > >> Num: Value Size Type Bind Vis Ndx Name > >> ... > >> 9: 00000000 8 OBJECT GLOBAL DEFAULT 2 foo > >> 10: 00000008 1 OBJECT GLOBAL DEFAULT 2 bar > >> > >> In other words, we end up with a 7 byte padding hole by switching to > >> -fdata-sections, unless we sort the input objects by alignment. > >> > >> * for my test compiile, -fdata-sections would not produce different > >> input section so I am using .data for this example. > > > > Nevertheless, changing this is not a subject for a patch series that > > targets ARM. What you've identified is a deficiency in the generic > > cross-arch support for this which needs a wider audience to be > > involved. > > > > Fair enough. > > For the record, when I build the arm64 defconfig kernel with > LD_DEAD_CODE_ELIMINATION turned on, adding --sort-section=alignment > reduces the image size from > > text data bss dec hex filename > 11951684 6936332 385040 19273056 1261560 vmlinux > > to > > text data bss dec hex filename > 11938452 6930404 383625 19252481 125c501 vmlinux I'm not sure that's relevant when this thread is about 32bit ARM. It's a little like building an x86 kernel and posting the results. Two entirely different architectures with different results. Here's the numbers for an imx6 ARM kernel: text data bss dec hex filename 8438980 3255060 9640528 21334568 1458a28 imx6-unpatched/vmlinux 6164770 84 4120 6168974 5e218e imx6-unpatched/arch/arm/boot/compressed/vmlinux 8438980 3255060 9640528 21334568 1458a28 imx6-dc-dis/vmlinux 6164770 84 4120 6168974 5e218e imx6-dc-dis/arch/arm/boot/compressed/vmlinux 8448960 3253388 9639992 21342340 145a884 imx6-dc-ena/vmlinux 6175233 84 4120 6179437 5e4a6d imx6-dc-ena/arch/arm/boot/compressed/vmlinux 8446700 3245700 9639212 21331612 1457e9c imx6-dc-sort/vmlinux 6172284 84 4120 6176488 5e3ee8 imx6-dc-sort/arch/arm/boot/compressed/vmlinux The interesting thing here is that while the data sizes are reduced by enabling this option, but the text size actually _increases_ by more than we save. So the result is an overall bigger kernel. Presumably this is because of the literal pools needing to be larger in the .text segments, and more instructions necessary to access the data as the compiler can no longer know the relative displacement between neighbouring data items in the object file. So, it seems dead-code elimination is not that useful on 32-bit ARM. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up