linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
@ 2017-10-26 20:29 Nick Desaulniers
  2017-10-26 20:41 ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Nick Desaulniers @ 2017-10-26 20:29 UTC (permalink / raw)
  Cc: Jiri Kosina, Kees Cook, Nick Desaulniers, Catalin Marinas,
	Will Deacon, linux-arm-kernel, linux-kernel

Upon upgrading to binutils 2.27, we found that our lz4 compressed kernel
images were significantly larger, resulting is 10ms boot time regressions.

As noted by Rahul:
"aarch64 binaries uses RELA relocations, where each relocation entry
includes an addend value. This is similar to x86_64.  On x86_64, the
addend values are also stored at the relocation offset for relative
relocations. This is an optimization: in the case where code does not
need to be relocated, the loader can simply skip processing relative
relocations.  In binutils-2.25, both bfd and gold linkers did this for
x86_64, but only the gold linker did this for aarch64.  The kernel build
here is using the bfd linker, which stored zeroes at the relocation
offsets for relative relocations.  Since a set of zeroes compresses
better than a set of non-zero addend values, this behavior was resulting
in much better lz4 compression.

The bfd linker in binutils-2.27 is now storing the actual addend values
at the relocation offsets. The behavior is now consistent with what it
does for x86_64 and what gold linker does for both architectures.  The
change happened in this upstream commit:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
Since a bunch of zeroes got replaced by non-zero addend values, we see
the side effect of lz4 compressed image being a bit bigger.

To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
flag can be used:
$ LDFLAGS="--no-apply-dynamic-relocs" ./build/build.sh
With this flag, the compressed image size is back to what it was with
binutils-2.25.

If the kernel is using ASLR, there aren't additional runtime costs to
--no-apply-dynamic-relocs, as the relocations will need to be applied
again anyway after the kernel is relocated to a random address.

If the kernel is not using ASLR, then presumably the current default
behavior of the linker is better. Since the static linker performed the
dynamic relocs, and the kernel is not moved to a different address at
load time, it can skip applying the relocations all over again."

Some measurements:

$ ld -v
GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117
                    ^
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300652760 Oct 26 11:57 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932627 Oct 26 11:57 Image.lz4-dtb

$ ld -v
GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315
                    ^
pre patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 11:43 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 18159474 Oct 26 11:43 Image.lz4-dtb

post patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 12:06 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932466 Oct 26 12:06 Image.lz4-dtb

10ms boot time savings isn't anything to get excited about, but users of
arm64+lz4+bfd-2.27 should not have to pay a penalty for no runtime
improvement.

Reported-by: Gopinath Elanchezhian <gelanchezhian@google.com>
Reported-by: Sindhuri Pentyala <spentyala@google.com>
Reported-by: Wei Wang <wvw@google.com>
Suggested-by: Rahul Chaudhry <rahulchaudhry@google.com>
Suggested-by: Siqi Lin <siqilin@google.com>
Suggested-by: Stephen Hines <srhines@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
* Question to reviewers: should I be using $(and ..., ...) here rather
  than double equals block? grep turns up no hits in the kernel for
  an example.

 arch/arm64/Makefile | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..eed3b8bdc089 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -18,6 +18,12 @@ ifneq ($(CONFIG_RELOCATABLE),)
 LDFLAGS_vmlinux		+= -pie -shared -Bsymbolic
 endif
 
+ifeq ($(CONFIG_KERNEL_LZ4), y)
+ifeq ($(CONFIG_RANDOMIZE_BASE), y)
+LDFLAGS_vmlinux		+= $(call ld-option, --no-apply-dynamic-relocs)
+endif
+endif
+
 ifeq ($(CONFIG_ARM64_ERRATUM_843419),y)
   ifeq ($(call ld-option, --fix-cortex-a53-843419),)
 $(warning ld does not support --fix-cortex-a53-843419; kernel may be susceptible to erratum)
-- 
2.15.0.rc2.357.g7e34df9404-goog

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-26 20:29 [PATCH] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27 Nick Desaulniers
@ 2017-10-26 20:41 ` Ard Biesheuvel
  2017-10-26 20:55   ` Nick Desaulniers
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2017-10-26 20:41 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Kees Cook, Catalin Marinas, Jiri Kosina, linux-kernel,
	Will Deacon, linux-arm-kernel

On 26 October 2017 at 21:29, Nick Desaulniers <ndesaulniers@google.com> wrote:
> Upon upgrading to binutils 2.27, we found that our lz4 compressed kernel
> images were significantly larger, resulting is 10ms boot time regressions.
>
> As noted by Rahul:
> "aarch64 binaries uses RELA relocations, where each relocation entry
> includes an addend value. This is similar to x86_64.  On x86_64, the
> addend values are also stored at the relocation offset for relative
> relocations. This is an optimization: in the case where code does not
> need to be relocated, the loader can simply skip processing relative
> relocations.  In binutils-2.25, both bfd and gold linkers did this for
> x86_64, but only the gold linker did this for aarch64.  The kernel build
> here is using the bfd linker, which stored zeroes at the relocation
> offsets for relative relocations.  Since a set of zeroes compresses
> better than a set of non-zero addend values, this behavior was resulting
> in much better lz4 compression.
>
> The bfd linker in binutils-2.27 is now storing the actual addend values
> at the relocation offsets. The behavior is now consistent with what it
> does for x86_64 and what gold linker does for both architectures.  The
> change happened in this upstream commit:
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
> Since a bunch of zeroes got replaced by non-zero addend values, we see
> the side effect of lz4 compressed image being a bit bigger.
>
> To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
> flag can be used:
> $ LDFLAGS="--no-apply-dynamic-relocs" ./build/build.sh
> With this flag, the compressed image size is back to what it was with
> binutils-2.25.
>
> If the kernel is using ASLR, there aren't additional runtime costs to
> --no-apply-dynamic-relocs, as the relocations will need to be applied
> again anyway after the kernel is relocated to a random address.
>
> If the kernel is not using ASLR, then presumably the current default
> behavior of the linker is better. Since the static linker performed the
> dynamic relocs, and the kernel is not moved to a different address at
> load time, it can skip applying the relocations all over again."
>
> Some measurements:
>
> $ ld -v
> GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117
>                     ^
> $ ls -l vmlinux
> -rwxr-x--- 1 ndesaulniers eng 300652760 Oct 26 11:57 vmlinux
> $ ls -l Image.lz4-dtb
> -rw-r----- 1 ndesaulniers eng 16932627 Oct 26 11:57 Image.lz4-dtb
>
> $ ld -v
> GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315
>                     ^
> pre patch:
> $ ls -l vmlinux
> -rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 11:43 vmlinux
> $ ls -l Image.lz4-dtb
> -rw-r----- 1 ndesaulniers eng 18159474 Oct 26 11:43 Image.lz4-dtb
>
> post patch:
> $ ls -l vmlinux
> -rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 12:06 vmlinux
> $ ls -l Image.lz4-dtb
> -rw-r----- 1 ndesaulniers eng 16932466 Oct 26 12:06 Image.lz4-dtb
>
> 10ms boot time savings isn't anything to get excited about, but users of
> arm64+lz4+bfd-2.27 should not have to pay a penalty for no runtime
> improvement.
>
> Reported-by: Gopinath Elanchezhian <gelanchezhian@google.com>
> Reported-by: Sindhuri Pentyala <spentyala@google.com>
> Reported-by: Wei Wang <wvw@google.com>
> Suggested-by: Rahul Chaudhry <rahulchaudhry@google.com>
> Suggested-by: Siqi Lin <siqilin@google.com>
> Suggested-by: Stephen Hines <srhines@google.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
> * Question to reviewers: should I be using $(and ..., ...) here rather
>   than double equals block? grep turns up no hits in the kernel for
>   an example.
>
>  arch/arm64/Makefile | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 939b310913cf..eed3b8bdc089 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -18,6 +18,12 @@ ifneq ($(CONFIG_RELOCATABLE),)
>  LDFLAGS_vmlinux                += -pie -shared -Bsymbolic
>  endif
>
> +ifeq ($(CONFIG_KERNEL_LZ4), y)
> +ifeq ($(CONFIG_RANDOMIZE_BASE), y)
> +LDFLAGS_vmlinux                += $(call ld-option, --no-apply-dynamic-relocs)
> +endif
> +endif
> +

Since we will need to support bfd ld < 2.27 for a while to come, and
given that we cannot test in the code whether the relocation targets
are seeded with the correct values, I propose we simply drop the outer
ifeq here, and  stick with the old behavior unconditionally. Once
we're ready to drop support for <2.27 binutils, we can revisit this if
desired.

Also, you should be using CONFIG_RELOCATABLE not CONFIG_RANDOMIZE_BASE,.

>  ifeq ($(CONFIG_ARM64_ERRATUM_843419),y)
>    ifeq ($(call ld-option, --fix-cortex-a53-843419),)
>  $(warning ld does not support --fix-cortex-a53-843419; kernel may be susceptible to erratum)
> --
> 2.15.0.rc2.357.g7e34df9404-goog
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-26 20:41 ` Ard Biesheuvel
@ 2017-10-26 20:55   ` Nick Desaulniers
  2017-10-26 21:06     ` Nick Desaulniers
  2017-10-26 21:43     ` [PATCH v2] " Nick Desaulniers
  0 siblings, 2 replies; 17+ messages in thread
From: Nick Desaulniers @ 2017-10-26 20:55 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kees Cook, Catalin Marinas, Jiri Kosina, linux-kernel,
	Will Deacon, linux-arm-kernel, Wei Wang

On Thu, Oct 26, 2017 at 1:41 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> Since we will need to support bfd ld < 2.27 for a while to come, and
> given that we cannot test in the code whether the relocation targets
> are seeded with the correct values, I propose we simply drop the outer
> ifeq here, and  stick with the old behavior unconditionally. Once
> we're ready to drop support for <2.27 binutils, we can revisit this if
> desired.

Ard, thanks for the quick review!

--no-apply-dynamic-relocs was added in binutils 2.27, so ld-option
should support 2.27 and prior:

$ aarch64-linux-android/bin/ld -v
GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315
$ aarch64-linux-android/bin/ld -E --no-apply-dynamic-relocs temp.o
$ echo $?
0

$ aarch64-linux-android/bin/ld -v
GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117
$ aarch64-linux-android/bin/ld -E --no-apply-dynamic-relocs temp.o
./prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/aarch64-linux-android/bin/ld:
unrecognized option '--no-apply-dynamic-relocs'
$ echo $?
1

ld-option will catch that.

> are seeded with the correct values, I propose we simply drop the outer

I haven't verified this with other compression schemes, but my
teammate Wei just reported this benefits gzip as well.  What do you
think?

> Also, you should be using CONFIG_RELOCATABLE not CONFIG_RANDOMIZE_BASE,.

Sure thing, will send v2 once you clarify the outer conditional and if
ld-option helps us take this patch now.

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-26 20:55   ` Nick Desaulniers
@ 2017-10-26 21:06     ` Nick Desaulniers
  2017-10-26 21:17       ` Siqi Lin
  2017-10-26 21:43     ` [PATCH v2] " Nick Desaulniers
  1 sibling, 1 reply; 17+ messages in thread
From: Nick Desaulniers @ 2017-10-26 21:06 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kees Cook, Catalin Marinas, Jiri Kosina, linux-kernel,
	Will Deacon, linux-arm-kernel, Wei Wang, Gopinath Elanchezhian,
	spentyala, Rahul Chaudhry, Siqi Lin, Stephen Hines

+ folks in Suggested-by/Reported by lines, since git send-email seems
to only pull in folks on Signed-off-by line :(

https://lkml.org/lkml/2017/10/26/590

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-26 21:06     ` Nick Desaulniers
@ 2017-10-26 21:17       ` Siqi Lin
  2017-10-26 21:23         ` Nick Desaulniers
  0 siblings, 1 reply; 17+ messages in thread
From: Siqi Lin @ 2017-10-26 21:17 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Ard Biesheuvel, Kees Cook, Catalin Marinas, Jiri Kosina,
	linux-kernel, Will Deacon, linux-arm-kernel, Wei Wang,
	Gopinath Elanchezhian, spentyala, Rahul Chaudhry, Stephen Hines

On Thu, Oct 26, 2017 at 2:06 PM, Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> + folks in Suggested-by/Reported by lines, since git send-email seems
> to only pull in folks on Signed-off-by line :(
>
> https://lkml.org/lkml/2017/10/26/590

I'm OK with sticking with the <2.27 binutils behavior. The gzip data is:

binutils 2.25:
Image 41467904
Image.gz 13395151
binutils 2.27:
Image 41467392
Image.gz 14114953

gzipped kernel increased by 0.69 MiB.

The one special case I see is !CONFIG_RELOCATABLE and compression is
used, where there's a tradeoff between compressed image size and the
benefit of dynamic relocs.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-26 21:17       ` Siqi Lin
@ 2017-10-26 21:23         ` Nick Desaulniers
  2017-10-26 21:41           ` Nick Desaulniers
  2017-10-26 21:51           ` Siqi Lin
  0 siblings, 2 replies; 17+ messages in thread
From: Nick Desaulniers @ 2017-10-26 21:23 UTC (permalink / raw)
  To: Siqi Lin
  Cc: Ard Biesheuvel, Kees Cook, Catalin Marinas, Jiri Kosina,
	linux-kernel, Will Deacon, linux-arm-kernel, Wei Wang,
	Gopinath Elanchezhian, spentyala, Rahul Chaudhry, Stephen Hines

On Thu, Oct 26, 2017 at 2:17 PM, Siqi Lin <siqilin@google.com> wrote:
> I'm OK with sticking with the <2.27 binutils behavior. The gzip data is:

That's what this patch does; goes back to the <2.27 behavior for 2.27+.

> binutils 2.25:
> Image 41467904
> Image.gz 13395151
> binutils 2.27:
> Image 41467392
> Image.gz 14114953
>
> gzipped kernel increased by 0.69 MiB.

That's without this patch applied?  With it applied, what are the
stats (for gzip)?

> The one special case I see is !CONFIG_RELOCATABLE and compression is
> used, where there's a tradeoff between compressed image size and the
> benefit of dynamic relocs.

if !CONFIG_RELOCATABLE, then this patch (well v2 which will use
CONFIG_RELOCATABLE rather than CONFIG_RANDOMIZE_BASE) doesn't do
anything.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-26 21:23         ` Nick Desaulniers
@ 2017-10-26 21:41           ` Nick Desaulniers
  2017-10-26 21:51           ` Siqi Lin
  1 sibling, 0 replies; 17+ messages in thread
From: Nick Desaulniers @ 2017-10-26 21:41 UTC (permalink / raw)
  To: Siqi Lin
  Cc: Ard Biesheuvel, Kees Cook, Catalin Marinas, Jiri Kosina,
	linux-kernel, Will Deacon, linux-arm-kernel, Wei Wang,
	Gopinath Elanchezhian, spentyala, Rahul Chaudhry, Stephen Hines

> On Thu, Oct 26, 2017 at 1:41 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:>are seeded with the correct values,
I propose we simply drop the outer

> On Thu, Oct 26, 2017 at 2:17 PM, Siqi Lin <siqilin@google.com> wrote:
>> I'm OK with sticking with the <2.27 binutils behavior. The gzip data is:

> ... Nick:
> That's what this patch does; goes back to the <2.27 behavior for 2.27+.

Sorry, rereading this thread, it sounds like neither of you were
disagreeing with me?  Will post v2.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-26 20:55   ` Nick Desaulniers
  2017-10-26 21:06     ` Nick Desaulniers
@ 2017-10-26 21:43     ` Nick Desaulniers
  2017-10-27  9:41       ` Ard Biesheuvel
  1 sibling, 1 reply; 17+ messages in thread
From: Nick Desaulniers @ 2017-10-26 21:43 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jiri Kosina, Kees Cook, Gopinath Elanchezhian, Sindhuri Pentyala,
	Wei Wang, Rahul Chaudhry, Siqi Lin, Stephen Hines,
	Nick Desaulniers, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-kernel

Upon upgrading to binutils 2.27, we found that our lz4 compressed kernel
images were significantly larger, resulting is 10ms boot time regressions.

As noted by Rahul:
"aarch64 binaries uses RELA relocations, where each relocation entry
includes an addend value. This is similar to x86_64.  On x86_64, the
addend values are also stored at the relocation offset for relative
relocations. This is an optimization: in the case where code does not
need to be relocated, the loader can simply skip processing relative
relocations.  In binutils-2.25, both bfd and gold linkers did this for
x86_64, but only the gold linker did this for aarch64.  The kernel build
here is using the bfd linker, which stored zeroes at the relocation
offsets for relative relocations.  Since a set of zeroes compresses
better than a set of non-zero addend values, this behavior was resulting
in much better lz4 compression.

The bfd linker in binutils-2.27 is now storing the actual addend values
at the relocation offsets. The behavior is now consistent with what it
does for x86_64 and what gold linker does for both architectures.  The
change happened in this upstream commit:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
Since a bunch of zeroes got replaced by non-zero addend values, we see
the side effect of lz4 compressed image being a bit bigger.

To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
flag can be used:
$ LDFLAGS="--no-apply-dynamic-relocs" ./build/build.sh
With this flag, the compressed image size is back to what it was with
binutils-2.25.

If the kernel is using ASLR, there aren't additional runtime costs to
--no-apply-dynamic-relocs, as the relocations will need to be applied
again anyway after the kernel is relocated to a random address.

If the kernel is not using ASLR, then presumably the current default
behavior of the linker is better. Since the static linker performed the
dynamic relocs, and the kernel is not moved to a different address at
load time, it can skip applying the relocations all over again."

Some measurements:

$ ld -v
GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117
                    ^
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300652760 Oct 26 11:57 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932627 Oct 26 11:57 Image.lz4-dtb

$ ld -v
GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315
                    ^
pre patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 11:43 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 18159474 Oct 26 11:43 Image.lz4-dtb

post patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 12:06 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932466 Oct 26 12:06 Image.lz4-dtb

We've also verified gzip improves by 0.69 MB.

Any compression scheme should be able to get better results from the
longer runs of zeros, not just GZIP and LZ4.

10ms boot time savings isn't anything to get excited about, but users of
arm64+compression+bfd-2.27 should not have to pay a boot time penalty for no
runtime improvement.

Reported-by: Gopinath Elanchezhian <gelanchezhian@google.com>
Reported-by: Sindhuri Pentyala <spentyala@google.com>
Reported-by: Wei Wang <wvw@google.com>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: Rahul Chaudhry <rahulchaudhry@google.com>
Suggested-by: Siqi Lin <siqilin@google.com>
Suggested-by: Stephen Hines <srhines@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
Changes since v1:
* dropped LZ4 only outer conditional, per Ard.
* changed inner conditional for all RELOCATABLE, not just
  RANDOMIZE_BASE, per Ard.
* updated commit message with findings for gzip.
* added Ard to suggested by line in commit message.

 arch/arm64/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..9f47d4276a21 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -18,6 +18,10 @@ ifneq ($(CONFIG_RELOCATABLE),)
 LDFLAGS_vmlinux		+= -pie -shared -Bsymbolic
 endif
 
+ifeq ($(CONFIG_RELOCATABLE), y)
+LDFLAGS_vmlinux		+= $(call ld-option, --no-apply-dynamic-relocs)
+endif
+
 ifeq ($(CONFIG_ARM64_ERRATUM_843419),y)
   ifeq ($(call ld-option, --fix-cortex-a53-843419),)
 $(warning ld does not support --fix-cortex-a53-843419; kernel may be susceptible to erratum)
-- 
2.15.0.rc2.357.g7e34df9404-goog

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-26 21:23         ` Nick Desaulniers
  2017-10-26 21:41           ` Nick Desaulniers
@ 2017-10-26 21:51           ` Siqi Lin
  1 sibling, 0 replies; 17+ messages in thread
From: Siqi Lin @ 2017-10-26 21:51 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Ard Biesheuvel, Kees Cook, Catalin Marinas, Jiri Kosina,
	linux-kernel, Will Deacon, linux-arm-kernel, Wei Wang,
	Gopinath Elanchezhian, spentyala, Rahul Chaudhry, Stephen Hines

On Thu, Oct 26, 2017 at 2:23 PM, Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Thu, Oct 26, 2017 at 2:17 PM, Siqi Lin <siqilin@google.com> wrote:
> > I'm OK with sticking with the <2.27 binutils behavior. The gzip data is:
>
> That's what this patch does; goes back to the <2.27 behavior for 2.27+.
>
> > binutils 2.25:
> > Image 41467904
> > Image.gz 13395151
> > binutils 2.27:
> > Image 41467392
> > Image.gz 14114953
> >
> > gzipped kernel increased by 0.69 MiB.
>
> That's without this patch applied?  With it applied, what are the
> stats (for gzip)?
>

binutils 2.27 with this patch (with --no-apply-dynamic-relocs):
Image 41535488
Image.gz 13404067

binutils 2.27 without this patch (without --no-apply-dynamic-relocs):
Image 41535488
Image.gz 14125516

The 2.27 gzipped size with this patch is about the same as 2.25.

> > The one special case I see is !CONFIG_RELOCATABLE and compression is
> > used, where there's a tradeoff between compressed image size and the
> > benefit of dynamic relocs.
>
> if !CONFIG_RELOCATABLE, then this patch (well v2 which will use
> CONFIG_RELOCATABLE rather than CONFIG_RANDOMIZE_BASE) doesn't do
> anything.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-26 21:43     ` [PATCH v2] " Nick Desaulniers
@ 2017-10-27  9:41       ` Ard Biesheuvel
  2017-10-27 16:33         ` [PATCH v3] " Nick Desaulniers
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2017-10-27  9:41 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Jiri Kosina, Kees Cook, Gopinath Elanchezhian, Sindhuri Pentyala,
	Wei Wang, Rahul Chaudhry, Siqi Lin, Stephen Hines,
	Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel

On 26 October 2017 at 22:43, Nick Desaulniers <ndesaulniers@google.com> wrote:
> Upon upgrading to binutils 2.27, we found that our lz4 compressed kernel
> images were significantly larger, resulting is 10ms boot time regressions.
>
> As noted by Rahul:
> "aarch64 binaries uses RELA relocations, where each relocation entry
> includes an addend value. This is similar to x86_64.  On x86_64, the
> addend values are also stored at the relocation offset for relative
> relocations. This is an optimization: in the case where code does not
> need to be relocated, the loader can simply skip processing relative
> relocations.  In binutils-2.25, both bfd and gold linkers did this for
> x86_64, but only the gold linker did this for aarch64.  The kernel build
> here is using the bfd linker, which stored zeroes at the relocation
> offsets for relative relocations.  Since a set of zeroes compresses
> better than a set of non-zero addend values, this behavior was resulting
> in much better lz4 compression.
>
> The bfd linker in binutils-2.27 is now storing the actual addend values
> at the relocation offsets. The behavior is now consistent with what it
> does for x86_64 and what gold linker does for both architectures.  The
> change happened in this upstream commit:
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
> Since a bunch of zeroes got replaced by non-zero addend values, we see
> the side effect of lz4 compressed image being a bit bigger.
>
> To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
> flag can be used:
> $ LDFLAGS="--no-apply-dynamic-relocs" ./build/build.sh
> With this flag, the compressed image size is back to what it was with
> binutils-2.25.
>
> If the kernel is using ASLR, there aren't additional runtime costs to
> --no-apply-dynamic-relocs, as the relocations will need to be applied
> again anyway after the kernel is relocated to a random address.
>
> If the kernel is not using ASLR, then presumably the current default
> behavior of the linker is better. Since the static linker performed the
> dynamic relocs, and the kernel is not moved to a different address at
> load time, it can skip applying the relocations all over again."
>
> Some measurements:
>
> $ ld -v
> GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117
>                     ^
> $ ls -l vmlinux
> -rwxr-x--- 1 ndesaulniers eng 300652760 Oct 26 11:57 vmlinux
> $ ls -l Image.lz4-dtb
> -rw-r----- 1 ndesaulniers eng 16932627 Oct 26 11:57 Image.lz4-dtb
>
> $ ld -v
> GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315
>                     ^
> pre patch:
> $ ls -l vmlinux
> -rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 11:43 vmlinux
> $ ls -l Image.lz4-dtb
> -rw-r----- 1 ndesaulniers eng 18159474 Oct 26 11:43 Image.lz4-dtb
>
> post patch:
> $ ls -l vmlinux
> -rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 12:06 vmlinux
> $ ls -l Image.lz4-dtb
> -rw-r----- 1 ndesaulniers eng 16932466 Oct 26 12:06 Image.lz4-dtb
>
> We've also verified gzip improves by 0.69 MB.
>
> Any compression scheme should be able to get better results from the
> longer runs of zeros, not just GZIP and LZ4.
>
> 10ms boot time savings isn't anything to get excited about, but users of
> arm64+compression+bfd-2.27 should not have to pay a boot time penalty for no
> runtime improvement.
>
> Reported-by: Gopinath Elanchezhian <gelanchezhian@google.com>
> Reported-by: Sindhuri Pentyala <spentyala@google.com>
> Reported-by: Wei Wang <wvw@google.com>
> Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Suggested-by: Rahul Chaudhry <rahulchaudhry@google.com>
> Suggested-by: Siqi Lin <siqilin@google.com>
> Suggested-by: Stephen Hines <srhines@google.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
> Changes since v1:
> * dropped LZ4 only outer conditional, per Ard.
> * changed inner conditional for all RELOCATABLE, not just
>   RANDOMIZE_BASE, per Ard.
> * updated commit message with findings for gzip.
> * added Ard to suggested by line in commit message.
>
>  arch/arm64/Makefile | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 939b310913cf..9f47d4276a21 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -18,6 +18,10 @@ ifneq ($(CONFIG_RELOCATABLE),)
>  LDFLAGS_vmlinux                += -pie -shared -Bsymbolic
>  endif
>
> +ifeq ($(CONFIG_RELOCATABLE), y)
> +LDFLAGS_vmlinux                += $(call ld-option, --no-apply-dynamic-relocs)
> +endif
> +
>  ifeq ($(CONFIG_ARM64_ERRATUM_843419),y)
>    ifeq ($(call ld-option, --fix-cortex-a53-843419),)
>  $(warning ld does not support --fix-cortex-a53-843419; kernel may be susceptible to erratum)
> --
> 2.15.0.rc2.357.g7e34df9404-goog
>


If you fold it into the LDFLAGS_vmlinux assignment 3 lines up:

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-27  9:41       ` Ard Biesheuvel
@ 2017-10-27 16:33         ` Nick Desaulniers
  2017-10-30 13:08           ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Nick Desaulniers @ 2017-10-27 16:33 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jiri Kosina, Kees Cook, Gopinath Elanchezhian, Sindhuri Pentyala,
	Wei Wang, Rahul Chaudhry, Siqi Lin, Stephen Hines,
	Nick Desaulniers, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-kernel

Upon upgrading to binutils 2.27, we found that our lz4 and gzip
compressed kernel images were significantly larger, resulting is 10ms
boot time regressions.

As noted by Rahul:
"aarch64 binaries uses RELA relocations, where each relocation entry
includes an addend value. This is similar to x86_64.  On x86_64, the
addend values are also stored at the relocation offset for relative
relocations. This is an optimization: in the case where code does not
need to be relocated, the loader can simply skip processing relative
relocations.  In binutils-2.25, both bfd and gold linkers did this for
x86_64, but only the gold linker did this for aarch64.  The kernel build
here is using the bfd linker, which stored zeroes at the relocation
offsets for relative relocations.  Since a set of zeroes compresses
better than a set of non-zero addend values, this behavior was resulting
in much better lz4 compression.

The bfd linker in binutils-2.27 is now storing the actual addend values
at the relocation offsets. The behavior is now consistent with what it
does for x86_64 and what gold linker does for both architectures.  The
change happened in this upstream commit:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
Since a bunch of zeroes got replaced by non-zero addend values, we see
the side effect of lz4 compressed image being a bit bigger.

To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
flag can be used:
$ LDFLAGS="--no-apply-dynamic-relocs" make
With this flag, the compressed image size is back to what it was with
binutils-2.25.

If the kernel is using ASLR, there aren't additional runtime costs to
--no-apply-dynamic-relocs, as the relocations will need to be applied
again anyway after the kernel is relocated to a random address.

If the kernel is not using ASLR, then presumably the current default
behavior of the linker is better. Since the static linker performed the
dynamic relocs, and the kernel is not moved to a different address at
load time, it can skip applying the relocations all over again."

Some measurements:

$ ld -v
GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117
                    ^
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300652760 Oct 26 11:57 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932627 Oct 26 11:57 Image.lz4-dtb

$ ld -v
GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315
                    ^
pre patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 11:43 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 18159474 Oct 26 11:43 Image.lz4-dtb

post patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 12:06 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932466 Oct 26 12:06 Image.lz4-dtb

By Siqi's measurement w/ gzip:
binutils 2.27 with this patch (with --no-apply-dynamic-relocs):
Image 41535488
Image.gz 13404067

binutils 2.27 without this patch (without --no-apply-dynamic-relocs):
Image 41535488
Image.gz 14125516

Any compression scheme should be able to get better results from the
longer runs of zeros, not just GZIP and LZ4.

10ms boot time savings isn't anything to get excited about, but users of
arm64+compression+bfd-2.27 should not have to pay a penalty for no
runtime improvement.

Reported-by: Gopinath Elanchezhian <gelanchezhian@google.com>
Reported-by: Sindhuri Pentyala <spentyala@google.com>
Reported-by: Wei Wang <wvw@google.com>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: Rahul Chaudhry <rahulchaudhry@google.com>
Suggested-by: Siqi Lin <siqilin@google.com>
Suggested-by: Stephen Hines <srhines@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
Changes since v2:
* Combine this this LDFLAGS_vmlinux append to previous block, per Ard.
* Add Siqi's measurements to commit message.
* Change the conditional to check if set rather than if not empty, since
  this is similar to the rest of the Makefile, and more readable IMO,
  though that's subjective. That part can be reverted if we don't want to
  steal blame.

 arch/arm64/Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..669378099dbe 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -14,8 +14,9 @@ LDFLAGS_vmlinux	:=-p --no-undefined -X
 CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET)
 GZFLAGS		:=-9
 
-ifneq ($(CONFIG_RELOCATABLE),)
-LDFLAGS_vmlinux		+= -pie -shared -Bsymbolic
+ifeq ($(CONFIG_RELOCATABLE), y)
+LDFLAGS_vmlinux		+= -pie -shared -Bsymbolic \
+			$(call ld-option, --no-apply-dynamic-relocs)
 endif
 
 ifeq ($(CONFIG_ARM64_ERRATUM_843419),y)
-- 
2.15.0.rc2.357.g7e34df9404-goog

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-27 16:33         ` [PATCH v3] " Nick Desaulniers
@ 2017-10-30 13:08           ` Will Deacon
  2017-10-30 13:11             ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2017-10-30 13:08 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Ard Biesheuvel, Jiri Kosina, Kees Cook, Gopinath Elanchezhian,
	Sindhuri Pentyala, Wei Wang, Rahul Chaudhry, Siqi Lin,
	Stephen Hines, Catalin Marinas, linux-arm-kernel, linux-kernel

On Fri, Oct 27, 2017 at 09:33:41AM -0700, Nick Desaulniers wrote:
> Upon upgrading to binutils 2.27, we found that our lz4 and gzip
> compressed kernel images were significantly larger, resulting is 10ms
> boot time regressions.
> 
> As noted by Rahul:
> "aarch64 binaries uses RELA relocations, where each relocation entry
> includes an addend value. This is similar to x86_64.  On x86_64, the
> addend values are also stored at the relocation offset for relative
> relocations. This is an optimization: in the case where code does not
> need to be relocated, the loader can simply skip processing relative
> relocations.  In binutils-2.25, both bfd and gold linkers did this for
> x86_64, but only the gold linker did this for aarch64.  The kernel build
> here is using the bfd linker, which stored zeroes at the relocation
> offsets for relative relocations.  Since a set of zeroes compresses
> better than a set of non-zero addend values, this behavior was resulting
> in much better lz4 compression.
> 
> The bfd linker in binutils-2.27 is now storing the actual addend values
> at the relocation offsets. The behavior is now consistent with what it
> does for x86_64 and what gold linker does for both architectures.  The
> change happened in this upstream commit:
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
> Since a bunch of zeroes got replaced by non-zero addend values, we see
> the side effect of lz4 compressed image being a bit bigger.
> 
> To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
> flag can be used:
> $ LDFLAGS="--no-apply-dynamic-relocs" make
> With this flag, the compressed image size is back to what it was with
> binutils-2.25.
> 
> If the kernel is using ASLR, there aren't additional runtime costs to
> --no-apply-dynamic-relocs, as the relocations will need to be applied
> again anyway after the kernel is relocated to a random address.
> 
> If the kernel is not using ASLR, then presumably the current default
> behavior of the linker is better. Since the static linker performed the
> dynamic relocs, and the kernel is not moved to a different address at
> load time, it can skip applying the relocations all over again."

Do you have any numbers booting an uncompressed kernel Image without ASLR
to see if skipping the relocs makes a measurable difference there?

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-30 13:08           ` Will Deacon
@ 2017-10-30 13:11             ` Ard Biesheuvel
  2017-10-30 13:12               ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2017-10-30 13:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nick Desaulniers, Jiri Kosina, Kees Cook, Gopinath Elanchezhian,
	Sindhuri Pentyala, Wei Wang, Rahul Chaudhry, Siqi Lin,
	Stephen Hines, Catalin Marinas, linux-arm-kernel, linux-kernel

On 30 October 2017 at 13:08, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Oct 27, 2017 at 09:33:41AM -0700, Nick Desaulniers wrote:
>> Upon upgrading to binutils 2.27, we found that our lz4 and gzip
>> compressed kernel images were significantly larger, resulting is 10ms
>> boot time regressions.
>>
>> As noted by Rahul:
>> "aarch64 binaries uses RELA relocations, where each relocation entry
>> includes an addend value. This is similar to x86_64.  On x86_64, the
>> addend values are also stored at the relocation offset for relative
>> relocations. This is an optimization: in the case where code does not
>> need to be relocated, the loader can simply skip processing relative
>> relocations.  In binutils-2.25, both bfd and gold linkers did this for
>> x86_64, but only the gold linker did this for aarch64.  The kernel build
>> here is using the bfd linker, which stored zeroes at the relocation
>> offsets for relative relocations.  Since a set of zeroes compresses
>> better than a set of non-zero addend values, this behavior was resulting
>> in much better lz4 compression.
>>
>> The bfd linker in binutils-2.27 is now storing the actual addend values
>> at the relocation offsets. The behavior is now consistent with what it
>> does for x86_64 and what gold linker does for both architectures.  The
>> change happened in this upstream commit:
>> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
>> Since a bunch of zeroes got replaced by non-zero addend values, we see
>> the side effect of lz4 compressed image being a bit bigger.
>>
>> To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
>> flag can be used:
>> $ LDFLAGS="--no-apply-dynamic-relocs" make
>> With this flag, the compressed image size is back to what it was with
>> binutils-2.25.
>>
>> If the kernel is using ASLR, there aren't additional runtime costs to
>> --no-apply-dynamic-relocs, as the relocations will need to be applied
>> again anyway after the kernel is relocated to a random address.
>>
>> If the kernel is not using ASLR, then presumably the current default
>> behavior of the linker is better. Since the static linker performed the
>> dynamic relocs, and the kernel is not moved to a different address at
>> load time, it can skip applying the relocations all over again."
>
> Do you have any numbers booting an uncompressed kernel Image without ASLR
> to see if skipping the relocs makes a measurable difference there?
>

Do you mean built with ASLR support but executing at the offset it was
linked at?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-30 13:11             ` Ard Biesheuvel
@ 2017-10-30 13:12               ` Will Deacon
  2017-10-30 13:35                 ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2017-10-30 13:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nick Desaulniers, Jiri Kosina, Kees Cook, Gopinath Elanchezhian,
	Sindhuri Pentyala, Wei Wang, Rahul Chaudhry, Siqi Lin,
	Stephen Hines, Catalin Marinas, linux-arm-kernel, linux-kernel

On Mon, Oct 30, 2017 at 01:11:23PM +0000, Ard Biesheuvel wrote:
> On 30 October 2017 at 13:08, Will Deacon <will.deacon@arm.com> wrote:
> > On Fri, Oct 27, 2017 at 09:33:41AM -0700, Nick Desaulniers wrote:
> >> Upon upgrading to binutils 2.27, we found that our lz4 and gzip
> >> compressed kernel images were significantly larger, resulting is 10ms
> >> boot time regressions.
> >>
> >> As noted by Rahul:
> >> "aarch64 binaries uses RELA relocations, where each relocation entry
> >> includes an addend value. This is similar to x86_64.  On x86_64, the
> >> addend values are also stored at the relocation offset for relative
> >> relocations. This is an optimization: in the case where code does not
> >> need to be relocated, the loader can simply skip processing relative
> >> relocations.  In binutils-2.25, both bfd and gold linkers did this for
> >> x86_64, but only the gold linker did this for aarch64.  The kernel build
> >> here is using the bfd linker, which stored zeroes at the relocation
> >> offsets for relative relocations.  Since a set of zeroes compresses
> >> better than a set of non-zero addend values, this behavior was resulting
> >> in much better lz4 compression.
> >>
> >> The bfd linker in binutils-2.27 is now storing the actual addend values
> >> at the relocation offsets. The behavior is now consistent with what it
> >> does for x86_64 and what gold linker does for both architectures.  The
> >> change happened in this upstream commit:
> >> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
> >> Since a bunch of zeroes got replaced by non-zero addend values, we see
> >> the side effect of lz4 compressed image being a bit bigger.
> >>
> >> To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
> >> flag can be used:
> >> $ LDFLAGS="--no-apply-dynamic-relocs" make
> >> With this flag, the compressed image size is back to what it was with
> >> binutils-2.25.
> >>
> >> If the kernel is using ASLR, there aren't additional runtime costs to
> >> --no-apply-dynamic-relocs, as the relocations will need to be applied
> >> again anyway after the kernel is relocated to a random address.
> >>
> >> If the kernel is not using ASLR, then presumably the current default
> >> behavior of the linker is better. Since the static linker performed the
> >> dynamic relocs, and the kernel is not moved to a different address at
> >> load time, it can skip applying the relocations all over again."
> >
> > Do you have any numbers booting an uncompressed kernel Image without ASLR
> > to see if skipping the relocs makes a measurable difference there?
> >
> 
> Do you mean built with ASLR support but executing at the offset it was
> linked at?

Yeah, sorry for being vague. Basically, the case where the relocs have all
been resolved statically. In other words: what do we lose by disabling this
optimisation?

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-30 13:12               ` Will Deacon
@ 2017-10-30 13:35                 ` Ard Biesheuvel
  2017-10-30 14:08                   ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2017-10-30 13:35 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nick Desaulniers, Jiri Kosina, Kees Cook, Gopinath Elanchezhian,
	Sindhuri Pentyala, Wei Wang, Rahul Chaudhry, Siqi Lin,
	Stephen Hines, Catalin Marinas, linux-arm-kernel, linux-kernel

On 30 October 2017 at 13:12, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Oct 30, 2017 at 01:11:23PM +0000, Ard Biesheuvel wrote:
>> On 30 October 2017 at 13:08, Will Deacon <will.deacon@arm.com> wrote:
>> > On Fri, Oct 27, 2017 at 09:33:41AM -0700, Nick Desaulniers wrote:
>> >> Upon upgrading to binutils 2.27, we found that our lz4 and gzip
>> >> compressed kernel images were significantly larger, resulting is 10ms
>> >> boot time regressions.
>> >>
>> >> As noted by Rahul:
>> >> "aarch64 binaries uses RELA relocations, where each relocation entry
>> >> includes an addend value. This is similar to x86_64.  On x86_64, the
>> >> addend values are also stored at the relocation offset for relative
>> >> relocations. This is an optimization: in the case where code does not
>> >> need to be relocated, the loader can simply skip processing relative
>> >> relocations.  In binutils-2.25, both bfd and gold linkers did this for
>> >> x86_64, but only the gold linker did this for aarch64.  The kernel build
>> >> here is using the bfd linker, which stored zeroes at the relocation
>> >> offsets for relative relocations.  Since a set of zeroes compresses
>> >> better than a set of non-zero addend values, this behavior was resulting
>> >> in much better lz4 compression.
>> >>
>> >> The bfd linker in binutils-2.27 is now storing the actual addend values
>> >> at the relocation offsets. The behavior is now consistent with what it
>> >> does for x86_64 and what gold linker does for both architectures.  The
>> >> change happened in this upstream commit:
>> >> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
>> >> Since a bunch of zeroes got replaced by non-zero addend values, we see
>> >> the side effect of lz4 compressed image being a bit bigger.
>> >>
>> >> To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
>> >> flag can be used:
>> >> $ LDFLAGS="--no-apply-dynamic-relocs" make
>> >> With this flag, the compressed image size is back to what it was with
>> >> binutils-2.25.
>> >>
>> >> If the kernel is using ASLR, there aren't additional runtime costs to
>> >> --no-apply-dynamic-relocs, as the relocations will need to be applied
>> >> again anyway after the kernel is relocated to a random address.
>> >>
>> >> If the kernel is not using ASLR, then presumably the current default
>> >> behavior of the linker is better. Since the static linker performed the
>> >> dynamic relocs, and the kernel is not moved to a different address at
>> >> load time, it can skip applying the relocations all over again."
>> >
>> > Do you have any numbers booting an uncompressed kernel Image without ASLR
>> > to see if skipping the relocs makes a measurable difference there?
>> >
>>
>> Do you mean built with ASLR support but executing at the offset it was
>> linked at?
>
> Yeah, sorry for being vague. Basically, the case where the relocs have all
> been resolved statically. In other words: what do we lose by disabling this
> optimisation?
>

The code does not deal with that at all, currently: given that this is
new behavior in 2.27, the relocs are processed unconditionally,
regardless of whether the image is loaded at its default base or not.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-30 13:35                 ` Ard Biesheuvel
@ 2017-10-30 14:08                   ` Will Deacon
  2017-10-30 14:11                     ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2017-10-30 14:08 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nick Desaulniers, Jiri Kosina, Kees Cook, Gopinath Elanchezhian,
	Sindhuri Pentyala, Wei Wang, Rahul Chaudhry, Siqi Lin,
	Stephen Hines, Catalin Marinas, linux-arm-kernel, linux-kernel

On Mon, Oct 30, 2017 at 01:35:49PM +0000, Ard Biesheuvel wrote:
> On 30 October 2017 at 13:12, Will Deacon <will.deacon@arm.com> wrote:
> > On Mon, Oct 30, 2017 at 01:11:23PM +0000, Ard Biesheuvel wrote:
> >> On 30 October 2017 at 13:08, Will Deacon <will.deacon@arm.com> wrote:
> >> > On Fri, Oct 27, 2017 at 09:33:41AM -0700, Nick Desaulniers wrote:
> >> >> Upon upgrading to binutils 2.27, we found that our lz4 and gzip
> >> >> compressed kernel images were significantly larger, resulting is 10ms
> >> >> boot time regressions.
> >> >>
> >> >> As noted by Rahul:
> >> >> "aarch64 binaries uses RELA relocations, where each relocation entry
> >> >> includes an addend value. This is similar to x86_64.  On x86_64, the
> >> >> addend values are also stored at the relocation offset for relative
> >> >> relocations. This is an optimization: in the case where code does not
> >> >> need to be relocated, the loader can simply skip processing relative
> >> >> relocations.  In binutils-2.25, both bfd and gold linkers did this for
> >> >> x86_64, but only the gold linker did this for aarch64.  The kernel build
> >> >> here is using the bfd linker, which stored zeroes at the relocation
> >> >> offsets for relative relocations.  Since a set of zeroes compresses
> >> >> better than a set of non-zero addend values, this behavior was resulting
> >> >> in much better lz4 compression.
> >> >>
> >> >> The bfd linker in binutils-2.27 is now storing the actual addend values
> >> >> at the relocation offsets. The behavior is now consistent with what it
> >> >> does for x86_64 and what gold linker does for both architectures.  The
> >> >> change happened in this upstream commit:
> >> >> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
> >> >> Since a bunch of zeroes got replaced by non-zero addend values, we see
> >> >> the side effect of lz4 compressed image being a bit bigger.
> >> >>
> >> >> To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
> >> >> flag can be used:
> >> >> $ LDFLAGS="--no-apply-dynamic-relocs" make
> >> >> With this flag, the compressed image size is back to what it was with
> >> >> binutils-2.25.
> >> >>
> >> >> If the kernel is using ASLR, there aren't additional runtime costs to
> >> >> --no-apply-dynamic-relocs, as the relocations will need to be applied
> >> >> again anyway after the kernel is relocated to a random address.
> >> >>
> >> >> If the kernel is not using ASLR, then presumably the current default
> >> >> behavior of the linker is better. Since the static linker performed the
> >> >> dynamic relocs, and the kernel is not moved to a different address at
> >> >> load time, it can skip applying the relocations all over again."
> >> >
> >> > Do you have any numbers booting an uncompressed kernel Image without ASLR
> >> > to see if skipping the relocs makes a measurable difference there?
> >> >
> >>
> >> Do you mean built with ASLR support but executing at the offset it was
> >> linked at?
> >
> > Yeah, sorry for being vague. Basically, the case where the relocs have all
> > been resolved statically. In other words: what do we lose by disabling this
> > optimisation?
> >
> 
> The code does not deal with that at all, currently: given that this is
> new behavior in 2.27, the relocs are processed unconditionally,
> regardless of whether the image is loaded at its default base or not.

Ah yeah, of course. Doesn't that make it impossible to exploit this
optimisation unless you can somehow guarantee that binutils has done
the relocs at link time? I guess you could check for non-zero entries at
the relocation offsets, but you still end up iterating.

Anyway, I'll take the patch but I don't understand how the binutils change
is intended to be used.

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  2017-10-30 14:08                   ` Will Deacon
@ 2017-10-30 14:11                     ` Ard Biesheuvel
  0 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2017-10-30 14:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nick Desaulniers, Jiri Kosina, Kees Cook, Gopinath Elanchezhian,
	Sindhuri Pentyala, Wei Wang, Rahul Chaudhry, Siqi Lin,
	Stephen Hines, Catalin Marinas, linux-arm-kernel, linux-kernel

On 30 October 2017 at 14:08, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Oct 30, 2017 at 01:35:49PM +0000, Ard Biesheuvel wrote:
>> On 30 October 2017 at 13:12, Will Deacon <will.deacon@arm.com> wrote:
>> > On Mon, Oct 30, 2017 at 01:11:23PM +0000, Ard Biesheuvel wrote:
>> >> On 30 October 2017 at 13:08, Will Deacon <will.deacon@arm.com> wrote:
>> >> > On Fri, Oct 27, 2017 at 09:33:41AM -0700, Nick Desaulniers wrote:
>> >> >> Upon upgrading to binutils 2.27, we found that our lz4 and gzip
>> >> >> compressed kernel images were significantly larger, resulting is 10ms
>> >> >> boot time regressions.
>> >> >>
>> >> >> As noted by Rahul:
>> >> >> "aarch64 binaries uses RELA relocations, where each relocation entry
>> >> >> includes an addend value. This is similar to x86_64.  On x86_64, the
>> >> >> addend values are also stored at the relocation offset for relative
>> >> >> relocations. This is an optimization: in the case where code does not
>> >> >> need to be relocated, the loader can simply skip processing relative
>> >> >> relocations.  In binutils-2.25, both bfd and gold linkers did this for
>> >> >> x86_64, but only the gold linker did this for aarch64.  The kernel build
>> >> >> here is using the bfd linker, which stored zeroes at the relocation
>> >> >> offsets for relative relocations.  Since a set of zeroes compresses
>> >> >> better than a set of non-zero addend values, this behavior was resulting
>> >> >> in much better lz4 compression.
>> >> >>
>> >> >> The bfd linker in binutils-2.27 is now storing the actual addend values
>> >> >> at the relocation offsets. The behavior is now consistent with what it
>> >> >> does for x86_64 and what gold linker does for both architectures.  The
>> >> >> change happened in this upstream commit:
>> >> >> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
>> >> >> Since a bunch of zeroes got replaced by non-zero addend values, we see
>> >> >> the side effect of lz4 compressed image being a bit bigger.
>> >> >>
>> >> >> To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
>> >> >> flag can be used:
>> >> >> $ LDFLAGS="--no-apply-dynamic-relocs" make
>> >> >> With this flag, the compressed image size is back to what it was with
>> >> >> binutils-2.25.
>> >> >>
>> >> >> If the kernel is using ASLR, there aren't additional runtime costs to
>> >> >> --no-apply-dynamic-relocs, as the relocations will need to be applied
>> >> >> again anyway after the kernel is relocated to a random address.
>> >> >>
>> >> >> If the kernel is not using ASLR, then presumably the current default
>> >> >> behavior of the linker is better. Since the static linker performed the
>> >> >> dynamic relocs, and the kernel is not moved to a different address at
>> >> >> load time, it can skip applying the relocations all over again."
>> >> >
>> >> > Do you have any numbers booting an uncompressed kernel Image without ASLR
>> >> > to see if skipping the relocs makes a measurable difference there?
>> >> >
>> >>
>> >> Do you mean built with ASLR support but executing at the offset it was
>> >> linked at?
>> >
>> > Yeah, sorry for being vague. Basically, the case where the relocs have all
>> > been resolved statically. In other words: what do we lose by disabling this
>> > optimisation?
>> >
>>
>> The code does not deal with that at all, currently: given that this is
>> new behavior in 2.27, the relocs are processed unconditionally,
>> regardless of whether the image is loaded at its default base or not.
>
> Ah yeah, of course. Doesn't that make it impossible to exploit this
> optimisation unless you can somehow guarantee that binutils has done
> the relocs at link time? I guess you could check for non-zero entries at
> the relocation offsets, but you still end up iterating.
>
> Anyway, I'll take the patch but I don't understand how the binutils change
> is intended to be used.
>

This seems to be a 'parity with x86 for the sake of it' thing. I see
how not having to process relocations can be an advantage in some
cases, but it does duplicate information (as this patch proves), and
the cat's already out of the bag anyway.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-10-30 14:11 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-26 20:29 [PATCH] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27 Nick Desaulniers
2017-10-26 20:41 ` Ard Biesheuvel
2017-10-26 20:55   ` Nick Desaulniers
2017-10-26 21:06     ` Nick Desaulniers
2017-10-26 21:17       ` Siqi Lin
2017-10-26 21:23         ` Nick Desaulniers
2017-10-26 21:41           ` Nick Desaulniers
2017-10-26 21:51           ` Siqi Lin
2017-10-26 21:43     ` [PATCH v2] " Nick Desaulniers
2017-10-27  9:41       ` Ard Biesheuvel
2017-10-27 16:33         ` [PATCH v3] " Nick Desaulniers
2017-10-30 13:08           ` Will Deacon
2017-10-30 13:11             ` Ard Biesheuvel
2017-10-30 13:12               ` Will Deacon
2017-10-30 13:35                 ` Ard Biesheuvel
2017-10-30 14:08                   ` Will Deacon
2017-10-30 14:11                     ` Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).