All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12  8:19 ` Nicholas Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-12  8:19 UTC (permalink / raw)
  To: linux-arm-kernel, Russell King
  Cc: Nicholas Piggin, linux-kbuild, Arnd Bergmann, Nicolas Pitre,
	Julian Brown, Alan Modra, Segher Boessenkool, linux-arch,
	Michal Marek, Sam Ravnborg, Stephen Rothwell

Erratum 657417 is worked around by the linker by inserting additional
branch trampolines to avoid problematic branch target locations. This
results in much higher linking time and presumably slower and larger
generated code. The workaround also seems to only be required when
linking thumb2 code, but the linker applies it for non-thumb2 code as
well.

The workaround today is left to the linker to apply, which is overly
conservative.

https://sourceware.org/ml/binutils/2009-05/msg00297.html

This patch adds an option which defaults to "y" in cases where we
could possibly be running Cortex A8 and using Thumb2 instructions.
In reality the workaround might not be required at all for the kernel
if virtual instruction memory is linear in physical memory. However it
is more conservative to keep the workaround, and it may be the case
that the TLB lookup would be required in order to catch branches to
unmapped or no-execute pages.

In an allyesconfig build, this workaround causes a large load on
the linker's branch stub hash and slows down the final link by a
factor of 5.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/arm/Kconfig  | 14 ++++++++++++++
 arch/arm/Makefile |  7 +++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 90542db..3c7dde1 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1033,6 +1033,20 @@ config ARM_ERRATA_460075
 	  ACTLR register. Note that setting specific bits in the ACTLR register
 	  may not be available in non-secure mode.
 
+config ARM_ERRATA_657417
+	bool "ARM errata: A 32-bit branch instruction that spans two 4K regions can result in an incorrect operation"
+	depends on CPU_V7
+	depends on THUMB2_KERNEL
+	default y
+	help
+	  This option enables the workaround for the 657417 Cortex-A8 erratum.
+	  If, while executing code in Thumb or ThumbEE state, a 32-bit Thumb-2
+	  branch instruction is executed that spans two 4KB regions, and the
+	  target address of the branch falls within the first region, it is
+	  possible for the processor to behave incorrectly. This workaround
+	  enables a linker workaround that adds branch trampolines that bounce
+	  offending branches via a safe location.
+
 config ARM_ERRATA_742230
 	bool "ARM errata: DMB operation may be faulty"
 	depends on CPU_V7 && SMP
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 274e8a6..b49a2e0 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -43,6 +43,13 @@ ifeq ($(CONFIG_FRAME_POINTER),y)
 KBUILD_CFLAGS	+=-fno-omit-frame-pointer -mapcs -mno-sched-prolog
 endif
 
+ifneq ($(CONFIG_ARM_ERRATA_657417),y)
+# ld-option has to run before we override LD otherwise it fails on
+# cross compile with mismatched endian (C compiler outputs one endian,
+# LD accepts another)
+LDFLAGS		+=$(call ld-option, --no-fix-cortex-a8,)
+endif
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
 KBUILD_CPPFLAGS	+= -mbig-endian
 AS		+= -EB
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12  8:19 ` Nicholas Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-12  8:19 UTC (permalink / raw)
  To: linux-arm-kernel

Erratum 657417 is worked around by the linker by inserting additional
branch trampolines to avoid problematic branch target locations. This
results in much higher linking time and presumably slower and larger
generated code. The workaround also seems to only be required when
linking thumb2 code, but the linker applies it for non-thumb2 code as
well.

The workaround today is left to the linker to apply, which is overly
conservative.

https://sourceware.org/ml/binutils/2009-05/msg00297.html

This patch adds an option which defaults to "y" in cases where we
could possibly be running Cortex A8 and using Thumb2 instructions.
In reality the workaround might not be required at all for the kernel
if virtual instruction memory is linear in physical memory. However it
is more conservative to keep the workaround, and it may be the case
that the TLB lookup would be required in order to catch branches to
unmapped or no-execute pages.

In an allyesconfig build, this workaround causes a large load on
the linker's branch stub hash and slows down the final link by a
factor of 5.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/arm/Kconfig  | 14 ++++++++++++++
 arch/arm/Makefile |  7 +++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 90542db..3c7dde1 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1033,6 +1033,20 @@ config ARM_ERRATA_460075
 	  ACTLR register. Note that setting specific bits in the ACTLR register
 	  may not be available in non-secure mode.
 
+config ARM_ERRATA_657417
+	bool "ARM errata: A 32-bit branch instruction that spans two 4K regions can result in an incorrect operation"
+	depends on CPU_V7
+	depends on THUMB2_KERNEL
+	default y
+	help
+	  This option enables the workaround for the 657417 Cortex-A8 erratum.
+	  If, while executing code in Thumb or ThumbEE state, a 32-bit Thumb-2
+	  branch instruction is executed that spans two 4KB regions, and the
+	  target address of the branch falls within the first region, it is
+	  possible for the processor to behave incorrectly. This workaround
+	  enables a linker workaround that adds branch trampolines that bounce
+	  offending branches via a safe location.
+
 config ARM_ERRATA_742230
 	bool "ARM errata: DMB operation may be faulty"
 	depends on CPU_V7 && SMP
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 274e8a6..b49a2e0 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -43,6 +43,13 @@ ifeq ($(CONFIG_FRAME_POINTER),y)
 KBUILD_CFLAGS	+=-fno-omit-frame-pointer -mapcs -mno-sched-prolog
 endif
 
+ifneq ($(CONFIG_ARM_ERRATA_657417),y)
+# ld-option has to run before we override LD otherwise it fails on
+# cross compile with mismatched endian (C compiler outputs one endian,
+# LD accepts another)
+LDFLAGS		+=$(call ld-option, --no-fix-cortex-a8,)
+endif
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
 KBUILD_CPPFLAGS	+= -mbig-endian
 AS		+= -EB
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-12  8:19 ` Nicholas Piggin
@ 2016-08-12 12:33   ` Russell King - ARM Linux
  -1 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2016-08-12 12:33 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arm-kernel, linux-kbuild, Arnd Bergmann, Nicolas Pitre,
	Julian Brown, Alan Modra, Segher Boessenkool, linux-arch,
	Michal Marek, Sam Ravnborg, Stephen Rothwell

On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
> This patch adds an option which defaults to "y" in cases where we
> could possibly be running Cortex A8 and using Thumb2 instructions.
> In reality the workaround might not be required at all for the kernel
> if virtual instruction memory is linear in physical memory.

Hmm.

The main kernel image is guaranteed to be contiguous in physical memory
for all sorts of reasons, so this really isn't a concern for the kernel
itself.

Modules, however, are a different matter, as they are mapped in using
individual pages, and are most likely to be non-contiguous in physical
memory.  The kernel's module linker knows nothing about this errata,
so it'll generally just fix up the relocations in the most basic of
ways.

So, I think we should always use this --no-fix-cortex-a8 option where
the linker supports it irrespective of whether we're running on a core
needing this workaround, but we probably need to fix the kernel module
linker to know about this.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 12:33   ` Russell King - ARM Linux
  0 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2016-08-12 12:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
> This patch adds an option which defaults to "y" in cases where we
> could possibly be running Cortex A8 and using Thumb2 instructions.
> In reality the workaround might not be required at all for the kernel
> if virtual instruction memory is linear in physical memory.

Hmm.

The main kernel image is guaranteed to be contiguous in physical memory
for all sorts of reasons, so this really isn't a concern for the kernel
itself.

Modules, however, are a different matter, as they are mapped in using
individual pages, and are most likely to be non-contiguous in physical
memory.  The kernel's module linker knows nothing about this errata,
so it'll generally just fix up the relocations in the most basic of
ways.

So, I think we should always use this --no-fix-cortex-a8 option where
the linker supports it irrespective of whether we're running on a core
needing this workaround, but we probably need to fix the kernel module
linker to know about this.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-12 12:33   ` Russell King - ARM Linux
@ 2016-08-12 13:15     ` Nicholas Piggin
  -1 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-12 13:15 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-arm-kernel, linux-kbuild, Arnd Bergmann, Nicolas Pitre,
	Julian Brown, Alan Modra, Segher Boessenkool, linux-arch,
	Michal Marek, Sam Ravnborg, Stephen Rothwell

On Fri, 12 Aug 2016 13:33:14 +0100
Russell King - ARM Linux <linux@armlinux.org.uk> wrote:

> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
> > This patch adds an option which defaults to "y" in cases where we
> > could possibly be running Cortex A8 and using Thumb2 instructions.
> > In reality the workaround might not be required at all for the kernel
> > if virtual instruction memory is linear in physical memory.  
> 
> Hmm.
> 
> The main kernel image is guaranteed to be contiguous in physical memory
> for all sorts of reasons, so this really isn't a concern for the kernel
> itself.

That's what it *seems* like. I wanted to be conservative because I don't
know the architecture nor have actually looked at the errata docs. You
can probably make stronger guarantees to avoid it. Perhaps enabling just
for modules would be workable.


> Modules, however, are a different matter, as they are mapped in using
> individual pages, and are most likely to be non-contiguous in physical
> memory.  The kernel's module linker knows nothing about this errata,
> so it'll generally just fix up the relocations in the most basic of
> ways.
> 
> So, I think we should always use this --no-fix-cortex-a8 option where
> the linker supports it irrespective of whether we're running on a core
> needing this workaround, but we probably need to fix the kernel module
> linker to know about this.

It looks like it would be a bit of work to go that route. The linker of
course would not give you relocations or stubs for the branches you
need them.

Anyway do what you think best with the patch. It seems to eliminate the
link time regression on ARM allyesconfig when using thin archives for
building which is the main thing I was concerned about.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 13:15     ` Nicholas Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-12 13:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 12 Aug 2016 13:33:14 +0100
Russell King - ARM Linux <linux@armlinux.org.uk> wrote:

> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
> > This patch adds an option which defaults to "y" in cases where we
> > could possibly be running Cortex A8 and using Thumb2 instructions.
> > In reality the workaround might not be required at all for the kernel
> > if virtual instruction memory is linear in physical memory.  
> 
> Hmm.
> 
> The main kernel image is guaranteed to be contiguous in physical memory
> for all sorts of reasons, so this really isn't a concern for the kernel
> itself.

That's what it *seems* like. I wanted to be conservative because I don't
know the architecture nor have actually looked at the errata docs. You
can probably make stronger guarantees to avoid it. Perhaps enabling just
for modules would be workable.


> Modules, however, are a different matter, as they are mapped in using
> individual pages, and are most likely to be non-contiguous in physical
> memory.  The kernel's module linker knows nothing about this errata,
> so it'll generally just fix up the relocations in the most basic of
> ways.
> 
> So, I think we should always use this --no-fix-cortex-a8 option where
> the linker supports it irrespective of whether we're running on a core
> needing this workaround, but we probably need to fix the kernel module
> linker to know about this.

It looks like it would be a bit of work to go that route. The linker of
course would not give you relocations or stubs for the branches you
need them.

Anyway do what you think best with the patch. It seems to eliminate the
link time regression on ARM allyesconfig when using thin archives for
building which is the main thing I was concerned about.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-12 13:15     ` Nicholas Piggin
@ 2016-08-12 13:49       ` Ard Biesheuvel
  -1 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2016-08-12 13:49 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Russell King - ARM Linux, Nicolas Pitre, linux-arch,
	Segher Boessenkool, Arnd Bergmann, Alan Modra, linux-kbuild,
	Julian Brown, Michal Marek, Stephen Rothwell, Sam Ravnborg,
	linux-arm-kernel

On 12 August 2016 at 15:15, Nicholas Piggin <npiggin@gmail.com> wrote:
> On Fri, 12 Aug 2016 13:33:14 +0100
> Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
>
>> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
>> > This patch adds an option which defaults to "y" in cases where we
>> > could possibly be running Cortex A8 and using Thumb2 instructions.
>> > In reality the workaround might not be required at all for the kernel
>> > if virtual instruction memory is linear in physical memory.
>>
>> Hmm.
>>
>> The main kernel image is guaranteed to be contiguous in physical memory
>> for all sorts of reasons, so this really isn't a concern for the kernel
>> itself.
>
> That's what it *seems* like. I wanted to be conservative because I don't
> know the architecture nor have actually looked at the errata docs. You
> can probably make stronger guarantees to avoid it. Perhaps enabling just
> for modules would be workable.
>
>
>> Modules, however, are a different matter, as they are mapped in using
>> individual pages, and are most likely to be non-contiguous in physical
>> memory.  The kernel's module linker knows nothing about this errata,
>> so it'll generally just fix up the relocations in the most basic of
>> ways.
>>
>> So, I think we should always use this --no-fix-cortex-a8 option where
>> the linker supports it irrespective of whether we're running on a core
>> needing this workaround, but we probably need to fix the kernel module
>> linker to know about this.
>
> It looks like it would be a bit of work to go that route. The linker of
> course would not give you relocations or stubs for the branches you
> need them.
>

We could enable CONFIG_ARM_MODULE_PLTS in this case, and force a
branch via a PLT entry if an affected instruction is encountered.
However, this only covers branch instructions that are covered by
relocations, so we'd still need to scan the module .text to look for
affected instructions whose targets has been resolved at compile time.

Running this

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 13:49       ` Ard Biesheuvel
  0 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2016-08-12 13:49 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 August 2016 at 15:15, Nicholas Piggin <npiggin@gmail.com> wrote:
> On Fri, 12 Aug 2016 13:33:14 +0100
> Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
>
>> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
>> > This patch adds an option which defaults to "y" in cases where we
>> > could possibly be running Cortex A8 and using Thumb2 instructions.
>> > In reality the workaround might not be required at all for the kernel
>> > if virtual instruction memory is linear in physical memory.
>>
>> Hmm.
>>
>> The main kernel image is guaranteed to be contiguous in physical memory
>> for all sorts of reasons, so this really isn't a concern for the kernel
>> itself.
>
> That's what it *seems* like. I wanted to be conservative because I don't
> know the architecture nor have actually looked at the errata docs. You
> can probably make stronger guarantees to avoid it. Perhaps enabling just
> for modules would be workable.
>
>
>> Modules, however, are a different matter, as they are mapped in using
>> individual pages, and are most likely to be non-contiguous in physical
>> memory.  The kernel's module linker knows nothing about this errata,
>> so it'll generally just fix up the relocations in the most basic of
>> ways.
>>
>> So, I think we should always use this --no-fix-cortex-a8 option where
>> the linker supports it irrespective of whether we're running on a core
>> needing this workaround, but we probably need to fix the kernel module
>> linker to know about this.
>
> It looks like it would be a bit of work to go that route. The linker of
> course would not give you relocations or stubs for the branches you
> need them.
>

We could enable CONFIG_ARM_MODULE_PLTS in this case, and force a
branch via a PLT entry if an affected instruction is encountered.
However, this only covers branch instructions that are covered by
relocations, so we'd still need to scan the module .text to look for
affected instructions whose targets has been resolved at compile time.

Running this

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-12 13:49       ` Ard Biesheuvel
  (?)
@ 2016-08-12 13:50         ` Ard Biesheuvel
  -1 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2016-08-12 13:50 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Russell King - ARM Linux, Nicolas Pitre, linux-arch,
	Segher Boessenkool, Arnd Bergmann, Alan Modra, linux-kbuild,
	Julian Brown, Michal Marek, Stephen Rothwell, Sam Ravnborg,
	linux-arm-kernel

On 12 August 2016 at 15:49, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 12 August 2016 at 15:15, Nicholas Piggin <npiggin@gmail.com> wrote:
>> On Fri, 12 Aug 2016 13:33:14 +0100
>> Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
>>
>>> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
>>> > This patch adds an option which defaults to "y" in cases where we
>>> > could possibly be running Cortex A8 and using Thumb2 instructions.
>>> > In reality the workaround might not be required at all for the kernel
>>> > if virtual instruction memory is linear in physical memory.
>>>
>>> Hmm.
>>>
>>> The main kernel image is guaranteed to be contiguous in physical memory
>>> for all sorts of reasons, so this really isn't a concern for the kernel
>>> itself.
>>
>> That's what it *seems* like. I wanted to be conservative because I don't
>> know the architecture nor have actually looked at the errata docs. You
>> can probably make stronger guarantees to avoid it. Perhaps enabling just
>> for modules would be workable.
>>
>>
>>> Modules, however, are a different matter, as they are mapped in using
>>> individual pages, and are most likely to be non-contiguous in physical
>>> memory.  The kernel's module linker knows nothing about this errata,
>>> so it'll generally just fix up the relocations in the most basic of
>>> ways.
>>>
>>> So, I think we should always use this --no-fix-cortex-a8 option where
>>> the linker supports it irrespective of whether we're running on a core
>>> needing this workaround, but we probably need to fix the kernel module
>>> linker to know about this.
>>
>> It looks like it would be a bit of work to go that route. The linker of
>> course would not give you relocations or stubs for the branches you
>> need them.
>>
>
> We could enable CONFIG_ARM_MODULE_PLTS in this case, and force a
> branch via a PLT entry if an affected instruction is encountered.
> However, this only covers branch instructions that are covered by
> relocations, so we'd still need to scan the module .text to look for
> affected instructions whose targets has been resolved at compile time.
>
> Running this

$ objdump -dr vmlinux |grep -A1 -E \\sb\.w |less

I get numerous instances of b.w that are not covered by any
relocations, so i assume that will be the case for modules as well.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 13:50         ` Ard Biesheuvel
  0 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2016-08-12 13:50 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Nicolas Pitre, linux-arch, Segher Boessenkool, Arnd Bergmann,
	Alan Modra, linux-kbuild, Russell King - ARM Linux, Julian Brown,
	Michal Marek, Stephen Rothwell, Sam Ravnborg, linux-arm-kernel

On 12 August 2016 at 15:49, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 12 August 2016 at 15:15, Nicholas Piggin <npiggin@gmail.com> wrote:
>> On Fri, 12 Aug 2016 13:33:14 +0100
>> Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
>>
>>> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
>>> > This patch adds an option which defaults to "y" in cases where we
>>> > could possibly be running Cortex A8 and using Thumb2 instructions.
>>> > In reality the workaround might not be required at all for the kernel
>>> > if virtual instruction memory is linear in physical memory.
>>>
>>> Hmm.
>>>
>>> The main kernel image is guaranteed to be contiguous in physical memory
>>> for all sorts of reasons, so this really isn't a concern for the kernel
>>> itself.
>>
>> That's what it *seems* like. I wanted to be conservative because I don't
>> know the architecture nor have actually looked at the errata docs. You
>> can probably make stronger guarantees to avoid it. Perhaps enabling just
>> for modules would be workable.
>>
>>
>>> Modules, however, are a different matter, as they are mapped in using
>>> individual pages, and are most likely to be non-contiguous in physical
>>> memory.  The kernel's module linker knows nothing about this errata,
>>> so it'll generally just fix up the relocations in the most basic of
>>> ways.
>>>
>>> So, I think we should always use this --no-fix-cortex-a8 option where
>>> the linker supports it irrespective of whether we're running on a core
>>> needing this workaround, but we probably need to fix the kernel module
>>> linker to know about this.
>>
>> It looks like it would be a bit of work to go that route. The linker of
>> course would not give you relocations or stubs for the branches you
>> need them.
>>
>
> We could enable CONFIG_ARM_MODULE_PLTS in this case, and force a
> branch via a PLT entry if an affected instruction is encountered.
> However, this only covers branch instructions that are covered by
> relocations, so we'd still need to scan the module .text to look for
> affected instructions whose targets has been resolved at compile time.
>
> Running this

$ objdump -dr vmlinux |grep -A1 -E \\sb\.w |less

I get numerous instances of b.w that are not covered by any
relocations, so i assume that will be the case for modules as well.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 13:50         ` Ard Biesheuvel
  0 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2016-08-12 13:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 August 2016 at 15:49, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 12 August 2016 at 15:15, Nicholas Piggin <npiggin@gmail.com> wrote:
>> On Fri, 12 Aug 2016 13:33:14 +0100
>> Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
>>
>>> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
>>> > This patch adds an option which defaults to "y" in cases where we
>>> > could possibly be running Cortex A8 and using Thumb2 instructions.
>>> > In reality the workaround might not be required at all for the kernel
>>> > if virtual instruction memory is linear in physical memory.
>>>
>>> Hmm.
>>>
>>> The main kernel image is guaranteed to be contiguous in physical memory
>>> for all sorts of reasons, so this really isn't a concern for the kernel
>>> itself.
>>
>> That's what it *seems* like. I wanted to be conservative because I don't
>> know the architecture nor have actually looked at the errata docs. You
>> can probably make stronger guarantees to avoid it. Perhaps enabling just
>> for modules would be workable.
>>
>>
>>> Modules, however, are a different matter, as they are mapped in using
>>> individual pages, and are most likely to be non-contiguous in physical
>>> memory.  The kernel's module linker knows nothing about this errata,
>>> so it'll generally just fix up the relocations in the most basic of
>>> ways.
>>>
>>> So, I think we should always use this --no-fix-cortex-a8 option where
>>> the linker supports it irrespective of whether we're running on a core
>>> needing this workaround, but we probably need to fix the kernel module
>>> linker to know about this.
>>
>> It looks like it would be a bit of work to go that route. The linker of
>> course would not give you relocations or stubs for the branches you
>> need them.
>>
>
> We could enable CONFIG_ARM_MODULE_PLTS in this case, and force a
> branch via a PLT entry if an affected instruction is encountered.
> However, this only covers branch instructions that are covered by
> relocations, so we'd still need to scan the module .text to look for
> affected instructions whose targets has been resolved at compile time.
>
> Running this

$ objdump -dr vmlinux |grep -A1 -E \\sb\.w |less

I get numerous instances of b.w that are not covered by any
relocations, so i assume that will be the case for modules as well.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-12 13:49       ` Ard Biesheuvel
@ 2016-08-12 13:51         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2016-08-12 13:51 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nicholas Piggin, Nicolas Pitre, linux-arch, Segher Boessenkool,
	Arnd Bergmann, Alan Modra, linux-kbuild, Julian Brown,
	Michal Marek, Stephen Rothwell, Sam Ravnborg, linux-arm-kernel

On Fri, Aug 12, 2016 at 03:49:15PM +0200, Ard Biesheuvel wrote:
> We could enable CONFIG_ARM_MODULE_PLTS in this case, and force a
> branch via a PLT entry if an affected instruction is encountered.
> However, this only covers branch instructions that are covered by
> relocations, so we'd still need to scan the module .text to look for
> affected instructions whose targets has been resolved at compile time.

Only if it's combined with detecting which regions of the .text section
are really instructions - just looking for something that appears to be
a branch instruction will be unreliable as the .text contains literal
data, and literal data could look like a branch instruction.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 13:51         ` Russell King - ARM Linux
  0 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2016-08-12 13:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 12, 2016 at 03:49:15PM +0200, Ard Biesheuvel wrote:
> We could enable CONFIG_ARM_MODULE_PLTS in this case, and force a
> branch via a PLT entry if an affected instruction is encountered.
> However, this only covers branch instructions that are covered by
> relocations, so we'd still need to scan the module .text to look for
> affected instructions whose targets has been resolved at compile time.

Only if it's combined with detecting which regions of the .text section
are really instructions - just looking for something that appears to be
a branch instruction will be unreliable as the .text contains literal
data, and literal data could look like a branch instruction.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-12 13:50         ` Ard Biesheuvel
@ 2016-08-12 13:52           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2016-08-12 13:52 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nicholas Piggin, Nicolas Pitre, linux-arch, Segher Boessenkool,
	Arnd Bergmann, Alan Modra, linux-kbuild, Julian Brown,
	Michal Marek, Stephen Rothwell, Sam Ravnborg, linux-arm-kernel

On Fri, Aug 12, 2016 at 03:50:17PM +0200, Ard Biesheuvel wrote:
> $ objdump -dr vmlinux |grep -A1 -E \\sb\.w |less
> 
> I get numerous instances of b.w that are not covered by any
> relocations, so i assume that will be the case for modules as well.

Not surprising.  vmlinux is fully linked.  There's no relocations.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 13:52           ` Russell King - ARM Linux
  0 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2016-08-12 13:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 12, 2016 at 03:50:17PM +0200, Ard Biesheuvel wrote:
> $ objdump -dr vmlinux |grep -A1 -E \\sb\.w |less
> 
> I get numerous instances of b.w that are not covered by any
> relocations, so i assume that will be the case for modules as well.

Not surprising.  vmlinux is fully linked.  There's no relocations.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-12 13:52           ` Russell King - ARM Linux
@ 2016-08-12 13:54             ` Ard Biesheuvel
  -1 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2016-08-12 13:54 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Nicholas Piggin, Nicolas Pitre, linux-arch, Segher Boessenkool,
	Arnd Bergmann, Alan Modra, linux-kbuild, Julian Brown,
	Michal Marek, Stephen Rothwell, Sam Ravnborg, linux-arm-kernel

On 12 August 2016 at 15:52, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Fri, Aug 12, 2016 at 03:50:17PM +0200, Ard Biesheuvel wrote:
>> $ objdump -dr vmlinux |grep -A1 -E \\sb\.w |less
>>
>> I get numerous instances of b.w that are not covered by any
>> relocations, so i assume that will be the case for modules as well.
>
> Not surprising.  vmlinux is fully linked.  There's no relocations.
>

My bad. It does if you link it with --emit-relocs.

Random snippet:

c036b098:       f7ff bff8       b.w     c036b08c <rcu_barrier>
                        c036b098: R_ARM_THM_JUMP24      rcu_barrier
--
c036b5ae:       f7ff ba5b       b.w     c036aa68 <rcu_gp_kthread_wake>
c036b5b2:       bf00            nop
--
c036bc28:       f1c6 b6da       b.w     c09329e0 <_raw_spin_unlock_irqrestore>
                        c036bc28: R_ARM_THM_JUMP24
_raw_spin_unlock_irqrestore
--
c036bcbc:       f7ee bc2e       b.w     c035a51c <swake_up>
                        c036bcbc: R_ARM_THM_JUMP24      swake_up
--
c036c63a:       f7ff bfd5       b.w     c036c5e8 <synchronize_sched>
                        c036c63a: R_ARM_THM_JUMP24      synchronize_sched
--
c036c640:       f7ff bff0       b.w     c036c624 <cond_synchronize_sched>
                        c036c640: R_ARM_THM_JUMP24      cond_synchronize_sched
--
c036c644:       f7ff bb3e       b.w     c036bcc4 <synchronize_sched_expedited>
                        c036c644: R_ARM_THM_JUMP24
synchronize_sched_expedited
--
c036c690:       f7dc ba0a       b.w     c0348aa8 <resched_cpu>
                        c036c690: R_ARM_THM_JUMP24      resched_cpu
--
c036c6a2:       f7ff baad       b.w     c036bc00
<rcu_report_exp_cpu_mult.constprop.22>
c036c6a6:       bf00            nop
--
c036c716:       f7ff ba73       b.w     c036bc00
<rcu_report_exp_cpu_mult.constprop.22>
c036c71a:       bf00            nop
--
c036cf3a:       f7fd bc4d       b.w     c036a7d8 <rcu_eqs_enter_common>
c036cf3e:       bf00            nop

So some branches are relocated, some have already been resolved at compile time.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 13:54             ` Ard Biesheuvel
  0 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2016-08-12 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 August 2016 at 15:52, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Fri, Aug 12, 2016 at 03:50:17PM +0200, Ard Biesheuvel wrote:
>> $ objdump -dr vmlinux |grep -A1 -E \\sb\.w |less
>>
>> I get numerous instances of b.w that are not covered by any
>> relocations, so i assume that will be the case for modules as well.
>
> Not surprising.  vmlinux is fully linked.  There's no relocations.
>

My bad. It does if you link it with --emit-relocs.

Random snippet:

c036b098:       f7ff bff8       b.w     c036b08c <rcu_barrier>
                        c036b098: R_ARM_THM_JUMP24      rcu_barrier
--
c036b5ae:       f7ff ba5b       b.w     c036aa68 <rcu_gp_kthread_wake>
c036b5b2:       bf00            nop
--
c036bc28:       f1c6 b6da       b.w     c09329e0 <_raw_spin_unlock_irqrestore>
                        c036bc28: R_ARM_THM_JUMP24
_raw_spin_unlock_irqrestore
--
c036bcbc:       f7ee bc2e       b.w     c035a51c <swake_up>
                        c036bcbc: R_ARM_THM_JUMP24      swake_up
--
c036c63a:       f7ff bfd5       b.w     c036c5e8 <synchronize_sched>
                        c036c63a: R_ARM_THM_JUMP24      synchronize_sched
--
c036c640:       f7ff bff0       b.w     c036c624 <cond_synchronize_sched>
                        c036c640: R_ARM_THM_JUMP24      cond_synchronize_sched
--
c036c644:       f7ff bb3e       b.w     c036bcc4 <synchronize_sched_expedited>
                        c036c644: R_ARM_THM_JUMP24
synchronize_sched_expedited
--
c036c690:       f7dc ba0a       b.w     c0348aa8 <resched_cpu>
                        c036c690: R_ARM_THM_JUMP24      resched_cpu
--
c036c6a2:       f7ff baad       b.w     c036bc00
<rcu_report_exp_cpu_mult.constprop.22>
c036c6a6:       bf00            nop
--
c036c716:       f7ff ba73       b.w     c036bc00
<rcu_report_exp_cpu_mult.constprop.22>
c036c71a:       bf00            nop
--
c036cf3a:       f7fd bc4d       b.w     c036a7d8 <rcu_eqs_enter_common>
c036cf3e:       bf00            nop

So some branches are relocated, some have already been resolved at compile time.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-12 12:33   ` Russell King - ARM Linux
@ 2016-08-12 14:17     ` Robin Murphy
  -1 siblings, 0 replies; 46+ messages in thread
From: Robin Murphy @ 2016-08-12 14:17 UTC (permalink / raw)
  To: Russell King - ARM Linux, Nicholas Piggin
  Cc: Nicolas Pitre, linux-arch, Segher Boessenkool, Arnd Bergmann,
	Alan Modra, linux-kbuild, Julian Brown, Michal Marek,
	Stephen Rothwell, Sam Ravnborg, linux-arm-kernel

On 12/08/16 13:33, Russell King - ARM Linux wrote:
> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
>> This patch adds an option which defaults to "y" in cases where we
>> could possibly be running Cortex A8 and using Thumb2 instructions.
>> In reality the workaround might not be required at all for the kernel
>> if virtual instruction memory is linear in physical memory.
> 
> Hmm.
> 
> The main kernel image is guaranteed to be contiguous in physical memory
> for all sorts of reasons, so this really isn't a concern for the kernel
> itself.

I'm not sure being contiguous matters much - looking at the errata doc,
the implication is that the branch is supposed to use bits 31:12 of the
address of the first page, but under the erratum conditions ends up
taking bits 31:12 of the address of the _second_ page instead. There
doesn't seem to be any importance of where those pages actually are
relative to each other.

> Modules, however, are a different matter, as they are mapped in using
> individual pages, and are most likely to be non-contiguous in physical
> memory.  The kernel's module linker knows nothing about this errata,
> so it'll generally just fix up the relocations in the most basic of
> ways.
> 
> So, I think we should always use this --no-fix-cortex-a8 option where
> the linker supports it irrespective of whether we're running on a core
> needing this workaround, but we probably need to fix the kernel module
> linker to know about this.

Given the above, I'm not convinced that sounds safe, but then I can't
claim to have fist-hand experience with this bug either.

Robin.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 14:17     ` Robin Murphy
  0 siblings, 0 replies; 46+ messages in thread
From: Robin Murphy @ 2016-08-12 14:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/08/16 13:33, Russell King - ARM Linux wrote:
> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
>> This patch adds an option which defaults to "y" in cases where we
>> could possibly be running Cortex A8 and using Thumb2 instructions.
>> In reality the workaround might not be required at all for the kernel
>> if virtual instruction memory is linear in physical memory.
> 
> Hmm.
> 
> The main kernel image is guaranteed to be contiguous in physical memory
> for all sorts of reasons, so this really isn't a concern for the kernel
> itself.

I'm not sure being contiguous matters much - looking at the errata doc,
the implication is that the branch is supposed to use bits 31:12 of the
address of the first page, but under the erratum conditions ends up
taking bits 31:12 of the address of the _second_ page instead. There
doesn't seem to be any importance of where those pages actually are
relative to each other.

> Modules, however, are a different matter, as they are mapped in using
> individual pages, and are most likely to be non-contiguous in physical
> memory.  The kernel's module linker knows nothing about this errata,
> so it'll generally just fix up the relocations in the most basic of
> ways.
> 
> So, I think we should always use this --no-fix-cortex-a8 option where
> the linker supports it irrespective of whether we're running on a core
> needing this workaround, but we probably need to fix the kernel module
> linker to know about this.

Given the above, I'm not convinced that sounds safe, but then I can't
claim to have fist-hand experience with this bug either.

Robin.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-12 14:17     ` Robin Murphy
@ 2016-08-12 14:23       ` Russell King - ARM Linux
  -1 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2016-08-12 14:23 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nicholas Piggin, Nicolas Pitre, linux-arch, Segher Boessenkool,
	Arnd Bergmann, Alan Modra, linux-kbuild, Julian Brown,
	Michal Marek, Stephen Rothwell, Sam Ravnborg, linux-arm-kernel

On Fri, Aug 12, 2016 at 03:17:06PM +0100, Robin Murphy wrote:
> On 12/08/16 13:33, Russell King - ARM Linux wrote:
> > On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
> >> This patch adds an option which defaults to "y" in cases where we
> >> could possibly be running Cortex A8 and using Thumb2 instructions.
> >> In reality the workaround might not be required at all for the kernel
> >> if virtual instruction memory is linear in physical memory.
> > 
> > Hmm.
> > 
> > The main kernel image is guaranteed to be contiguous in physical memory
> > for all sorts of reasons, so this really isn't a concern for the kernel
> > itself.
> 
> I'm not sure being contiguous matters much - looking at the errata doc,
> the implication is that the branch is supposed to use bits 31:12 of the
> address of the first page, but under the erratum conditions ends up
> taking bits 31:12 of the address of the _second_ page instead. There
> doesn't seem to be any importance of where those pages actually are
> relative to each other.

I've not actually looked at the errata document - I need to jump through
all sorts of stupid hoops to get it through the ARM website.  Ever since
I requested a change of my email address, it now wants all sorts of
personal information that I'm refusing to type in again.  I've no idea
why ARM Ltd wiped out all that information just because I asked for my
email address to be changed.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 14:23       ` Russell King - ARM Linux
  0 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2016-08-12 14:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 12, 2016 at 03:17:06PM +0100, Robin Murphy wrote:
> On 12/08/16 13:33, Russell King - ARM Linux wrote:
> > On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
> >> This patch adds an option which defaults to "y" in cases where we
> >> could possibly be running Cortex A8 and using Thumb2 instructions.
> >> In reality the workaround might not be required at all for the kernel
> >> if virtual instruction memory is linear in physical memory.
> > 
> > Hmm.
> > 
> > The main kernel image is guaranteed to be contiguous in physical memory
> > for all sorts of reasons, so this really isn't a concern for the kernel
> > itself.
> 
> I'm not sure being contiguous matters much - looking at the errata doc,
> the implication is that the branch is supposed to use bits 31:12 of the
> address of the first page, but under the erratum conditions ends up
> taking bits 31:12 of the address of the _second_ page instead. There
> doesn't seem to be any importance of where those pages actually are
> relative to each other.

I've not actually looked at the errata document - I need to jump through
all sorts of stupid hoops to get it through the ARM website.  Ever since
I requested a change of my email address, it now wants all sorts of
personal information that I'm refusing to type in again.  I've no idea
why ARM Ltd wiped out all that information just because I asked for my
email address to be changed.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-12 14:23       ` Russell King - ARM Linux
  (?)
@ 2016-08-12 14:32         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2016-08-12 14:32 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nicolas Pitre, linux-arch, Segher Boessenkool, Arnd Bergmann,
	Alan Modra, linux-kbuild, Nicholas Piggin, Julian Brown,
	Michal Marek, Stephen Rothwell, Sam Ravnborg, linux-arm-kernel

On Fri, Aug 12, 2016 at 03:23:25PM +0100, Russell King - ARM Linux wrote:
> I've not actually looked at the errata document - I need to jump through
> all sorts of stupid hoops to get it through the ARM website.  Ever since
> I requested a change of my email address, it now wants all sorts of
> personal information that I'm refusing to type in again.  I've no idea
> why ARM Ltd wiped out all that information just because I asked for my
> email address to be changed.

... oh, and it's forgotten that I'm supposed to be able to access
that information too - it wants me to apply for approval to access
errata documents.

Maybe someone can send the appropriate document(s) my way instead.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 14:32         ` Russell King - ARM Linux
  0 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2016-08-12 14:32 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nicolas Pitre, linux-arch, Segher Boessenkool, Arnd Bergmann,
	linux-kbuild, Alan Modra, Nicholas Piggin, Julian Brown,
	Michal Marek, Stephen Rothwell, Sam Ravnborg, linux-arm-kernel

On Fri, Aug 12, 2016 at 03:23:25PM +0100, Russell King - ARM Linux wrote:
> I've not actually looked at the errata document - I need to jump through
> all sorts of stupid hoops to get it through the ARM website.  Ever since
> I requested a change of my email address, it now wants all sorts of
> personal information that I'm refusing to type in again.  I've no idea
> why ARM Ltd wiped out all that information just because I asked for my
> email address to be changed.

... oh, and it's forgotten that I'm supposed to be able to access
that information too - it wants me to apply for approval to access
errata documents.

Maybe someone can send the appropriate document(s) my way instead.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-12 14:32         ` Russell King - ARM Linux
  0 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2016-08-12 14:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 12, 2016 at 03:23:25PM +0100, Russell King - ARM Linux wrote:
> I've not actually looked at the errata document - I need to jump through
> all sorts of stupid hoops to get it through the ARM website.  Ever since
> I requested a change of my email address, it now wants all sorts of
> personal information that I'm refusing to type in again.  I've no idea
> why ARM Ltd wiped out all that information just because I asked for my
> email address to be changed.

... oh, and it's forgotten that I'm supposed to be able to access
that information too - it wants me to apply for approval to access
errata documents.

Maybe someone can send the appropriate document(s) my way instead.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-12  8:19 ` Nicholas Piggin
@ 2016-08-23 12:01   ` Arnd Bergmann
  -1 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2016-08-23 12:01 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arm-kernel, Russell King, linux-kbuild, Nicolas Pitre,
	Julian Brown, Alan Modra, Segher Boessenkool, linux-arch,
	Michal Marek, Sam Ravnborg, Stephen Rothwell

On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:
> Erratum 657417 is worked around by the linker by inserting additional
> branch trampolines to avoid problematic branch target locations. This
> results in much higher linking time and presumably slower and larger
> generated code. The workaround also seems to only be required when
> linking thumb2 code, but the linker applies it for non-thumb2 code as
> well.
> 
> The workaround today is left to the linker to apply, which is overly
> conservative.
> 
> https://sourceware.org/ml/binutils/2009-05/msg00297.html
> 
> This patch adds an option which defaults to "y" in cases where we
> could possibly be running Cortex A8 and using Thumb2 instructions.
> In reality the workaround might not be required at all for the kernel
> if virtual instruction memory is linear in physical memory. However it
> is more conservative to keep the workaround, and it may be the case
> that the TLB lookup would be required in order to catch branches to
> unmapped or no-execute pages.
> 
> In an allyesconfig build, this workaround causes a large load on
> the linker's branch stub hash and slows down the final link by a
> factor of 5.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> 

Thanks a lot for finding this issue. I can confirm that your patch
helps noticeably in all configurations, reducing time for a relink
from 18 minutes to 9 minutes on my machine in the best case, but
the factor 10 slowdown of the final link with your thin archives
and gc-sections patches remains.

I suspect there is still something else going on besides the 657417
slowing things down, but it's also possible that I'm doing something
wrong here.

Aside from that, I notice that for the purpose of speeding up
"allyesconfig", we don't actually need to make this user
configurable, it's sufficient to disable the workaround when
CONFIG_THUMB2_KERNEL is disabled, which is what allyesconfig
and all the defconfig files (but not randconfig) use. I also found
that using THUMB2_KERNEL itself causes a 50% slowdown. I have
patches on my "randconfig" test tree that have the side-effect
of enabling THUMB2_KERNEL for allyesconfig, which is one reason
I have been getting worse results than others.

I could also try to revive an older patch I started, to annotate
the specific CPU core on each ARMv7 platform. I think I have all the
information we need for that, and there are other advantages in
doing it: we could be more selective with all the ARMv7 errata,
and automatically determine whether some optional CPU features
(LPAE, virtualization, integer divide) are available on all
of the selected CPU cores.

	Arnd
---
Full link timing results follow

|| THUMB2, thin archive + gc-sections, before: 18 minutes

09:56:47   LINK    vmlinux
09:56:47  AR      built-in.o
09:56:49  LD      vmlinux.o
10:04:27   MODPOST vmlinux.o
10:04:29  GEN     .version
10:04:29   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
10:04:29   CC      init/version.o
10:04:29   AR      init/built-in.o
10:07:39  KSYM    .tmp_kallsyms1.o
10:11:05  KSYM    .tmp_kallsyms2.o
10:11:16  LD      vmlinux
10:14:30  SORTEX  vmlinux
10:14:30  SYSMAP  System.map

|| THUMB2, thin archive + gc-sections, after: 9 minutes

10:16:01   CHK     include/generated/uapi/linux/version.h
10:16:02   LINK    vmlinux
10:16:02  AR      built-in.o
10:16:03  LD      vmlinux.o
10:23:43   MODPOST vmlinux.o
10:23:46  GEN     .version
10:23:46   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
10:23:46   CC      init/version.o
10:23:47   AR      init/built-in.o
10:24:04  KSYM    .tmp_kallsyms1.o
10:24:32  KSYM    .tmp_kallsyms2.o
10:24:45  LD      vmlinux
10:25:00  SORTEX  vmlinux
10:25:00  SYSMAP  System.map

|| THUMB2, no thin archive + gc-sections, before: 93 seconds

10:44:35   CHK     include/generated/uapi/linux/version.h
10:44:35   LINK    vmlinux
10:44:35  LD      vmlinux.o
10:44:39   MODPOST vmlinux.o
10:44:41  GEN     .version
10:44:41   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
10:44:41   CC      init/version.o
10:44:41   LD      init/built-in.o
10:45:02  KSYM    .tmp_kallsyms1.o
10:45:35  KSYM    .tmp_kallsyms2.o
10:45:47  LD      vmlinux
10:46:06  SORTEX  vmlinux
10:46:06  SYSMAP  System.map
10:46:08   OBJCOPY arch/arm/boot/Image

|| THUMB2, no thin archive + gc-sections, after: 52 seconds

10:41:46   LINK    vmlinux
10:41:46  LD      vmlinux.o
10:41:49   MODPOST vmlinux.o
10:41:52  GEN     .version
10:41:52   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
10:41:52   CC      init/version.o
10:41:52   LD      init/built-in.o
10:41:58  KSYM    .tmp_kallsyms1.o
10:42:17  KSYM    .tmp_kallsyms2.o
10:42:31  LD      vmlinux
10:42:36  SORTEX  vmlinux
10:42:36  SYSMAP  System.map
10:42:38   OBJCOPY arch/arm/boot/Image

|| THUMB2_KERNEL disabled, no thin archives + gc-sections, before: 59 seconds

11:25:05   LINK    vmlinux
11:25:05  LD      vmlinux.o
11:25:07   MODPOST vmlinux.o
11:25:10  GEN     .version
11:25:10   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
11:25:10   CC      init/version.o
11:25:10   LD      init/built-in.o
11:25:19  KSYM    .tmp_kallsyms1.o
11:25:41  KSYM    .tmp_kallsyms2.o
11:25:53  LD      vmlinux
11:26:03  SORTEX  vmlinux
11:26:03  SYSMAP  System.map
  Building modules, stage 2.

|| THUMB2_KERNEL disabled, no thin archives + gc-sections, after: 46 seconds

11:27:36   LINK    vmlinux
11:27:36  LD      vmlinux.o
11:27:39   MODPOST vmlinux.o
11:27:41  GEN     .version
11:27:41   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
11:27:41   CC      init/version.o
11:27:41   LD      init/built-in.o
11:27:46  KSYM    .tmp_kallsyms1.o
11:28:04  KSYM    .tmp_kallsyms2.o
11:28:15  LD      vmlinux
11:28:20  SORTEX  vmlinux
11:28:20  SYSMAP  System.map
11:28:22   OBJCOPY arch/arm/boot/Image

|| THUMB2_KERNEL disabled, thin archives+gc-sections, before: 12 minutes

13:18:39   LINK    vmlinux
13:18:39  AR      built-in.o
13:18:40  LD      vmlinux.o
13:24:44   MODPOST vmlinux.o
13:24:46  GEN     .version
13:24:46   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
13:24:46   CC      init/version.o
13:24:46   AR      init/built-in.o
13:26:34  KSYM    .tmp_kallsyms1.o
13:28:32  KSYM    .tmp_kallsyms2.o
13:28:43  LD      vmlinux
13:30:31  SORTEX  vmlinux
13:30:31  SYSMAP  System.map
13:30:33   OBJCOPY arch/arm/boot/Image

|| THUMB2_KERNEL disabled, thin archives+gc-sections, after: 7 minutes

12:43:15   LINK    vmlinux
12:43:15  AR      built-in.o
12:43:16  LD      vmlinux.o
12:49:19   MODPOST vmlinux.o
12:49:21  GEN     .version
12:49:21   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
12:49:22   CC      init/version.o
12:49:22   AR      init/built-in.o
12:49:33  KSYM    .tmp_kallsyms1.o
12:49:56  KSYM    .tmp_kallsyms2.o
12:50:07  LD      vmlinux
12:50:19  SORTEX  vmlinux
12:50:19  SYSMAP  System.map
12:50:21   OBJCOPY arch/arm/boot/Image
  Building modules, stage 2.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-23 12:01   ` Arnd Bergmann
  0 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2016-08-23 12:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:
> Erratum 657417 is worked around by the linker by inserting additional
> branch trampolines to avoid problematic branch target locations. This
> results in much higher linking time and presumably slower and larger
> generated code. The workaround also seems to only be required when
> linking thumb2 code, but the linker applies it for non-thumb2 code as
> well.
> 
> The workaround today is left to the linker to apply, which is overly
> conservative.
> 
> https://sourceware.org/ml/binutils/2009-05/msg00297.html
> 
> This patch adds an option which defaults to "y" in cases where we
> could possibly be running Cortex A8 and using Thumb2 instructions.
> In reality the workaround might not be required at all for the kernel
> if virtual instruction memory is linear in physical memory. However it
> is more conservative to keep the workaround, and it may be the case
> that the TLB lookup would be required in order to catch branches to
> unmapped or no-execute pages.
> 
> In an allyesconfig build, this workaround causes a large load on
> the linker's branch stub hash and slows down the final link by a
> factor of 5.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> 

Thanks a lot for finding this issue. I can confirm that your patch
helps noticeably in all configurations, reducing time for a relink
from 18 minutes to 9 minutes on my machine in the best case, but
the factor 10 slowdown of the final link with your thin archives
and gc-sections patches remains.

I suspect there is still something else going on besides the 657417
slowing things down, but it's also possible that I'm doing something
wrong here.

Aside from that, I notice that for the purpose of speeding up
"allyesconfig", we don't actually need to make this user
configurable, it's sufficient to disable the workaround when
CONFIG_THUMB2_KERNEL is disabled, which is what allyesconfig
and all the defconfig files (but not randconfig) use. I also found
that using THUMB2_KERNEL itself causes a 50% slowdown. I have
patches on my "randconfig" test tree that have the side-effect
of enabling THUMB2_KERNEL for allyesconfig, which is one reason
I have been getting worse results than others.

I could also try to revive an older patch I started, to annotate
the specific CPU core on each ARMv7 platform. I think I have all the
information we need for that, and there are other advantages in
doing it: we could be more selective with all the ARMv7 errata,
and automatically determine whether some optional CPU features
(LPAE, virtualization, integer divide) are available on all
of the selected CPU cores.

	Arnd
---
Full link timing results follow

|| THUMB2, thin archive + gc-sections, before: 18 minutes

09:56:47   LINK    vmlinux
09:56:47  AR      built-in.o
09:56:49  LD      vmlinux.o
10:04:27   MODPOST vmlinux.o
10:04:29  GEN     .version
10:04:29   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
10:04:29   CC      init/version.o
10:04:29   AR      init/built-in.o
10:07:39  KSYM    .tmp_kallsyms1.o
10:11:05  KSYM    .tmp_kallsyms2.o
10:11:16  LD      vmlinux
10:14:30  SORTEX  vmlinux
10:14:30  SYSMAP  System.map

|| THUMB2, thin archive + gc-sections, after: 9 minutes

10:16:01   CHK     include/generated/uapi/linux/version.h
10:16:02   LINK    vmlinux
10:16:02  AR      built-in.o
10:16:03  LD      vmlinux.o
10:23:43   MODPOST vmlinux.o
10:23:46  GEN     .version
10:23:46   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
10:23:46   CC      init/version.o
10:23:47   AR      init/built-in.o
10:24:04  KSYM    .tmp_kallsyms1.o
10:24:32  KSYM    .tmp_kallsyms2.o
10:24:45  LD      vmlinux
10:25:00  SORTEX  vmlinux
10:25:00  SYSMAP  System.map

|| THUMB2, no thin archive + gc-sections, before: 93 seconds

10:44:35   CHK     include/generated/uapi/linux/version.h
10:44:35   LINK    vmlinux
10:44:35  LD      vmlinux.o
10:44:39   MODPOST vmlinux.o
10:44:41  GEN     .version
10:44:41   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
10:44:41   CC      init/version.o
10:44:41   LD      init/built-in.o
10:45:02  KSYM    .tmp_kallsyms1.o
10:45:35  KSYM    .tmp_kallsyms2.o
10:45:47  LD      vmlinux
10:46:06  SORTEX  vmlinux
10:46:06  SYSMAP  System.map
10:46:08   OBJCOPY arch/arm/boot/Image

|| THUMB2, no thin archive + gc-sections, after: 52 seconds

10:41:46   LINK    vmlinux
10:41:46  LD      vmlinux.o
10:41:49   MODPOST vmlinux.o
10:41:52  GEN     .version
10:41:52   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
10:41:52   CC      init/version.o
10:41:52   LD      init/built-in.o
10:41:58  KSYM    .tmp_kallsyms1.o
10:42:17  KSYM    .tmp_kallsyms2.o
10:42:31  LD      vmlinux
10:42:36  SORTEX  vmlinux
10:42:36  SYSMAP  System.map
10:42:38   OBJCOPY arch/arm/boot/Image

|| THUMB2_KERNEL disabled, no thin archives + gc-sections, before: 59 seconds

11:25:05   LINK    vmlinux
11:25:05  LD      vmlinux.o
11:25:07   MODPOST vmlinux.o
11:25:10  GEN     .version
11:25:10   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
11:25:10   CC      init/version.o
11:25:10   LD      init/built-in.o
11:25:19  KSYM    .tmp_kallsyms1.o
11:25:41  KSYM    .tmp_kallsyms2.o
11:25:53  LD      vmlinux
11:26:03  SORTEX  vmlinux
11:26:03  SYSMAP  System.map
  Building modules, stage 2.

|| THUMB2_KERNEL disabled, no thin archives + gc-sections, after: 46 seconds

11:27:36   LINK    vmlinux
11:27:36  LD      vmlinux.o
11:27:39   MODPOST vmlinux.o
11:27:41  GEN     .version
11:27:41   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
11:27:41   CC      init/version.o
11:27:41   LD      init/built-in.o
11:27:46  KSYM    .tmp_kallsyms1.o
11:28:04  KSYM    .tmp_kallsyms2.o
11:28:15  LD      vmlinux
11:28:20  SORTEX  vmlinux
11:28:20  SYSMAP  System.map
11:28:22   OBJCOPY arch/arm/boot/Image

|| THUMB2_KERNEL disabled, thin archives+gc-sections, before: 12 minutes

13:18:39   LINK    vmlinux
13:18:39  AR      built-in.o
13:18:40  LD      vmlinux.o
13:24:44   MODPOST vmlinux.o
13:24:46  GEN     .version
13:24:46   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
13:24:46   CC      init/version.o
13:24:46   AR      init/built-in.o
13:26:34  KSYM    .tmp_kallsyms1.o
13:28:32  KSYM    .tmp_kallsyms2.o
13:28:43  LD      vmlinux
13:30:31  SORTEX  vmlinux
13:30:31  SYSMAP  System.map
13:30:33   OBJCOPY arch/arm/boot/Image

|| THUMB2_KERNEL disabled, thin archives+gc-sections, after: 7 minutes

12:43:15   LINK    vmlinux
12:43:15  AR      built-in.o
12:43:16  LD      vmlinux.o
12:49:19   MODPOST vmlinux.o
12:49:21  GEN     .version
12:49:21   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
12:49:22   CC      init/version.o
12:49:22   AR      init/built-in.o
12:49:33  KSYM    .tmp_kallsyms1.o
12:49:56  KSYM    .tmp_kallsyms2.o
12:50:07  LD      vmlinux
12:50:19  SORTEX  vmlinux
12:50:19  SYSMAP  System.map
12:50:21   OBJCOPY arch/arm/boot/Image
  Building modules, stage 2.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-23 12:01   ` Arnd Bergmann
@ 2016-08-24  4:00     ` Nicholas Piggin
  -1 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-24  4:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Russell King, linux-kbuild, Nicolas Pitre,
	Julian Brown, Alan Modra, Segher Boessenkool, linux-arch,
	Michal Marek, Sam Ravnborg, Stephen Rothwell

On Tue, 23 Aug 2016 14:01:29 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:
> > Erratum 657417 is worked around by the linker by inserting additional
> > branch trampolines to avoid problematic branch target locations. This
> > results in much higher linking time and presumably slower and larger
> > generated code. The workaround also seems to only be required when
> > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > well.
> > 
> > The workaround today is left to the linker to apply, which is overly
> > conservative.
> > 
> > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > 
> > This patch adds an option which defaults to "y" in cases where we
> > could possibly be running Cortex A8 and using Thumb2 instructions.
> > In reality the workaround might not be required at all for the kernel
> > if virtual instruction memory is linear in physical memory. However it
> > is more conservative to keep the workaround, and it may be the case
> > that the TLB lookup would be required in order to catch branches to
> > unmapped or no-execute pages.
> > 
> > In an allyesconfig build, this workaround causes a large load on
> > the linker's branch stub hash and slows down the final link by a
> > factor of 5.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >   
> 
> Thanks a lot for finding this issue. I can confirm that your patch
> helps noticeably in all configurations, reducing time for a relink
> from 18 minutes to 9 minutes on my machine in the best case, but
> the factor 10 slowdown of the final link with your thin archives
> and gc-sections patches remains.
> 
> I suspect there is still something else going on besides the 657417
> slowing things down, but it's also possible that I'm doing something
> wrong here.

Okay, I was only testing thin archives. gc-sections I didn't look at.
With thin archives, one final arm allyesconfig link with this patch is
not showing a regression. gc-sections must be causing something else
ARM specific, because powerpc seems to link fast with gc-sections.

Can you send your latest ARM patch to enable this and I'll have a look
at it?


> Aside from that, I notice that for the purpose of speeding up
> "allyesconfig", we don't actually need to make this user
> configurable, it's sufficient to disable the workaround when
> CONFIG_THUMB2_KERNEL is disabled, which is what allyesconfig
> and all the defconfig files (but not randconfig) use. I also found
> that using THUMB2_KERNEL itself causes a 50% slowdown. I have
> patches on my "randconfig" test tree that have the side-effect
> of enabling THUMB2_KERNEL for allyesconfig, which is one reason
> I have been getting worse results than others.
> 
> I could also try to revive an older patch I started, to annotate
> the specific CPU core on each ARMv7 platform. I think I have all the
> information we need for that, and there are other advantages in
> doing it: we could be more selective with all the ARMv7 errata,
> and automatically determine whether some optional CPU features
> (LPAE, virtualization, integer divide) are available on all
> of the selected CPU cores.

Yeah I was just trying to follow existing pattern and be conservative
with the workaround. From my brief look, it does seem like there is
room to optimize build options by having more fine grained selection
of target CPU/platform.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-24  4:00     ` Nicholas Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-24  4:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Aug 2016 14:01:29 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:
> > Erratum 657417 is worked around by the linker by inserting additional
> > branch trampolines to avoid problematic branch target locations. This
> > results in much higher linking time and presumably slower and larger
> > generated code. The workaround also seems to only be required when
> > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > well.
> > 
> > The workaround today is left to the linker to apply, which is overly
> > conservative.
> > 
> > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > 
> > This patch adds an option which defaults to "y" in cases where we
> > could possibly be running Cortex A8 and using Thumb2 instructions.
> > In reality the workaround might not be required at all for the kernel
> > if virtual instruction memory is linear in physical memory. However it
> > is more conservative to keep the workaround, and it may be the case
> > that the TLB lookup would be required in order to catch branches to
> > unmapped or no-execute pages.
> > 
> > In an allyesconfig build, this workaround causes a large load on
> > the linker's branch stub hash and slows down the final link by a
> > factor of 5.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >   
> 
> Thanks a lot for finding this issue. I can confirm that your patch
> helps noticeably in all configurations, reducing time for a relink
> from 18 minutes to 9 minutes on my machine in the best case, but
> the factor 10 slowdown of the final link with your thin archives
> and gc-sections patches remains.
> 
> I suspect there is still something else going on besides the 657417
> slowing things down, but it's also possible that I'm doing something
> wrong here.

Okay, I was only testing thin archives. gc-sections I didn't look at.
With thin archives, one final arm allyesconfig link with this patch is
not showing a regression. gc-sections must be causing something else
ARM specific, because powerpc seems to link fast with gc-sections.

Can you send your latest ARM patch to enable this and I'll have a look
at it?


> Aside from that, I notice that for the purpose of speeding up
> "allyesconfig", we don't actually need to make this user
> configurable, it's sufficient to disable the workaround when
> CONFIG_THUMB2_KERNEL is disabled, which is what allyesconfig
> and all the defconfig files (but not randconfig) use. I also found
> that using THUMB2_KERNEL itself causes a 50% slowdown. I have
> patches on my "randconfig" test tree that have the side-effect
> of enabling THUMB2_KERNEL for allyesconfig, which is one reason
> I have been getting worse results than others.
> 
> I could also try to revive an older patch I started, to annotate
> the specific CPU core on each ARMv7 platform. I think I have all the
> information we need for that, and there are other advantages in
> doing it: we could be more selective with all the ARMv7 errata,
> and automatically determine whether some optional CPU features
> (LPAE, virtualization, integer divide) are available on all
> of the selected CPU cores.

Yeah I was just trying to follow existing pattern and be conservative
with the workaround. From my brief look, it does seem like there is
room to optimize build options by having more fine grained selection
of target CPU/platform.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-24  4:00     ` Nicholas Piggin
  (?)
@ 2016-08-24  7:41       ` Arnd Bergmann
  -1 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2016-08-24  7:41 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arm-kernel, Russell King, linux-kbuild, Nicolas Pitre,
	Julian Brown, Alan Modra, Segher Boessenkool, linux-arch,
	Michal Marek, Sam Ravnborg, Stephen Rothwell

On Wednesday, August 24, 2016 2:00:44 PM CEST Nicholas Piggin wrote:
> On Tue, 23 Aug 2016 14:01:29 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:
> > > Erratum 657417 is worked around by the linker by inserting additional
> > > branch trampolines to avoid problematic branch target locations. This
> > > results in much higher linking time and presumably slower and larger
> > > generated code. The workaround also seems to only be required when
> > > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > > well.
> > > 
> > > The workaround today is left to the linker to apply, which is overly
> > > conservative.
> > > 
> > > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > > 
> > > This patch adds an option which defaults to "y" in cases where we
> > > could possibly be running Cortex A8 and using Thumb2 instructions.
> > > In reality the workaround might not be required at all for the kernel
> > > if virtual instruction memory is linear in physical memory. However it
> > > is more conservative to keep the workaround, and it may be the case
> > > that the TLB lookup would be required in order to catch branches to
> > > unmapped or no-execute pages.
> > > 
> > > In an allyesconfig build, this workaround causes a large load on
> > > the linker's branch stub hash and slows down the final link by a
> > > factor of 5.
> > > 
> > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > >   
> > 
> > Thanks a lot for finding this issue. I can confirm that your patch
> > helps noticeably in all configurations, reducing time for a relink
> > from 18 minutes to 9 minutes on my machine in the best case, but
> > the factor 10 slowdown of the final link with your thin archives
> > and gc-sections patches remains.
> > 
> > I suspect there is still something else going on besides the 657417
> > slowing things down, but it's also possible that I'm doing something
> > wrong here.
> 
> Okay, I was only testing thin archives. gc-sections I didn't look at.
> With thin archives, one final arm allyesconfig link with this patch is
> not showing a regression. gc-sections must be causing something else
> ARM specific, because powerpc seems to link fast with gc-sections.

Ok, I see. For completeness, here are my results with thin archives and
without gc-sections on ARM:

|| no THUMB2, thin archives, no gc-sections, before: 144 seconds

09:29:51   LINK    vmlinux
09:29:51  AR      built-in.o
09:29:52  LD      vmlinux.o
09:30:12   MODPOST vmlinux.o
09:30:14  GEN     .version
09:30:14   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
09:30:14   CC      init/version.o
09:30:15   AR      init/built-in.o
09:30:43  KSYM    .tmp_kallsyms1.o
09:31:28  KSYM    .tmp_kallsyms2.o
09:31:40  LD      vmlinux
09:32:13  SORTEX  vmlinux
09:32:13  SYSMAP  System.map
09:32:15   OBJCOPY arch/arm/boot/Image

|| no THUMB2, thin archives, no gc-sections, after: 70 seconds

09:33:54   LINK    vmlinux
09:33:54  AR      built-in.o
09:33:55  LD      vmlinux.o
09:34:13   MODPOST vmlinux.o
09:34:15  GEN     .version
09:34:16   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
09:34:16   CC      init/version.o
09:34:16   AR      init/built-in.o
09:34:24  KSYM    .tmp_kallsyms1.o
09:34:43  KSYM    .tmp_kallsyms2.o
09:34:55  LD      vmlinux
09:35:03  SORTEX  vmlinux
09:35:03  SYSMAP  System.map
09:35:04   OBJCOPY arch/arm/boot/Image

The final 'LD' step is much faster here as you also found, and now
the time for the complete link is mainly the initial 'LD vmlinux.o'
step, which did not get faster with your patch.

> Can you send your latest ARM patch to enable this and I'll have a look
> at it?

See below. I have not updated the patch description yet, but included
the changes that Nico suggested. The test above used the same patch
but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.

	Arnd

commit 0cfcd63ca78c8f754e57bc8b0c9f6a18dfd2caa4
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Sat Aug 6 22:46:26 2016 +0200

    [EXPERIMENTAL] enable thin archives and --gc-sections on ARM
    
    This goes on top of Nick's latest version of "[PATCH 0/6 v2] kbuild changes,
    thin archives, --gc-sections" and enables both features on ARM.
    
    It's a bit half-baked, these are known problems:
    
    - as big-endian support is still broken, I disable it in Kconfig
      so an allyesconfig build ends up as little-endian
    
    - I've thrown in a change to include/asm-generic/vmlinux.lds.h
      but don't know whether this is the right way or not. We have
      to keep .text.fixup linked together with .text, but I separate
      out .text.unlikely and .text.hot again. This has not caused
      any link failures for me (yet).
    
    - I mark a ton of sections as KEEP() in vmlinux.lds.S. Some of
      them might not actually be needed, and I have not spent much
      time checking what they actually are. However, I did build
      a few hundred randconfigs without new issues.
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b62ae32f8a1e..9bf37a6e7384 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -83,6 +83,7 @@ config ARM
 	select HAVE_UID16
 	select HAVE_VIRT_CPU_ACCOUNTING_GEN
 	select IRQ_FORCED_THREADING
+	select LD_DEAD_CODE_DATA_ELIMINATION
 	select MODULES_USE_ELF_REL
 	select NO_BOOTMEM
 	select OF_EARLY_FLATTREE if OF
@@ -92,6 +93,7 @@ config ARM
 	select PERF_USE_VMALLOC
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
+	select THIN_ARCHIVES
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
 	help
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index ad325a8c7e1e..b7f2a41fd940 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -13,6 +13,9 @@ endif
 
 CFLAGS_REMOVE_return_address.o = -pg
 
+ccflags-y              += -fno-function-sections -fno-data-sections
+subdir-ccflags-y       += -fno-function-sections -fno-data-sections
+
 # Object file lists.
 
 obj-y		:= elf.o entry-common.o irq.o opcodes.o \
diff --git a/arch/arm/kernel/vmlinux-xip.lds.S b/arch/arm/kernel/vmlinux-xip.lds.S
index 56c8bdf776bd..4b515ae498e2 100644
--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -12,17 +12,17 @@
 #define PROC_INFO							\
 	. = ALIGN(4);							\
 	VMLINUX_SYMBOL(__proc_info_begin) = .;				\
-	*(.proc.info.init)						\
+	KEEP(*(.proc.info.init))					\
 	VMLINUX_SYMBOL(__proc_info_end) = .;
 
 #define IDMAP_TEXT							\
 	ALIGN_FUNCTION();						\
 	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
-	*(.idmap.text)							\
+	KEEP(*(.idmap.text))						\
 	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
 	. = ALIGN(PAGE_SIZE);						\
 	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
-	*(.hyp.idmap.text)						\
+	KEEP(*(.hyp.idmap.text))					\
 	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -114,7 +114,7 @@ SECTIONS
 	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
 		__start___ex_table = .;
 #ifdef CONFIG_MMU
-		*(__ex_table)
+		KEEP(*(__ex_table))
 #endif
 		__stop___ex_table = .;
 	}
@@ -126,12 +126,12 @@ SECTIONS
 	. = ALIGN(8);
 	.ARM.unwind_idx : {
 		__start_unwind_idx = .;
-		*(.ARM.exidx*)
+		KEEP(*(.ARM.exidx*))
 		__stop_unwind_idx = .;
 	}
 	.ARM.unwind_tab : {
 		__start_unwind_tab = .;
-		*(.ARM.extab*)
+		KEEP(*(.ARM.extab*))
 		__stop_unwind_tab = .;
 	}
 #endif
@@ -146,7 +146,7 @@ SECTIONS
 	 */
 	__vectors_start = .;
 	.vectors 0xffff0000 : AT(__vectors_start) {
-		*(.vectors)
+		KEEP(*(.vectors))
 	}
 	. = __vectors_start + SIZEOF(.vectors);
 	__vectors_end = .;
@@ -169,24 +169,24 @@ SECTIONS
 	}
 	.init.arch.info : {
 		__arch_info_begin = .;
-		*(.arch.info.init)
+		KEEP(*(.arch.info.init))
 		__arch_info_end = .;
 	}
 	.init.tagtable : {
 		__tagtable_begin = .;
-		*(.taglist.init)
+		KEEP(*(.taglist.init))
 		__tagtable_end = .;
 	}
 #ifdef CONFIG_SMP_ON_UP
 	.init.smpalt : {
 		__smpalt_begin = .;
-		*(.alt.smp.init)
+		KEEP(*(.alt.smp.init))
 		__smpalt_end = .;
 	}
 #endif
 	.init.pv_table : {
 		__pv_table_begin = .;
-		*(.pv_table)
+		KEEP(*(.pv_table))
 		__pv_table_end = .;
 	}
 	.init.data : {
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 7396a5f00c5f..abb59e4c12db 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -17,7 +17,7 @@
 #define PROC_INFO							\
 	. = ALIGN(4);							\
 	VMLINUX_SYMBOL(__proc_info_begin) = .;				\
-	*(.proc.info.init)						\
+	KEEP(*(.proc.info.init))					\
 	VMLINUX_SYMBOL(__proc_info_end) = .;
 
 #define HYPERVISOR_TEXT							\
@@ -169,7 +169,7 @@ SECTIONS
 	 */
 	__vectors_start = .;
 	.vectors 0xffff0000 : AT(__vectors_start) {
-		*(.vectors)
+		KEEP(*(.vectors))
 	}
 	. = __vectors_start + SIZEOF(.vectors);
 	__vectors_end = .;
@@ -192,24 +192,24 @@ SECTIONS
 	}
 	.init.arch.info : {
 		__arch_info_begin = .;
-		*(.arch.info.init)
+		KEEP(*(.arch.info.init))
 		__arch_info_end = .;
 	}
 	.init.tagtable : {
 		__tagtable_begin = .;
-		*(.taglist.init)
+		KEEP(*(.taglist.init))
 		__tagtable_end = .;
 	}
 #ifdef CONFIG_SMP_ON_UP
 	.init.smpalt : {
 		__smpalt_begin = .;
-		*(.alt.smp.init)
+		KEEP(*(.alt.smp.init))
 		__smpalt_end = .;
 	}
 #endif
 	.init.pv_table : {
 		__pv_table_begin = .;
-		*(.pv_table)
+		KEEP(*(.pv_table))
 		__pv_table_end = .;
 	}
 	.init.data : {
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 6a09cc204b07..7117b8e99de8 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -717,6 +717,7 @@ config SWP_EMULATE
 config CPU_BIG_ENDIAN
 	bool "Build big-endian kernel"
 	depends on ARCH_SUPPORTS_BIG_ENDIAN
+	depends on !THIN_ARCHIVES
 	help
 	  Say Y if you plan on running a kernel in big-endian mode.
 	  Note that your board must be properly built and your board
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 9136c3afd3c6..e01f0b00a678 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -433,7 +433,9 @@
  * during second ld run in second ld pass when generating System.map */
 #define TEXT_TEXT							\
 		ALIGN_FUNCTION();					\
-		*(.text.hot .text .text.fixup .text.unlikely .text.*)	\
+		*(.text.hot .text.hot.*)				\
+		*(.text.unlikely .text.unlikely.*)			\
+		*(.text .text.*)					\
 		*(.ref.text)						\
 	MEM_KEEP(init.text)						\
 	MEM_KEEP(exit.text)						\


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
@ 2016-08-24  7:41       ` Arnd Bergmann
  0 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2016-08-24  7:41 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Nicolas Pitre, linux-arch, Segher Boessenkool, Alan Modra,
	linux-kbuild, Russell King, Julian Brown, Michal Marek,
	Stephen Rothwell, Sam Ravnborg, linux-arm-kernel

On Wednesday, August 24, 2016 2:00:44 PM CEST Nicholas Piggin wrote:
> On Tue, 23 Aug 2016 14:01:29 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:
> > > Erratum 657417 is worked around by the linker by inserting additional
> > > branch trampolines to avoid problematic branch target locations. This
> > > results in much higher linking time and presumably slower and larger
> > > generated code. The workaround also seems to only be required when
> > > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > > well.
> > > 
> > > The workaround today is left to the linker to apply, which is overly
> > > conservative.
> > > 
> > > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > > 
> > > This patch adds an option which defaults to "y" in cases where we
> > > could possibly be running Cortex A8 and using Thumb2 instructions.
> > > In reality the workaround might not be required at all for the kernel
> > > if virtual instruction memory is linear in physical memory. However it
> > > is more conservative to keep the workaround, and it may be the case
> > > that the TLB lookup would be required in order to catch branches to
> > > unmapped or no-execute pages.
> > > 
> > > In an allyesconfig build, this workaround causes a large load on
> > > the linker's branch stub hash and slows down the final link by a
> > > factor of 5.
> > > 
> > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > >   
> > 
> > Thanks a lot for finding this issue. I can confirm that your patch
> > helps noticeably in all configurations, reducing time for a relink
> > from 18 minutes to 9 minutes on my machine in the best case, but
> > the factor 10 slowdown of the final link with your thin archives
> > and gc-sections patches remains.
> > 
> > I suspect there is still something else going on besides the 657417
> > slowing things down, but it's also possible that I'm doing something
> > wrong here.
> 
> Okay, I was only testing thin archives. gc-sections I didn't look at.
> With thin archives, one final arm allyesconfig link with this patch is
> not showing a regression. gc-sections must be causing something else
> ARM specific, because powerpc seems to link fast with gc-sections.

Ok, I see. For completeness, here are my results with thin archives and
without gc-sections on ARM:

|| no THUMB2, thin archives, no gc-sections, before: 144 seconds

09:29:51   LINK    vmlinux
09:29:51  AR      built-in.o
09:29:52  LD      vmlinux.o
09:30:12   MODPOST vmlinux.o
09:30:14  GEN     .version
09:30:14   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
09:30:14   CC      init/version.o
09:30:15   AR      init/built-in.o
09:30:43  KSYM    .tmp_kallsyms1.o
09:31:28  KSYM    .tmp_kallsyms2.o
09:31:40  LD      vmlinux
09:32:13  SORTEX  vmlinux
09:32:13  SYSMAP  System.map
09:32:15   OBJCOPY arch/arm/boot/Image

|| no THUMB2, thin archives, no gc-sections, after: 70 seconds

09:33:54   LINK    vmlinux
09:33:54  AR      built-in.o
09:33:55  LD      vmlinux.o
09:34:13   MODPOST vmlinux.o
09:34:15  GEN     .version
09:34:16   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
09:34:16   CC      init/version.o
09:34:16   AR      init/built-in.o
09:34:24  KSYM    .tmp_kallsyms1.o
09:34:43  KSYM    .tmp_kallsyms2.o
09:34:55  LD      vmlinux
09:35:03  SORTEX  vmlinux
09:35:03  SYSMAP  System.map
09:35:04   OBJCOPY arch/arm/boot/Image

The final 'LD' step is much faster here as you also found, and now
the time for the complete link is mainly the initial 'LD vmlinux.o'
step, which did not get faster with your patch.

> Can you send your latest ARM patch to enable this and I'll have a look
> at it?

See below. I have not updated the patch description yet, but included
the changes that Nico suggested. The test above used the same patch
but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.

	Arnd

commit 0cfcd63ca78c8f754e57bc8b0c9f6a18dfd2caa4
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Sat Aug 6 22:46:26 2016 +0200

    [EXPERIMENTAL] enable thin archives and --gc-sections on ARM
    
    This goes on top of Nick's latest version of "[PATCH 0/6 v2] kbuild changes,
    thin archives, --gc-sections" and enables both features on ARM.
    
    It's a bit half-baked, these are known problems:
    
    - as big-endian support is still broken, I disable it in Kconfig
      so an allyesconfig build ends up as little-endian
    
    - I've thrown in a change to include/asm-generic/vmlinux.lds.h
      but don't know whether this is the right way or not. We have
      to keep .text.fixup linked together with .text, but I separate
      out .text.unlikely and .text.hot again. This has not caused
      any link failures for me (yet).
    
    - I mark a ton of sections as KEEP() in vmlinux.lds.S. Some of
      them might not actually be needed, and I have not spent much
      time checking what they actually are. However, I did build
      a few hundred randconfigs without new issues.
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b62ae32f8a1e..9bf37a6e7384 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -83,6 +83,7 @@ config ARM
 	select HAVE_UID16
 	select HAVE_VIRT_CPU_ACCOUNTING_GEN
 	select IRQ_FORCED_THREADING
+	select LD_DEAD_CODE_DATA_ELIMINATION
 	select MODULES_USE_ELF_REL
 	select NO_BOOTMEM
 	select OF_EARLY_FLATTREE if OF
@@ -92,6 +93,7 @@ config ARM
 	select PERF_USE_VMALLOC
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
+	select THIN_ARCHIVES
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
 	help
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index ad325a8c7e1e..b7f2a41fd940 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -13,6 +13,9 @@ endif
 
 CFLAGS_REMOVE_return_address.o = -pg
 
+ccflags-y              += -fno-function-sections -fno-data-sections
+subdir-ccflags-y       += -fno-function-sections -fno-data-sections
+
 # Object file lists.
 
 obj-y		:= elf.o entry-common.o irq.o opcodes.o \
diff --git a/arch/arm/kernel/vmlinux-xip.lds.S b/arch/arm/kernel/vmlinux-xip.lds.S
index 56c8bdf776bd..4b515ae498e2 100644
--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -12,17 +12,17 @@
 #define PROC_INFO							\
 	. = ALIGN(4);							\
 	VMLINUX_SYMBOL(__proc_info_begin) = .;				\
-	*(.proc.info.init)						\
+	KEEP(*(.proc.info.init))					\
 	VMLINUX_SYMBOL(__proc_info_end) = .;
 
 #define IDMAP_TEXT							\
 	ALIGN_FUNCTION();						\
 	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
-	*(.idmap.text)							\
+	KEEP(*(.idmap.text))						\
 	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
 	. = ALIGN(PAGE_SIZE);						\
 	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
-	*(.hyp.idmap.text)						\
+	KEEP(*(.hyp.idmap.text))					\
 	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -114,7 +114,7 @@ SECTIONS
 	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
 		__start___ex_table = .;
 #ifdef CONFIG_MMU
-		*(__ex_table)
+		KEEP(*(__ex_table))
 #endif
 		__stop___ex_table = .;
 	}
@@ -126,12 +126,12 @@ SECTIONS
 	. = ALIGN(8);
 	.ARM.unwind_idx : {
 		__start_unwind_idx = .;
-		*(.ARM.exidx*)
+		KEEP(*(.ARM.exidx*))
 		__stop_unwind_idx = .;
 	}
 	.ARM.unwind_tab : {
 		__start_unwind_tab = .;
-		*(.ARM.extab*)
+		KEEP(*(.ARM.extab*))
 		__stop_unwind_tab = .;
 	}
 #endif
@@ -146,7 +146,7 @@ SECTIONS
 	 */
 	__vectors_start = .;
 	.vectors 0xffff0000 : AT(__vectors_start) {
-		*(.vectors)
+		KEEP(*(.vectors))
 	}
 	. = __vectors_start + SIZEOF(.vectors);
 	__vectors_end = .;
@@ -169,24 +169,24 @@ SECTIONS
 	}
 	.init.arch.info : {
 		__arch_info_begin = .;
-		*(.arch.info.init)
+		KEEP(*(.arch.info.init))
 		__arch_info_end = .;
 	}
 	.init.tagtable : {
 		__tagtable_begin = .;
-		*(.taglist.init)
+		KEEP(*(.taglist.init))
 		__tagtable_end = .;
 	}
 #ifdef CONFIG_SMP_ON_UP
 	.init.smpalt : {
 		__smpalt_begin = .;
-		*(.alt.smp.init)
+		KEEP(*(.alt.smp.init))
 		__smpalt_end = .;
 	}
 #endif
 	.init.pv_table : {
 		__pv_table_begin = .;
-		*(.pv_table)
+		KEEP(*(.pv_table))
 		__pv_table_end = .;
 	}
 	.init.data : {
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 7396a5f00c5f..abb59e4c12db 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -17,7 +17,7 @@
 #define PROC_INFO							\
 	. = ALIGN(4);							\
 	VMLINUX_SYMBOL(__proc_info_begin) = .;				\
-	*(.proc.info.init)						\
+	KEEP(*(.proc.info.init))					\
 	VMLINUX_SYMBOL(__proc_info_end) = .;
 
 #define HYPERVISOR_TEXT							\
@@ -169,7 +169,7 @@ SECTIONS
 	 */
 	__vectors_start = .;
 	.vectors 0xffff0000 : AT(__vectors_start) {
-		*(.vectors)
+		KEEP(*(.vectors))
 	}
 	. = __vectors_start + SIZEOF(.vectors);
 	__vectors_end = .;
@@ -192,24 +192,24 @@ SECTIONS
 	}
 	.init.arch.info : {
 		__arch_info_begin = .;
-		*(.arch.info.init)
+		KEEP(*(.arch.info.init))
 		__arch_info_end = .;
 	}
 	.init.tagtable : {
 		__tagtable_begin = .;
-		*(.taglist.init)
+		KEEP(*(.taglist.init))
 		__tagtable_end = .;
 	}
 #ifdef CONFIG_SMP_ON_UP
 	.init.smpalt : {
 		__smpalt_begin = .;
-		*(.alt.smp.init)
+		KEEP(*(.alt.smp.init))
 		__smpalt_end = .;
 	}
 #endif
 	.init.pv_table : {
 		__pv_table_begin = .;
-		*(.pv_table)
+		KEEP(*(.pv_table))
 		__pv_table_end = .;
 	}
 	.init.data : {
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 6a09cc204b07..7117b8e99de8 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -717,6 +717,7 @@ config SWP_EMULATE
 config CPU_BIG_ENDIAN
 	bool "Build big-endian kernel"
 	depends on ARCH_SUPPORTS_BIG_ENDIAN
+	depends on !THIN_ARCHIVES
 	help
 	  Say Y if you plan on running a kernel in big-endian mode.
 	  Note that your board must be properly built and your board
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 9136c3afd3c6..e01f0b00a678 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -433,7 +433,9 @@
  * during second ld run in second ld pass when generating System.map */
 #define TEXT_TEXT							\
 		ALIGN_FUNCTION();					\
-		*(.text.hot .text .text.fixup .text.unlikely .text.*)	\
+		*(.text.hot .text.hot.*)				\
+		*(.text.unlikely .text.unlikely.*)			\
+		*(.text .text.*)					\
 		*(.ref.text)						\
 	MEM_KEEP(init.text)						\
 	MEM_KEEP(exit.text)						\

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-24  7:41       ` Arnd Bergmann
  0 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2016-08-24  7:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday, August 24, 2016 2:00:44 PM CEST Nicholas Piggin wrote:
> On Tue, 23 Aug 2016 14:01:29 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:
> > > Erratum 657417 is worked around by the linker by inserting additional
> > > branch trampolines to avoid problematic branch target locations. This
> > > results in much higher linking time and presumably slower and larger
> > > generated code. The workaround also seems to only be required when
> > > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > > well.
> > > 
> > > The workaround today is left to the linker to apply, which is overly
> > > conservative.
> > > 
> > > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > > 
> > > This patch adds an option which defaults to "y" in cases where we
> > > could possibly be running Cortex A8 and using Thumb2 instructions.
> > > In reality the workaround might not be required at all for the kernel
> > > if virtual instruction memory is linear in physical memory. However it
> > > is more conservative to keep the workaround, and it may be the case
> > > that the TLB lookup would be required in order to catch branches to
> > > unmapped or no-execute pages.
> > > 
> > > In an allyesconfig build, this workaround causes a large load on
> > > the linker's branch stub hash and slows down the final link by a
> > > factor of 5.
> > > 
> > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > >   
> > 
> > Thanks a lot for finding this issue. I can confirm that your patch
> > helps noticeably in all configurations, reducing time for a relink
> > from 18 minutes to 9 minutes on my machine in the best case, but
> > the factor 10 slowdown of the final link with your thin archives
> > and gc-sections patches remains.
> > 
> > I suspect there is still something else going on besides the 657417
> > slowing things down, but it's also possible that I'm doing something
> > wrong here.
> 
> Okay, I was only testing thin archives. gc-sections I didn't look at.
> With thin archives, one final arm allyesconfig link with this patch is
> not showing a regression. gc-sections must be causing something else
> ARM specific, because powerpc seems to link fast with gc-sections.

Ok, I see. For completeness, here are my results with thin archives and
without gc-sections on ARM:

|| no THUMB2, thin archives, no gc-sections, before: 144 seconds

09:29:51   LINK    vmlinux
09:29:51  AR      built-in.o
09:29:52  LD      vmlinux.o
09:30:12   MODPOST vmlinux.o
09:30:14  GEN     .version
09:30:14   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
09:30:14   CC      init/version.o
09:30:15   AR      init/built-in.o
09:30:43  KSYM    .tmp_kallsyms1.o
09:31:28  KSYM    .tmp_kallsyms2.o
09:31:40  LD      vmlinux
09:32:13  SORTEX  vmlinux
09:32:13  SYSMAP  System.map
09:32:15   OBJCOPY arch/arm/boot/Image

|| no THUMB2, thin archives, no gc-sections, after: 70 seconds

09:33:54   LINK    vmlinux
09:33:54  AR      built-in.o
09:33:55  LD      vmlinux.o
09:34:13   MODPOST vmlinux.o
09:34:15  GEN     .version
09:34:16   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
09:34:16   CC      init/version.o
09:34:16   AR      init/built-in.o
09:34:24  KSYM    .tmp_kallsyms1.o
09:34:43  KSYM    .tmp_kallsyms2.o
09:34:55  LD      vmlinux
09:35:03  SORTEX  vmlinux
09:35:03  SYSMAP  System.map
09:35:04   OBJCOPY arch/arm/boot/Image

The final 'LD' step is much faster here as you also found, and now
the time for the complete link is mainly the initial 'LD vmlinux.o'
step, which did not get faster with your patch.

> Can you send your latest ARM patch to enable this and I'll have a look
> at it?

See below. I have not updated the patch description yet, but included
the changes that Nico suggested. The test above used the same patch
but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.

	Arnd

commit 0cfcd63ca78c8f754e57bc8b0c9f6a18dfd2caa4
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Sat Aug 6 22:46:26 2016 +0200

    [EXPERIMENTAL] enable thin archives and --gc-sections on ARM
    
    This goes on top of Nick's latest version of "[PATCH 0/6 v2] kbuild changes,
    thin archives, --gc-sections" and enables both features on ARM.
    
    It's a bit half-baked, these are known problems:
    
    - as big-endian support is still broken, I disable it in Kconfig
      so an allyesconfig build ends up as little-endian
    
    - I've thrown in a change to include/asm-generic/vmlinux.lds.h
      but don't know whether this is the right way or not. We have
      to keep .text.fixup linked together with .text, but I separate
      out .text.unlikely and .text.hot again. This has not caused
      any link failures for me (yet).
    
    - I mark a ton of sections as KEEP() in vmlinux.lds.S. Some of
      them might not actually be needed, and I have not spent much
      time checking what they actually are. However, I did build
      a few hundred randconfigs without new issues.
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b62ae32f8a1e..9bf37a6e7384 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -83,6 +83,7 @@ config ARM
 	select HAVE_UID16
 	select HAVE_VIRT_CPU_ACCOUNTING_GEN
 	select IRQ_FORCED_THREADING
+	select LD_DEAD_CODE_DATA_ELIMINATION
 	select MODULES_USE_ELF_REL
 	select NO_BOOTMEM
 	select OF_EARLY_FLATTREE if OF
@@ -92,6 +93,7 @@ config ARM
 	select PERF_USE_VMALLOC
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
+	select THIN_ARCHIVES
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
 	help
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index ad325a8c7e1e..b7f2a41fd940 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -13,6 +13,9 @@ endif
 
 CFLAGS_REMOVE_return_address.o = -pg
 
+ccflags-y              += -fno-function-sections -fno-data-sections
+subdir-ccflags-y       += -fno-function-sections -fno-data-sections
+
 # Object file lists.
 
 obj-y		:= elf.o entry-common.o irq.o opcodes.o \
diff --git a/arch/arm/kernel/vmlinux-xip.lds.S b/arch/arm/kernel/vmlinux-xip.lds.S
index 56c8bdf776bd..4b515ae498e2 100644
--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -12,17 +12,17 @@
 #define PROC_INFO							\
 	. = ALIGN(4);							\
 	VMLINUX_SYMBOL(__proc_info_begin) = .;				\
-	*(.proc.info.init)						\
+	KEEP(*(.proc.info.init))					\
 	VMLINUX_SYMBOL(__proc_info_end) = .;
 
 #define IDMAP_TEXT							\
 	ALIGN_FUNCTION();						\
 	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
-	*(.idmap.text)							\
+	KEEP(*(.idmap.text))						\
 	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
 	. = ALIGN(PAGE_SIZE);						\
 	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
-	*(.hyp.idmap.text)						\
+	KEEP(*(.hyp.idmap.text))					\
 	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -114,7 +114,7 @@ SECTIONS
 	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
 		__start___ex_table = .;
 #ifdef CONFIG_MMU
-		*(__ex_table)
+		KEEP(*(__ex_table))
 #endif
 		__stop___ex_table = .;
 	}
@@ -126,12 +126,12 @@ SECTIONS
 	. = ALIGN(8);
 	.ARM.unwind_idx : {
 		__start_unwind_idx = .;
-		*(.ARM.exidx*)
+		KEEP(*(.ARM.exidx*))
 		__stop_unwind_idx = .;
 	}
 	.ARM.unwind_tab : {
 		__start_unwind_tab = .;
-		*(.ARM.extab*)
+		KEEP(*(.ARM.extab*))
 		__stop_unwind_tab = .;
 	}
 #endif
@@ -146,7 +146,7 @@ SECTIONS
 	 */
 	__vectors_start = .;
 	.vectors 0xffff0000 : AT(__vectors_start) {
-		*(.vectors)
+		KEEP(*(.vectors))
 	}
 	. = __vectors_start + SIZEOF(.vectors);
 	__vectors_end = .;
@@ -169,24 +169,24 @@ SECTIONS
 	}
 	.init.arch.info : {
 		__arch_info_begin = .;
-		*(.arch.info.init)
+		KEEP(*(.arch.info.init))
 		__arch_info_end = .;
 	}
 	.init.tagtable : {
 		__tagtable_begin = .;
-		*(.taglist.init)
+		KEEP(*(.taglist.init))
 		__tagtable_end = .;
 	}
 #ifdef CONFIG_SMP_ON_UP
 	.init.smpalt : {
 		__smpalt_begin = .;
-		*(.alt.smp.init)
+		KEEP(*(.alt.smp.init))
 		__smpalt_end = .;
 	}
 #endif
 	.init.pv_table : {
 		__pv_table_begin = .;
-		*(.pv_table)
+		KEEP(*(.pv_table))
 		__pv_table_end = .;
 	}
 	.init.data : {
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 7396a5f00c5f..abb59e4c12db 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -17,7 +17,7 @@
 #define PROC_INFO							\
 	. = ALIGN(4);							\
 	VMLINUX_SYMBOL(__proc_info_begin) = .;				\
-	*(.proc.info.init)						\
+	KEEP(*(.proc.info.init))					\
 	VMLINUX_SYMBOL(__proc_info_end) = .;
 
 #define HYPERVISOR_TEXT							\
@@ -169,7 +169,7 @@ SECTIONS
 	 */
 	__vectors_start = .;
 	.vectors 0xffff0000 : AT(__vectors_start) {
-		*(.vectors)
+		KEEP(*(.vectors))
 	}
 	. = __vectors_start + SIZEOF(.vectors);
 	__vectors_end = .;
@@ -192,24 +192,24 @@ SECTIONS
 	}
 	.init.arch.info : {
 		__arch_info_begin = .;
-		*(.arch.info.init)
+		KEEP(*(.arch.info.init))
 		__arch_info_end = .;
 	}
 	.init.tagtable : {
 		__tagtable_begin = .;
-		*(.taglist.init)
+		KEEP(*(.taglist.init))
 		__tagtable_end = .;
 	}
 #ifdef CONFIG_SMP_ON_UP
 	.init.smpalt : {
 		__smpalt_begin = .;
-		*(.alt.smp.init)
+		KEEP(*(.alt.smp.init))
 		__smpalt_end = .;
 	}
 #endif
 	.init.pv_table : {
 		__pv_table_begin = .;
-		*(.pv_table)
+		KEEP(*(.pv_table))
 		__pv_table_end = .;
 	}
 	.init.data : {
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 6a09cc204b07..7117b8e99de8 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -717,6 +717,7 @@ config SWP_EMULATE
 config CPU_BIG_ENDIAN
 	bool "Build big-endian kernel"
 	depends on ARCH_SUPPORTS_BIG_ENDIAN
+	depends on !THIN_ARCHIVES
 	help
 	  Say Y if you plan on running a kernel in big-endian mode.
 	  Note that your board must be properly built and your board
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 9136c3afd3c6..e01f0b00a678 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -433,7 +433,9 @@
  * during second ld run in second ld pass when generating System.map */
 #define TEXT_TEXT							\
 		ALIGN_FUNCTION();					\
-		*(.text.hot .text .text.fixup .text.unlikely .text.*)	\
+		*(.text.hot .text.hot.*)				\
+		*(.text.unlikely .text.unlikely.*)			\
+		*(.text .text.*)					\
 		*(.ref.text)						\
 	MEM_KEEP(init.text)						\
 	MEM_KEEP(exit.text)						\

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-24  7:41       ` Arnd Bergmann
@ 2016-08-24  9:05         ` Nicholas Piggin
  -1 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-24  9:05 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Russell King, linux-kbuild, Nicolas Pitre,
	Julian Brown, Alan Modra, Segher Boessenkool, linux-arch,
	Michal Marek, Sam Ravnborg, Stephen Rothwell

On Wed, 24 Aug 2016 09:41:39 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wednesday, August 24, 2016 2:00:44 PM CEST Nicholas Piggin wrote:
> > On Tue, 23 Aug 2016 14:01:29 +0200
> > Arnd Bergmann <arnd@arndb.de> wrote:
> >   
> > > On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:  
> > > > Erratum 657417 is worked around by the linker by inserting additional
> > > > branch trampolines to avoid problematic branch target locations. This
> > > > results in much higher linking time and presumably slower and larger
> > > > generated code. The workaround also seems to only be required when
> > > > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > > > well.
> > > > 
> > > > The workaround today is left to the linker to apply, which is overly
> > > > conservative.
> > > > 
> > > > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > > > 
> > > > This patch adds an option which defaults to "y" in cases where we
> > > > could possibly be running Cortex A8 and using Thumb2 instructions.
> > > > In reality the workaround might not be required at all for the kernel
> > > > if virtual instruction memory is linear in physical memory. However it
> > > > is more conservative to keep the workaround, and it may be the case
> > > > that the TLB lookup would be required in order to catch branches to
> > > > unmapped or no-execute pages.
> > > > 
> > > > In an allyesconfig build, this workaround causes a large load on
> > > > the linker's branch stub hash and slows down the final link by a
> > > > factor of 5.
> > > > 
> > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > >     
> > > 
> > > Thanks a lot for finding this issue. I can confirm that your patch
> > > helps noticeably in all configurations, reducing time for a relink
> > > from 18 minutes to 9 minutes on my machine in the best case, but
> > > the factor 10 slowdown of the final link with your thin archives
> > > and gc-sections patches remains.
> > > 
> > > I suspect there is still something else going on besides the 657417
> > > slowing things down, but it's also possible that I'm doing something
> > > wrong here.  
> > 
> > Okay, I was only testing thin archives. gc-sections I didn't look at.
> > With thin archives, one final arm allyesconfig link with this patch is
> > not showing a regression. gc-sections must be causing something else
> > ARM specific, because powerpc seems to link fast with gc-sections.  
> 
> Ok, I see. For completeness, here are my results with thin archives and
> without gc-sections on ARM:

This is about what I saw.

> 
> || no THUMB2, thin archives, no gc-sections, before: 144 seconds
> 
> 09:29:51   LINK    vmlinux
> 09:29:51  AR      built-in.o
> 09:29:52  LD      vmlinux.o
> 09:30:12   MODPOST vmlinux.o
> 09:30:14  GEN     .version
> 09:30:14   CHK     include/generated/compile.h
>   UPD     include/generated/compile.h
> 09:30:14   CC      init/version.o
> 09:30:15   AR      init/built-in.o
> 09:30:43  KSYM    .tmp_kallsyms1.o
> 09:31:28  KSYM    .tmp_kallsyms2.o
> 09:31:40  LD      vmlinux
> 09:32:13  SORTEX  vmlinux
> 09:32:13  SYSMAP  System.map
> 09:32:15   OBJCOPY arch/arm/boot/Image
> 
> || no THUMB2, thin archives, no gc-sections, after: 70 seconds
> 
> 09:33:54   LINK    vmlinux
> 09:33:54  AR      built-in.o
> 09:33:55  LD      vmlinux.o
> 09:34:13   MODPOST vmlinux.o
> 09:34:15  GEN     .version
> 09:34:16   CHK     include/generated/compile.h
>   UPD     include/generated/compile.h
> 09:34:16   CC      init/version.o
> 09:34:16   AR      init/built-in.o
> 09:34:24  KSYM    .tmp_kallsyms1.o
> 09:34:43  KSYM    .tmp_kallsyms2.o
> 09:34:55  LD      vmlinux
> 09:35:03  SORTEX  vmlinux
> 09:35:03  SYSMAP  System.map
> 09:35:04   OBJCOPY arch/arm/boot/Image
> 
> The final 'LD' step is much faster here as you also found, and now
> the time for the complete link is mainly the initial 'LD vmlinux.o'
> step, which did not get faster with your patch.

The info here isn't very good because KSYM is printed after the link
but before the kallsyms generation. We can kind of see what's happening
if we take the time between the KSYM and the LD vmlinux as the time for
kallsyms: 
                           before    after
KSYM                      ~12s      ~12s
LD      vmlinux.o          20s       18s
LD      .tmp_kallsyms1.o   28s        8s
LD      .tmp_kallsyms2.o   33s        7s
LD      vmlinux            33s        8s

Probably the cortex a8 workaround does not get applied to vmlinux.o
link (due to being incremental), so we don't see any speedup with the
patch. It takes longer overall I guess because it keeps a lot of
symbols in the output file (due to incremental).


> > Can you send your latest ARM patch to enable this and I'll have a look
> > at it?  
> 
> See below. I have not updated the patch description yet, but included
> the changes that Nico suggested. The test above used the same patch
> but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.

Thanks, I'll take a look.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-24  9:05         ` Nicholas Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-24  9:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 24 Aug 2016 09:41:39 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wednesday, August 24, 2016 2:00:44 PM CEST Nicholas Piggin wrote:
> > On Tue, 23 Aug 2016 14:01:29 +0200
> > Arnd Bergmann <arnd@arndb.de> wrote:
> >   
> > > On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:  
> > > > Erratum 657417 is worked around by the linker by inserting additional
> > > > branch trampolines to avoid problematic branch target locations. This
> > > > results in much higher linking time and presumably slower and larger
> > > > generated code. The workaround also seems to only be required when
> > > > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > > > well.
> > > > 
> > > > The workaround today is left to the linker to apply, which is overly
> > > > conservative.
> > > > 
> > > > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > > > 
> > > > This patch adds an option which defaults to "y" in cases where we
> > > > could possibly be running Cortex A8 and using Thumb2 instructions.
> > > > In reality the workaround might not be required at all for the kernel
> > > > if virtual instruction memory is linear in physical memory. However it
> > > > is more conservative to keep the workaround, and it may be the case
> > > > that the TLB lookup would be required in order to catch branches to
> > > > unmapped or no-execute pages.
> > > > 
> > > > In an allyesconfig build, this workaround causes a large load on
> > > > the linker's branch stub hash and slows down the final link by a
> > > > factor of 5.
> > > > 
> > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > >     
> > > 
> > > Thanks a lot for finding this issue. I can confirm that your patch
> > > helps noticeably in all configurations, reducing time for a relink
> > > from 18 minutes to 9 minutes on my machine in the best case, but
> > > the factor 10 slowdown of the final link with your thin archives
> > > and gc-sections patches remains.
> > > 
> > > I suspect there is still something else going on besides the 657417
> > > slowing things down, but it's also possible that I'm doing something
> > > wrong here.  
> > 
> > Okay, I was only testing thin archives. gc-sections I didn't look at.
> > With thin archives, one final arm allyesconfig link with this patch is
> > not showing a regression. gc-sections must be causing something else
> > ARM specific, because powerpc seems to link fast with gc-sections.  
> 
> Ok, I see. For completeness, here are my results with thin archives and
> without gc-sections on ARM:

This is about what I saw.

> 
> || no THUMB2, thin archives, no gc-sections, before: 144 seconds
> 
> 09:29:51   LINK    vmlinux
> 09:29:51  AR      built-in.o
> 09:29:52  LD      vmlinux.o
> 09:30:12   MODPOST vmlinux.o
> 09:30:14  GEN     .version
> 09:30:14   CHK     include/generated/compile.h
>   UPD     include/generated/compile.h
> 09:30:14   CC      init/version.o
> 09:30:15   AR      init/built-in.o
> 09:30:43  KSYM    .tmp_kallsyms1.o
> 09:31:28  KSYM    .tmp_kallsyms2.o
> 09:31:40  LD      vmlinux
> 09:32:13  SORTEX  vmlinux
> 09:32:13  SYSMAP  System.map
> 09:32:15   OBJCOPY arch/arm/boot/Image
> 
> || no THUMB2, thin archives, no gc-sections, after: 70 seconds
> 
> 09:33:54   LINK    vmlinux
> 09:33:54  AR      built-in.o
> 09:33:55  LD      vmlinux.o
> 09:34:13   MODPOST vmlinux.o
> 09:34:15  GEN     .version
> 09:34:16   CHK     include/generated/compile.h
>   UPD     include/generated/compile.h
> 09:34:16   CC      init/version.o
> 09:34:16   AR      init/built-in.o
> 09:34:24  KSYM    .tmp_kallsyms1.o
> 09:34:43  KSYM    .tmp_kallsyms2.o
> 09:34:55  LD      vmlinux
> 09:35:03  SORTEX  vmlinux
> 09:35:03  SYSMAP  System.map
> 09:35:04   OBJCOPY arch/arm/boot/Image
> 
> The final 'LD' step is much faster here as you also found, and now
> the time for the complete link is mainly the initial 'LD vmlinux.o'
> step, which did not get faster with your patch.

The info here isn't very good because KSYM is printed after the link
but before the kallsyms generation. We can kind of see what's happening
if we take the time between the KSYM and the LD vmlinux as the time for
kallsyms: 
                           before    after
KSYM                      ~12s      ~12s
LD      vmlinux.o          20s       18s
LD      .tmp_kallsyms1.o   28s        8s
LD      .tmp_kallsyms2.o   33s        7s
LD      vmlinux            33s        8s

Probably the cortex a8 workaround does not get applied to vmlinux.o
link (due to being incremental), so we don't see any speedup with the
patch. It takes longer overall I guess because it keeps a lot of
symbols in the output file (due to incremental).


> > Can you send your latest ARM patch to enable this and I'll have a look
> > at it?  
> 
> See below. I have not updated the patch description yet, but included
> the changes that Nico suggested. The test above used the same patch
> but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.

Thanks, I'll take a look.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-24  9:05         ` Nicholas Piggin
  (?)
@ 2016-08-24 10:56           ` Nicholas Piggin
  -1 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-24 10:56 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Russell King, linux-kbuild, Nicolas Pitre,
	Julian Brown, Alan Modra, Segher Boessenkool, linux-arch,
	Michal Marek, Sam Ravnborg, Stephen Rothwell

On Wed, 24 Aug 2016 19:05:30 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Wed, 24 Aug 2016 09:41:39 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Wednesday, August 24, 2016 2:00:44 PM CEST Nicholas Piggin wrote:  
> > > On Tue, 23 Aug 2016 14:01:29 +0200
> > > Arnd Bergmann <arnd@arndb.de> wrote:
> > >     
> > > > On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:    
> > > > > Erratum 657417 is worked around by the linker by inserting additional
> > > > > branch trampolines to avoid problematic branch target locations. This
> > > > > results in much higher linking time and presumably slower and larger
> > > > > generated code. The workaround also seems to only be required when
> > > > > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > > > > well.
> > > > > 
> > > > > The workaround today is left to the linker to apply, which is overly
> > > > > conservative.
> > > > > 
> > > > > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > > > > 
> > > > > This patch adds an option which defaults to "y" in cases where we
> > > > > could possibly be running Cortex A8 and using Thumb2 instructions.
> > > > > In reality the workaround might not be required at all for the kernel
> > > > > if virtual instruction memory is linear in physical memory. However it
> > > > > is more conservative to keep the workaround, and it may be the case
> > > > > that the TLB lookup would be required in order to catch branches to
> > > > > unmapped or no-execute pages.
> > > > > 
> > > > > In an allyesconfig build, this workaround causes a large load on
> > > > > the linker's branch stub hash and slows down the final link by a
> > > > > factor of 5.
> > > > > 
> > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > > >       
> > > > 
> > > > Thanks a lot for finding this issue. I can confirm that your patch
> > > > helps noticeably in all configurations, reducing time for a relink
> > > > from 18 minutes to 9 minutes on my machine in the best case, but
> > > > the factor 10 slowdown of the final link with your thin archives
> > > > and gc-sections patches remains.
> > > > 
> > > > I suspect there is still something else going on besides the 657417
> > > > slowing things down, but it's also possible that I'm doing something
> > > > wrong here.    
> > > 
> > > Okay, I was only testing thin archives. gc-sections I didn't look at.
> > > With thin archives, one final arm allyesconfig link with this patch is
> > > not showing a regression. gc-sections must be causing something else
> > > ARM specific, because powerpc seems to link fast with gc-sections.    
> > 
> > Ok, I see. For completeness, here are my results with thin archives and
> > without gc-sections on ARM:  
> 
> This is about what I saw.
> 
> > 
> > || no THUMB2, thin archives, no gc-sections, before: 144 seconds
> > 
> > 09:29:51   LINK    vmlinux
> > 09:29:51  AR      built-in.o
> > 09:29:52  LD      vmlinux.o
> > 09:30:12   MODPOST vmlinux.o
> > 09:30:14  GEN     .version
> > 09:30:14   CHK     include/generated/compile.h
> >   UPD     include/generated/compile.h
> > 09:30:14   CC      init/version.o
> > 09:30:15   AR      init/built-in.o
> > 09:30:43  KSYM    .tmp_kallsyms1.o
> > 09:31:28  KSYM    .tmp_kallsyms2.o
> > 09:31:40  LD      vmlinux
> > 09:32:13  SORTEX  vmlinux
> > 09:32:13  SYSMAP  System.map
> > 09:32:15   OBJCOPY arch/arm/boot/Image
> > 
> > || no THUMB2, thin archives, no gc-sections, after: 70 seconds
> > 
> > 09:33:54   LINK    vmlinux
> > 09:33:54  AR      built-in.o
> > 09:33:55  LD      vmlinux.o
> > 09:34:13   MODPOST vmlinux.o
> > 09:34:15  GEN     .version
> > 09:34:16   CHK     include/generated/compile.h
> >   UPD     include/generated/compile.h
> > 09:34:16   CC      init/version.o
> > 09:34:16   AR      init/built-in.o
> > 09:34:24  KSYM    .tmp_kallsyms1.o
> > 09:34:43  KSYM    .tmp_kallsyms2.o
> > 09:34:55  LD      vmlinux
> > 09:35:03  SORTEX  vmlinux
> > 09:35:03  SYSMAP  System.map
> > 09:35:04   OBJCOPY arch/arm/boot/Image
> > 
> > The final 'LD' step is much faster here as you also found, and now
> > the time for the complete link is mainly the initial 'LD vmlinux.o'
> > step, which did not get faster with your patch.  
> 
> The info here isn't very good because KSYM is printed after the link
> but before the kallsyms generation. We can kind of see what's happening
> if we take the time between the KSYM and the LD vmlinux as the time for
> kallsyms: 
>                            before    after
> KSYM                      ~12s      ~12s
> LD      vmlinux.o          20s       18s
> LD      .tmp_kallsyms1.o   28s        8s
> LD      .tmp_kallsyms2.o   33s        7s
> LD      vmlinux            33s        8s
> 
> Probably the cortex a8 workaround does not get applied to vmlinux.o
> link (due to being incremental), so we don't see any speedup with the
> patch. It takes longer overall I guess because it keeps a lot of
> symbols in the output file (due to incremental).
> 
> 
> > > Can you send your latest ARM patch to enable this and I'll have a look
> > > at it?    
> > 
> > See below. I have not updated the patch description yet, but included
> > the changes that Nico suggested. The test above used the same patch
> > but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.  
> 
> Thanks, I'll take a look.

Okay, I can't reproduce your bad linking times even with gc-sections. It's
possible I'm doing something wrong, but with my patches + your patch and
standard arm allyesconfig:

  20:33:56  AR      built-in.o
  20:33:57  LD      vmlinux.o
  MODPOST vmlinux.o
  20:34:12  GEN     .version
  CHK     include/generated/compile.h
  UPD     include/generated/compile.h
  CC      init/version.o
  AR      init/built-in.o
  20:34:24  KSYM    .tmp_kallsyms1.o
  20:34:45  KSYM    .tmp_kallsyms2.o
  20:34:54  LD      vmlinux
  20:35:07  SORTEX  vmlinux
  20:35:07  SYSMAP  System.map

I have about 71 seconds for the final link phase.

Command is:
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j3 vmlinux

$ arm-linux-gnueabi-ld -v
GNU ld (GNU Binutils for Debian) 2.27

Confirming function/data sections:
$ objdump -h built-in.o | grep ^[[:space:]]*[0-9][0-9]* | wc -l
1012848

$ size vmlinux
   text	   data	    bss	    dec	    hex	filename
74205771	36746855	19072744	130025370	7c0079a	vmlinux

$ file vmlinux
vmlinux: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=37d207288f81f83291dfb45294a94c555d9fbdb8, not stripped

Final link command is:
arm-linux-gnueabi-ld -EL --no-fix-cortex-a8 -p --no-undefined -X --pic-veneer --build-id --gc-sections -X -o vmlinux -T ./arch/arm/kernel/vmlinux.lds --whole-archive built-in.o .tmp_kallsyms2.o

Which takes 12s

I'm building THUMB2 now, but if you have any hints for me...

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
@ 2016-08-24 10:56           ` Nicholas Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-24 10:56 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nicolas Pitre, linux-arch, Segher Boessenkool, Alan Modra,
	linux-kbuild, Russell King, Julian Brown, Michal Marek,
	Stephen Rothwell, Sam Ravnborg, linux-arm-kernel

On Wed, 24 Aug 2016 19:05:30 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Wed, 24 Aug 2016 09:41:39 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Wednesday, August 24, 2016 2:00:44 PM CEST Nicholas Piggin wrote:  
> > > On Tue, 23 Aug 2016 14:01:29 +0200
> > > Arnd Bergmann <arnd@arndb.de> wrote:
> > >     
> > > > On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:    
> > > > > Erratum 657417 is worked around by the linker by inserting additional
> > > > > branch trampolines to avoid problematic branch target locations. This
> > > > > results in much higher linking time and presumably slower and larger
> > > > > generated code. The workaround also seems to only be required when
> > > > > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > > > > well.
> > > > > 
> > > > > The workaround today is left to the linker to apply, which is overly
> > > > > conservative.
> > > > > 
> > > > > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > > > > 
> > > > > This patch adds an option which defaults to "y" in cases where we
> > > > > could possibly be running Cortex A8 and using Thumb2 instructions.
> > > > > In reality the workaround might not be required at all for the kernel
> > > > > if virtual instruction memory is linear in physical memory. However it
> > > > > is more conservative to keep the workaround, and it may be the case
> > > > > that the TLB lookup would be required in order to catch branches to
> > > > > unmapped or no-execute pages.
> > > > > 
> > > > > In an allyesconfig build, this workaround causes a large load on
> > > > > the linker's branch stub hash and slows down the final link by a
> > > > > factor of 5.
> > > > > 
> > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > > >       
> > > > 
> > > > Thanks a lot for finding this issue. I can confirm that your patch
> > > > helps noticeably in all configurations, reducing time for a relink
> > > > from 18 minutes to 9 minutes on my machine in the best case, but
> > > > the factor 10 slowdown of the final link with your thin archives
> > > > and gc-sections patches remains.
> > > > 
> > > > I suspect there is still something else going on besides the 657417
> > > > slowing things down, but it's also possible that I'm doing something
> > > > wrong here.    
> > > 
> > > Okay, I was only testing thin archives. gc-sections I didn't look at.
> > > With thin archives, one final arm allyesconfig link with this patch is
> > > not showing a regression. gc-sections must be causing something else
> > > ARM specific, because powerpc seems to link fast with gc-sections.    
> > 
> > Ok, I see. For completeness, here are my results with thin archives and
> > without gc-sections on ARM:  
> 
> This is about what I saw.
> 
> > 
> > || no THUMB2, thin archives, no gc-sections, before: 144 seconds
> > 
> > 09:29:51   LINK    vmlinux
> > 09:29:51  AR      built-in.o
> > 09:29:52  LD      vmlinux.o
> > 09:30:12   MODPOST vmlinux.o
> > 09:30:14  GEN     .version
> > 09:30:14   CHK     include/generated/compile.h
> >   UPD     include/generated/compile.h
> > 09:30:14   CC      init/version.o
> > 09:30:15   AR      init/built-in.o
> > 09:30:43  KSYM    .tmp_kallsyms1.o
> > 09:31:28  KSYM    .tmp_kallsyms2.o
> > 09:31:40  LD      vmlinux
> > 09:32:13  SORTEX  vmlinux
> > 09:32:13  SYSMAP  System.map
> > 09:32:15   OBJCOPY arch/arm/boot/Image
> > 
> > || no THUMB2, thin archives, no gc-sections, after: 70 seconds
> > 
> > 09:33:54   LINK    vmlinux
> > 09:33:54  AR      built-in.o
> > 09:33:55  LD      vmlinux.o
> > 09:34:13   MODPOST vmlinux.o
> > 09:34:15  GEN     .version
> > 09:34:16   CHK     include/generated/compile.h
> >   UPD     include/generated/compile.h
> > 09:34:16   CC      init/version.o
> > 09:34:16   AR      init/built-in.o
> > 09:34:24  KSYM    .tmp_kallsyms1.o
> > 09:34:43  KSYM    .tmp_kallsyms2.o
> > 09:34:55  LD      vmlinux
> > 09:35:03  SORTEX  vmlinux
> > 09:35:03  SYSMAP  System.map
> > 09:35:04   OBJCOPY arch/arm/boot/Image
> > 
> > The final 'LD' step is much faster here as you also found, and now
> > the time for the complete link is mainly the initial 'LD vmlinux.o'
> > step, which did not get faster with your patch.  
> 
> The info here isn't very good because KSYM is printed after the link
> but before the kallsyms generation. We can kind of see what's happening
> if we take the time between the KSYM and the LD vmlinux as the time for
> kallsyms: 
>                            before    after
> KSYM                      ~12s      ~12s
> LD      vmlinux.o          20s       18s
> LD      .tmp_kallsyms1.o   28s        8s
> LD      .tmp_kallsyms2.o   33s        7s
> LD      vmlinux            33s        8s
> 
> Probably the cortex a8 workaround does not get applied to vmlinux.o
> link (due to being incremental), so we don't see any speedup with the
> patch. It takes longer overall I guess because it keeps a lot of
> symbols in the output file (due to incremental).
> 
> 
> > > Can you send your latest ARM patch to enable this and I'll have a look
> > > at it?    
> > 
> > See below. I have not updated the patch description yet, but included
> > the changes that Nico suggested. The test above used the same patch
> > but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.  
> 
> Thanks, I'll take a look.

Okay, I can't reproduce your bad linking times even with gc-sections. It's
possible I'm doing something wrong, but with my patches + your patch and
standard arm allyesconfig:

  20:33:56  AR      built-in.o
  20:33:57  LD      vmlinux.o
  MODPOST vmlinux.o
  20:34:12  GEN     .version
  CHK     include/generated/compile.h
  UPD     include/generated/compile.h
  CC      init/version.o
  AR      init/built-in.o
  20:34:24  KSYM    .tmp_kallsyms1.o
  20:34:45  KSYM    .tmp_kallsyms2.o
  20:34:54  LD      vmlinux
  20:35:07  SORTEX  vmlinux
  20:35:07  SYSMAP  System.map

I have about 71 seconds for the final link phase.

Command is:
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j3 vmlinux

$ arm-linux-gnueabi-ld -v
GNU ld (GNU Binutils for Debian) 2.27

Confirming function/data sections:
$ objdump -h built-in.o | grep ^[[:space:]]*[0-9][0-9]* | wc -l
1012848

$ size vmlinux
   text	   data	    bss	    dec	    hex	filename
74205771	36746855	19072744	130025370	7c0079a	vmlinux

$ file vmlinux
vmlinux: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=37d207288f81f83291dfb45294a94c555d9fbdb8, not stripped

Final link command is:
arm-linux-gnueabi-ld -EL --no-fix-cortex-a8 -p --no-undefined -X --pic-veneer --build-id --gc-sections -X -o vmlinux -T ./arch/arm/kernel/vmlinux.lds --whole-archive built-in.o .tmp_kallsyms2.o

Which takes 12s

I'm building THUMB2 now, but if you have any hints for me...

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-24 10:56           ` Nicholas Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-24 10:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 24 Aug 2016 19:05:30 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Wed, 24 Aug 2016 09:41:39 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Wednesday, August 24, 2016 2:00:44 PM CEST Nicholas Piggin wrote:  
> > > On Tue, 23 Aug 2016 14:01:29 +0200
> > > Arnd Bergmann <arnd@arndb.de> wrote:
> > >     
> > > > On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:    
> > > > > Erratum 657417 is worked around by the linker by inserting additional
> > > > > branch trampolines to avoid problematic branch target locations. This
> > > > > results in much higher linking time and presumably slower and larger
> > > > > generated code. The workaround also seems to only be required when
> > > > > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > > > > well.
> > > > > 
> > > > > The workaround today is left to the linker to apply, which is overly
> > > > > conservative.
> > > > > 
> > > > > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > > > > 
> > > > > This patch adds an option which defaults to "y" in cases where we
> > > > > could possibly be running Cortex A8 and using Thumb2 instructions.
> > > > > In reality the workaround might not be required at all for the kernel
> > > > > if virtual instruction memory is linear in physical memory. However it
> > > > > is more conservative to keep the workaround, and it may be the case
> > > > > that the TLB lookup would be required in order to catch branches to
> > > > > unmapped or no-execute pages.
> > > > > 
> > > > > In an allyesconfig build, this workaround causes a large load on
> > > > > the linker's branch stub hash and slows down the final link by a
> > > > > factor of 5.
> > > > > 
> > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > > >       
> > > > 
> > > > Thanks a lot for finding this issue. I can confirm that your patch
> > > > helps noticeably in all configurations, reducing time for a relink
> > > > from 18 minutes to 9 minutes on my machine in the best case, but
> > > > the factor 10 slowdown of the final link with your thin archives
> > > > and gc-sections patches remains.
> > > > 
> > > > I suspect there is still something else going on besides the 657417
> > > > slowing things down, but it's also possible that I'm doing something
> > > > wrong here.    
> > > 
> > > Okay, I was only testing thin archives. gc-sections I didn't look at.
> > > With thin archives, one final arm allyesconfig link with this patch is
> > > not showing a regression. gc-sections must be causing something else
> > > ARM specific, because powerpc seems to link fast with gc-sections.    
> > 
> > Ok, I see. For completeness, here are my results with thin archives and
> > without gc-sections on ARM:  
> 
> This is about what I saw.
> 
> > 
> > || no THUMB2, thin archives, no gc-sections, before: 144 seconds
> > 
> > 09:29:51   LINK    vmlinux
> > 09:29:51  AR      built-in.o
> > 09:29:52  LD      vmlinux.o
> > 09:30:12   MODPOST vmlinux.o
> > 09:30:14  GEN     .version
> > 09:30:14   CHK     include/generated/compile.h
> >   UPD     include/generated/compile.h
> > 09:30:14   CC      init/version.o
> > 09:30:15   AR      init/built-in.o
> > 09:30:43  KSYM    .tmp_kallsyms1.o
> > 09:31:28  KSYM    .tmp_kallsyms2.o
> > 09:31:40  LD      vmlinux
> > 09:32:13  SORTEX  vmlinux
> > 09:32:13  SYSMAP  System.map
> > 09:32:15   OBJCOPY arch/arm/boot/Image
> > 
> > || no THUMB2, thin archives, no gc-sections, after: 70 seconds
> > 
> > 09:33:54   LINK    vmlinux
> > 09:33:54  AR      built-in.o
> > 09:33:55  LD      vmlinux.o
> > 09:34:13   MODPOST vmlinux.o
> > 09:34:15  GEN     .version
> > 09:34:16   CHK     include/generated/compile.h
> >   UPD     include/generated/compile.h
> > 09:34:16   CC      init/version.o
> > 09:34:16   AR      init/built-in.o
> > 09:34:24  KSYM    .tmp_kallsyms1.o
> > 09:34:43  KSYM    .tmp_kallsyms2.o
> > 09:34:55  LD      vmlinux
> > 09:35:03  SORTEX  vmlinux
> > 09:35:03  SYSMAP  System.map
> > 09:35:04   OBJCOPY arch/arm/boot/Image
> > 
> > The final 'LD' step is much faster here as you also found, and now
> > the time for the complete link is mainly the initial 'LD vmlinux.o'
> > step, which did not get faster with your patch.  
> 
> The info here isn't very good because KSYM is printed after the link
> but before the kallsyms generation. We can kind of see what's happening
> if we take the time between the KSYM and the LD vmlinux as the time for
> kallsyms: 
>                            before    after
> KSYM                      ~12s      ~12s
> LD      vmlinux.o          20s       18s
> LD      .tmp_kallsyms1.o   28s        8s
> LD      .tmp_kallsyms2.o   33s        7s
> LD      vmlinux            33s        8s
> 
> Probably the cortex a8 workaround does not get applied to vmlinux.o
> link (due to being incremental), so we don't see any speedup with the
> patch. It takes longer overall I guess because it keeps a lot of
> symbols in the output file (due to incremental).
> 
> 
> > > Can you send your latest ARM patch to enable this and I'll have a look
> > > at it?    
> > 
> > See below. I have not updated the patch description yet, but included
> > the changes that Nico suggested. The test above used the same patch
> > but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.  
> 
> Thanks, I'll take a look.

Okay, I can't reproduce your bad linking times even with gc-sections. It's
possible I'm doing something wrong, but with my patches + your patch and
standard arm allyesconfig:

  20:33:56  AR      built-in.o
  20:33:57  LD      vmlinux.o
  MODPOST vmlinux.o
  20:34:12  GEN     .version
  CHK     include/generated/compile.h
  UPD     include/generated/compile.h
  CC      init/version.o
  AR      init/built-in.o
  20:34:24  KSYM    .tmp_kallsyms1.o
  20:34:45  KSYM    .tmp_kallsyms2.o
  20:34:54  LD      vmlinux
  20:35:07  SORTEX  vmlinux
  20:35:07  SYSMAP  System.map

I have about 71 seconds for the final link phase.

Command is:
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j3 vmlinux

$ arm-linux-gnueabi-ld -v
GNU ld (GNU Binutils for Debian) 2.27

Confirming function/data sections:
$ objdump -h built-in.o | grep ^[[:space:]]*[0-9][0-9]* | wc -l
1012848

$ size vmlinux
   text	   data	    bss	    dec	    hex	filename
74205771	36746855	19072744	130025370	7c0079a	vmlinux

$ file vmlinux
vmlinux: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=37d207288f81f83291dfb45294a94c555d9fbdb8, not stripped

Final link command is:
arm-linux-gnueabi-ld -EL --no-fix-cortex-a8 -p --no-undefined -X --pic-veneer --build-id --gc-sections -X -o vmlinux -T ./arch/arm/kernel/vmlinux.lds --whole-archive built-in.o .tmp_kallsyms2.o

Which takes 12s

I'm building THUMB2 now, but if you have any hints for me...

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-24 10:56           ` Nicholas Piggin
@ 2016-08-24 12:06             ` Nicholas Piggin
  -1 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-24 12:06 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Russell King, linux-kbuild, Nicolas Pitre,
	Julian Brown, Alan Modra, Segher Boessenkool, linux-arch,
	Michal Marek, Sam Ravnborg, Stephen Rothwell

On Wed, 24 Aug 2016 20:56:46 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Wed, 24 Aug 2016 19:05:30 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> > On Wed, 24 Aug 2016 09:41:39 +0200
> > Arnd Bergmann <arnd@arndb.de> wrote:
> >   
> > > On Wednesday, August 24, 2016 2:00:44 PM CEST Nicholas Piggin wrote:    
> > > > On Tue, 23 Aug 2016 14:01:29 +0200
> > > > Arnd Bergmann <arnd@arndb.de> wrote:
> > > >       
> > > > > On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:      
> > > > > > Erratum 657417 is worked around by the linker by inserting additional
> > > > > > branch trampolines to avoid problematic branch target locations. This
> > > > > > results in much higher linking time and presumably slower and larger
> > > > > > generated code. The workaround also seems to only be required when
> > > > > > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > > > > > well.
> > > > > > 
> > > > > > The workaround today is left to the linker to apply, which is overly
> > > > > > conservative.
> > > > > > 
> > > > > > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > > > > > 
> > > > > > This patch adds an option which defaults to "y" in cases where we
> > > > > > could possibly be running Cortex A8 and using Thumb2 instructions.
> > > > > > In reality the workaround might not be required at all for the kernel
> > > > > > if virtual instruction memory is linear in physical memory. However it
> > > > > > is more conservative to keep the workaround, and it may be the case
> > > > > > that the TLB lookup would be required in order to catch branches to
> > > > > > unmapped or no-execute pages.
> > > > > > 
> > > > > > In an allyesconfig build, this workaround causes a large load on
> > > > > > the linker's branch stub hash and slows down the final link by a
> > > > > > factor of 5.
> > > > > > 
> > > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > > > >         
> > > > > 
> > > > > Thanks a lot for finding this issue. I can confirm that your patch
> > > > > helps noticeably in all configurations, reducing time for a relink
> > > > > from 18 minutes to 9 minutes on my machine in the best case, but
> > > > > the factor 10 slowdown of the final link with your thin archives
> > > > > and gc-sections patches remains.
> > > > > 
> > > > > I suspect there is still something else going on besides the 657417
> > > > > slowing things down, but it's also possible that I'm doing something
> > > > > wrong here.      
> > > > 
> > > > Okay, I was only testing thin archives. gc-sections I didn't look at.
> > > > With thin archives, one final arm allyesconfig link with this patch is
> > > > not showing a regression. gc-sections must be causing something else
> > > > ARM specific, because powerpc seems to link fast with gc-sections.      
> > > 
> > > Ok, I see. For completeness, here are my results with thin archives and
> > > without gc-sections on ARM:    
> > 
> > This is about what I saw.
> >   
> > > 
> > > || no THUMB2, thin archives, no gc-sections, before: 144 seconds
> > > 
> > > 09:29:51   LINK    vmlinux
> > > 09:29:51  AR      built-in.o
> > > 09:29:52  LD      vmlinux.o
> > > 09:30:12   MODPOST vmlinux.o
> > > 09:30:14  GEN     .version
> > > 09:30:14   CHK     include/generated/compile.h
> > >   UPD     include/generated/compile.h
> > > 09:30:14   CC      init/version.o
> > > 09:30:15   AR      init/built-in.o
> > > 09:30:43  KSYM    .tmp_kallsyms1.o
> > > 09:31:28  KSYM    .tmp_kallsyms2.o
> > > 09:31:40  LD      vmlinux
> > > 09:32:13  SORTEX  vmlinux
> > > 09:32:13  SYSMAP  System.map
> > > 09:32:15   OBJCOPY arch/arm/boot/Image
> > > 
> > > || no THUMB2, thin archives, no gc-sections, after: 70 seconds
> > > 
> > > 09:33:54   LINK    vmlinux
> > > 09:33:54  AR      built-in.o
> > > 09:33:55  LD      vmlinux.o
> > > 09:34:13   MODPOST vmlinux.o
> > > 09:34:15  GEN     .version
> > > 09:34:16   CHK     include/generated/compile.h
> > >   UPD     include/generated/compile.h
> > > 09:34:16   CC      init/version.o
> > > 09:34:16   AR      init/built-in.o
> > > 09:34:24  KSYM    .tmp_kallsyms1.o
> > > 09:34:43  KSYM    .tmp_kallsyms2.o
> > > 09:34:55  LD      vmlinux
> > > 09:35:03  SORTEX  vmlinux
> > > 09:35:03  SYSMAP  System.map
> > > 09:35:04   OBJCOPY arch/arm/boot/Image
> > > 
> > > The final 'LD' step is much faster here as you also found, and now
> > > the time for the complete link is mainly the initial 'LD vmlinux.o'
> > > step, which did not get faster with your patch.    
> > 
> > The info here isn't very good because KSYM is printed after the link
> > but before the kallsyms generation. We can kind of see what's happening
> > if we take the time between the KSYM and the LD vmlinux as the time for
> > kallsyms: 
> >                            before    after
> > KSYM                      ~12s      ~12s
> > LD      vmlinux.o          20s       18s
> > LD      .tmp_kallsyms1.o   28s        8s
> > LD      .tmp_kallsyms2.o   33s        7s
> > LD      vmlinux            33s        8s
> > 
> > Probably the cortex a8 workaround does not get applied to vmlinux.o
> > link (due to being incremental), so we don't see any speedup with the
> > patch. It takes longer overall I guess because it keeps a lot of
> > symbols in the output file (due to incremental).
> > 
> >   
> > > > Can you send your latest ARM patch to enable this and I'll have a look
> > > > at it?      
> > > 
> > > See below. I have not updated the patch description yet, but included
> > > the changes that Nico suggested. The test above used the same patch
> > > but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.    
> > 
> > Thanks, I'll take a look.  
> 
> Okay, I can't reproduce your bad linking times even with gc-sections. It's
> possible I'm doing something wrong, but with my patches + your patch and
> standard arm allyesconfig:
> 
>   20:33:56  AR      built-in.o
>   20:33:57  LD      vmlinux.o
>   MODPOST vmlinux.o
>   20:34:12  GEN     .version
>   CHK     include/generated/compile.h
>   UPD     include/generated/compile.h
>   CC      init/version.o
>   AR      init/built-in.o
>   20:34:24  KSYM    .tmp_kallsyms1.o
>   20:34:45  KSYM    .tmp_kallsyms2.o
>   20:34:54  LD      vmlinux
>   20:35:07  SORTEX  vmlinux
>   20:35:07  SYSMAP  System.map
> 
> I have about 71 seconds for the final link phase.
> 
> Command is:
> make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j3 vmlinux
> 
> $ arm-linux-gnueabi-ld -v
> GNU ld (GNU Binutils for Debian) 2.27
> 
> Confirming function/data sections:
> $ objdump -h built-in.o | grep ^[[:space:]]*[0-9][0-9]* | wc -l
> 1012848
> 
> $ size vmlinux
>    text	   data	    bss	    dec	    hex	filename
> 74205771	36746855	19072744	130025370	7c0079a	vmlinux
> 
> $ file vmlinux
> vmlinux: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=37d207288f81f83291dfb45294a94c555d9fbdb8, not stripped
> 
> Final link command is:
> arm-linux-gnueabi-ld -EL --no-fix-cortex-a8 -p --no-undefined -X --pic-veneer --build-id --gc-sections -X -o vmlinux -T ./arch/arm/kernel/vmlinux.lds --whole-archive built-in.o .tmp_kallsyms2.o
> 
> Which takes 12s
> 
> I'm building THUMB2 now, but if you have any hints for me...

Just did a thumb2 build. Disable V6, enable thumb2, otherwise same config
and tree takes a long time to link as expected because the
--no-fix-cortex-a8 option is not being applied. Adding that brings link
time down to same as allyesconfig (slightly faster).

Confirmed it's output thumb2 instructions.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-24 12:06             ` Nicholas Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-24 12:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 24 Aug 2016 20:56:46 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Wed, 24 Aug 2016 19:05:30 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> > On Wed, 24 Aug 2016 09:41:39 +0200
> > Arnd Bergmann <arnd@arndb.de> wrote:
> >   
> > > On Wednesday, August 24, 2016 2:00:44 PM CEST Nicholas Piggin wrote:    
> > > > On Tue, 23 Aug 2016 14:01:29 +0200
> > > > Arnd Bergmann <arnd@arndb.de> wrote:
> > > >       
> > > > > On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:      
> > > > > > Erratum 657417 is worked around by the linker by inserting additional
> > > > > > branch trampolines to avoid problematic branch target locations. This
> > > > > > results in much higher linking time and presumably slower and larger
> > > > > > generated code. The workaround also seems to only be required when
> > > > > > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > > > > > well.
> > > > > > 
> > > > > > The workaround today is left to the linker to apply, which is overly
> > > > > > conservative.
> > > > > > 
> > > > > > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > > > > > 
> > > > > > This patch adds an option which defaults to "y" in cases where we
> > > > > > could possibly be running Cortex A8 and using Thumb2 instructions.
> > > > > > In reality the workaround might not be required at all for the kernel
> > > > > > if virtual instruction memory is linear in physical memory. However it
> > > > > > is more conservative to keep the workaround, and it may be the case
> > > > > > that the TLB lookup would be required in order to catch branches to
> > > > > > unmapped or no-execute pages.
> > > > > > 
> > > > > > In an allyesconfig build, this workaround causes a large load on
> > > > > > the linker's branch stub hash and slows down the final link by a
> > > > > > factor of 5.
> > > > > > 
> > > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > > > >         
> > > > > 
> > > > > Thanks a lot for finding this issue. I can confirm that your patch
> > > > > helps noticeably in all configurations, reducing time for a relink
> > > > > from 18 minutes to 9 minutes on my machine in the best case, but
> > > > > the factor 10 slowdown of the final link with your thin archives
> > > > > and gc-sections patches remains.
> > > > > 
> > > > > I suspect there is still something else going on besides the 657417
> > > > > slowing things down, but it's also possible that I'm doing something
> > > > > wrong here.      
> > > > 
> > > > Okay, I was only testing thin archives. gc-sections I didn't look at.
> > > > With thin archives, one final arm allyesconfig link with this patch is
> > > > not showing a regression. gc-sections must be causing something else
> > > > ARM specific, because powerpc seems to link fast with gc-sections.      
> > > 
> > > Ok, I see. For completeness, here are my results with thin archives and
> > > without gc-sections on ARM:    
> > 
> > This is about what I saw.
> >   
> > > 
> > > || no THUMB2, thin archives, no gc-sections, before: 144 seconds
> > > 
> > > 09:29:51   LINK    vmlinux
> > > 09:29:51  AR      built-in.o
> > > 09:29:52  LD      vmlinux.o
> > > 09:30:12   MODPOST vmlinux.o
> > > 09:30:14  GEN     .version
> > > 09:30:14   CHK     include/generated/compile.h
> > >   UPD     include/generated/compile.h
> > > 09:30:14   CC      init/version.o
> > > 09:30:15   AR      init/built-in.o
> > > 09:30:43  KSYM    .tmp_kallsyms1.o
> > > 09:31:28  KSYM    .tmp_kallsyms2.o
> > > 09:31:40  LD      vmlinux
> > > 09:32:13  SORTEX  vmlinux
> > > 09:32:13  SYSMAP  System.map
> > > 09:32:15   OBJCOPY arch/arm/boot/Image
> > > 
> > > || no THUMB2, thin archives, no gc-sections, after: 70 seconds
> > > 
> > > 09:33:54   LINK    vmlinux
> > > 09:33:54  AR      built-in.o
> > > 09:33:55  LD      vmlinux.o
> > > 09:34:13   MODPOST vmlinux.o
> > > 09:34:15  GEN     .version
> > > 09:34:16   CHK     include/generated/compile.h
> > >   UPD     include/generated/compile.h
> > > 09:34:16   CC      init/version.o
> > > 09:34:16   AR      init/built-in.o
> > > 09:34:24  KSYM    .tmp_kallsyms1.o
> > > 09:34:43  KSYM    .tmp_kallsyms2.o
> > > 09:34:55  LD      vmlinux
> > > 09:35:03  SORTEX  vmlinux
> > > 09:35:03  SYSMAP  System.map
> > > 09:35:04   OBJCOPY arch/arm/boot/Image
> > > 
> > > The final 'LD' step is much faster here as you also found, and now
> > > the time for the complete link is mainly the initial 'LD vmlinux.o'
> > > step, which did not get faster with your patch.    
> > 
> > The info here isn't very good because KSYM is printed after the link
> > but before the kallsyms generation. We can kind of see what's happening
> > if we take the time between the KSYM and the LD vmlinux as the time for
> > kallsyms: 
> >                            before    after
> > KSYM                      ~12s      ~12s
> > LD      vmlinux.o          20s       18s
> > LD      .tmp_kallsyms1.o   28s        8s
> > LD      .tmp_kallsyms2.o   33s        7s
> > LD      vmlinux            33s        8s
> > 
> > Probably the cortex a8 workaround does not get applied to vmlinux.o
> > link (due to being incremental), so we don't see any speedup with the
> > patch. It takes longer overall I guess because it keeps a lot of
> > symbols in the output file (due to incremental).
> > 
> >   
> > > > Can you send your latest ARM patch to enable this and I'll have a look
> > > > at it?      
> > > 
> > > See below. I have not updated the patch description yet, but included
> > > the changes that Nico suggested. The test above used the same patch
> > > but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.    
> > 
> > Thanks, I'll take a look.  
> 
> Okay, I can't reproduce your bad linking times even with gc-sections. It's
> possible I'm doing something wrong, but with my patches + your patch and
> standard arm allyesconfig:
> 
>   20:33:56  AR      built-in.o
>   20:33:57  LD      vmlinux.o
>   MODPOST vmlinux.o
>   20:34:12  GEN     .version
>   CHK     include/generated/compile.h
>   UPD     include/generated/compile.h
>   CC      init/version.o
>   AR      init/built-in.o
>   20:34:24  KSYM    .tmp_kallsyms1.o
>   20:34:45  KSYM    .tmp_kallsyms2.o
>   20:34:54  LD      vmlinux
>   20:35:07  SORTEX  vmlinux
>   20:35:07  SYSMAP  System.map
> 
> I have about 71 seconds for the final link phase.
> 
> Command is:
> make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j3 vmlinux
> 
> $ arm-linux-gnueabi-ld -v
> GNU ld (GNU Binutils for Debian) 2.27
> 
> Confirming function/data sections:
> $ objdump -h built-in.o | grep ^[[:space:]]*[0-9][0-9]* | wc -l
> 1012848
> 
> $ size vmlinux
>    text	   data	    bss	    dec	    hex	filename
> 74205771	36746855	19072744	130025370	7c0079a	vmlinux
> 
> $ file vmlinux
> vmlinux: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=37d207288f81f83291dfb45294a94c555d9fbdb8, not stripped
> 
> Final link command is:
> arm-linux-gnueabi-ld -EL --no-fix-cortex-a8 -p --no-undefined -X --pic-veneer --build-id --gc-sections -X -o vmlinux -T ./arch/arm/kernel/vmlinux.lds --whole-archive built-in.o .tmp_kallsyms2.o
> 
> Which takes 12s
> 
> I'm building THUMB2 now, but if you have any hints for me...

Just did a thumb2 build. Disable V6, enable thumb2, otherwise same config
and tree takes a long time to link as expected because the
--no-fix-cortex-a8 option is not being applied. Adding that brings link
time down to same as allyesconfig (slightly faster).

Confirmed it's output thumb2 instructions.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-24 10:56           ` Nicholas Piggin
@ 2016-08-24 15:01             ` Arnd Bergmann
  -1 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2016-08-24 15:01 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arm-kernel, Russell King, linux-kbuild, Nicolas Pitre,
	Julian Brown, Alan Modra, Segher Boessenkool, linux-arch,
	Michal Marek, Sam Ravnborg, Stephen Rothwell

On Wednesday, August 24, 2016 8:56:46 PM CEST Nicholas Piggin wrote:
> On Wed, 24 Aug 2016 19:05:30 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
> > Thanks, I'll take a look.
> 
> Okay, I can't reproduce your bad linking times even with gc-sections. It's
> possible I'm doing something wrong, but with my patches + your patch and
> standard arm allyesconfig:
> 
>   20:33:56  AR      built-in.o
>   20:33:57  LD      vmlinux.o
>   MODPOST vmlinux.o
>   20:34:12  GEN     .version
>   CHK     include/generated/compile.h
>   UPD     include/generated/compile.h
>   CC      init/version.o
>   AR      init/built-in.o
>   20:34:24  KSYM    .tmp_kallsyms1.o
>   20:34:45  KSYM    .tmp_kallsyms2.o
>   20:34:54  LD      vmlinux
>   20:35:07  SORTEX  vmlinux
>   20:35:07  SYSMAP  System.map
> 
> I have about 71 seconds for the final link phase.


I've tracked down my remaining build time regression to
a bad binutils snapshot (2.26.51) I had been using, and upgraded
to the 2.27 release now, which is roughly the same as what
you have:

14:45:41   LINK    vmlinux
14:45:41  AR      built-in.o
14:45:42  LD      vmlinux.o
14:51:49   MODPOST vmlinux.o
14:51:51  GEN     .version
14:51:51   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
14:51:51   CC      init/version.o
14:51:52   AR      init/built-in.o
14:52:04  KSYM    .tmp_kallsyms1.o
14:52:31  KSYM    .tmp_kallsyms2.o
14:52:43  LD      vmlinux
14:52:55  SORTEX  vmlinux
14:52:55  SYSMAP  System.map
14:52:56   OBJCOPY arch/arm/boot/Image

The long minutes that were spent in "arm-linux-gnueabi-ld 
-r -o vmlinux.o --whole-archive built-in.o" are all gone now.

I still see a problem with big-endian builds failing with
thinarc/gc-sections, I'll investigate that some other day,
or you could have a look at that if you want to make sure
it's an ARM specific problem, not something with your
patches in general.

The patch that I sent for enabling the two on ARM blocks
out CONFIG_CPU_BIG_ENDIAN, so just revert that hunk to see
the problem. It's possible that it only breaks when doing
a big-endian build after a little-endian build without
a "make clean" inbetween.

	Arnd

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-24 15:01             ` Arnd Bergmann
  0 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2016-08-24 15:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday, August 24, 2016 8:56:46 PM CEST Nicholas Piggin wrote:
> On Wed, 24 Aug 2016 19:05:30 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
> > Thanks, I'll take a look.
> 
> Okay, I can't reproduce your bad linking times even with gc-sections. It's
> possible I'm doing something wrong, but with my patches + your patch and
> standard arm allyesconfig:
> 
>   20:33:56  AR      built-in.o
>   20:33:57  LD      vmlinux.o
>   MODPOST vmlinux.o
>   20:34:12  GEN     .version
>   CHK     include/generated/compile.h
>   UPD     include/generated/compile.h
>   CC      init/version.o
>   AR      init/built-in.o
>   20:34:24  KSYM    .tmp_kallsyms1.o
>   20:34:45  KSYM    .tmp_kallsyms2.o
>   20:34:54  LD      vmlinux
>   20:35:07  SORTEX  vmlinux
>   20:35:07  SYSMAP  System.map
> 
> I have about 71 seconds for the final link phase.


I've tracked down my remaining build time regression to
a bad binutils snapshot (2.26.51) I had been using, and upgraded
to the 2.27 release now, which is roughly the same as what
you have:

14:45:41   LINK    vmlinux
14:45:41  AR      built-in.o
14:45:42  LD      vmlinux.o
14:51:49   MODPOST vmlinux.o
14:51:51  GEN     .version
14:51:51   CHK     include/generated/compile.h
  UPD     include/generated/compile.h
14:51:51   CC      init/version.o
14:51:52   AR      init/built-in.o
14:52:04  KSYM    .tmp_kallsyms1.o
14:52:31  KSYM    .tmp_kallsyms2.o
14:52:43  LD      vmlinux
14:52:55  SORTEX  vmlinux
14:52:55  SYSMAP  System.map
14:52:56   OBJCOPY arch/arm/boot/Image

The long minutes that were spent in "arm-linux-gnueabi-ld 
-r -o vmlinux.o --whole-archive built-in.o" are all gone now.

I still see a problem with big-endian builds failing with
thinarc/gc-sections, I'll investigate that some other day,
or you could have a look at that if you want to make sure
it's an ARM specific problem, not something with your
patches in general.

The patch that I sent for enabling the two on ARM blocks
out CONFIG_CPU_BIG_ENDIAN, so just revert that hunk to see
the problem. It's possible that it only breaks when doing
a big-endian build after a little-endian build without
a "make clean" inbetween.

	Arnd

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-24 15:01             ` Arnd Bergmann
  (?)
@ 2016-08-26 11:17               ` Nicholas Piggin
  -1 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-26 11:17 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Russell King, linux-kbuild, Nicolas Pitre,
	Julian Brown, Alan Modra, Segher Boessenkool, linux-arch,
	Michal Marek, Sam Ravnborg, Stephen Rothwell

On Wed, 24 Aug 2016 17:01:30 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wednesday, August 24, 2016 8:56:46 PM CEST Nicholas Piggin wrote:
> > On Wed, 24 Aug 2016 19:05:30 +1000
> > Nicholas Piggin <npiggin@gmail.com> wrote:  
> > > Thanks, I'll take a look.  
> > 
> > Okay, I can't reproduce your bad linking times even with gc-sections. It's
> > possible I'm doing something wrong, but with my patches + your patch and
> > standard arm allyesconfig:
> > 
> >   20:33:56  AR      built-in.o
> >   20:33:57  LD      vmlinux.o
> >   MODPOST vmlinux.o
> >   20:34:12  GEN     .version
> >   CHK     include/generated/compile.h
> >   UPD     include/generated/compile.h
> >   CC      init/version.o
> >   AR      init/built-in.o
> >   20:34:24  KSYM    .tmp_kallsyms1.o
> >   20:34:45  KSYM    .tmp_kallsyms2.o
> >   20:34:54  LD      vmlinux
> >   20:35:07  SORTEX  vmlinux
> >   20:35:07  SYSMAP  System.map
> > 
> > I have about 71 seconds for the final link phase.  
> 
> 
> I've tracked down my remaining build time regression to
> a bad binutils snapshot (2.26.51) I had been using, and upgraded
> to the 2.27 release now, which is roughly the same as what
> you have:
> 
> 14:45:41   LINK    vmlinux
> 14:45:41  AR      built-in.o
> 14:45:42  LD      vmlinux.o
> 14:51:49   MODPOST vmlinux.o
> 14:51:51  GEN     .version
> 14:51:51   CHK     include/generated/compile.h
>   UPD     include/generated/compile.h
> 14:51:51   CC      init/version.o
> 14:51:52   AR      init/built-in.o
> 14:52:04  KSYM    .tmp_kallsyms1.o
> 14:52:31  KSYM    .tmp_kallsyms2.o
> 14:52:43  LD      vmlinux
> 14:52:55  SORTEX  vmlinux
> 14:52:55  SYSMAP  System.map
> 14:52:56   OBJCOPY arch/arm/boot/Image
> 
> The long minutes that were spent in "arm-linux-gnueabi-ld 
> -r -o vmlinux.o --whole-archive built-in.o" are all gone now.

Thanks for tracking it down, that's good to hear.


> I still see a problem with big-endian builds failing with
> thinarc/gc-sections, I'll investigate that some other day,
> or you could have a look at that if you want to make sure
> it's an ARM specific problem, not something with your
> patches in general.
> 
> The patch that I sent for enabling the two on ARM blocks
> out CONFIG_CPU_BIG_ENDIAN, so just revert that hunk to see
> the problem. It's possible that it only breaks when doing
> a big-endian build after a little-endian build without
> a "make clean" inbetween.

I'm able to build big endian ARM allyesconfig with thin
archives and gc-sections, worked fine.

There have been a few bugs in powerpc when failing to notice
the change when endian of builds was swapped, so maybe you
got bitten by something similar.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
@ 2016-08-26 11:17               ` Nicholas Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-26 11:17 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nicolas Pitre, linux-arch, Segher Boessenkool, Alan Modra,
	linux-kbuild, Russell King, Julian Brown, Michal Marek,
	Stephen Rothwell, Sam Ravnborg, linux-arm-kernel

On Wed, 24 Aug 2016 17:01:30 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wednesday, August 24, 2016 8:56:46 PM CEST Nicholas Piggin wrote:
> > On Wed, 24 Aug 2016 19:05:30 +1000
> > Nicholas Piggin <npiggin@gmail.com> wrote:  
> > > Thanks, I'll take a look.  
> > 
> > Okay, I can't reproduce your bad linking times even with gc-sections. It's
> > possible I'm doing something wrong, but with my patches + your patch and
> > standard arm allyesconfig:
> > 
> >   20:33:56  AR      built-in.o
> >   20:33:57  LD      vmlinux.o
> >   MODPOST vmlinux.o
> >   20:34:12  GEN     .version
> >   CHK     include/generated/compile.h
> >   UPD     include/generated/compile.h
> >   CC      init/version.o
> >   AR      init/built-in.o
> >   20:34:24  KSYM    .tmp_kallsyms1.o
> >   20:34:45  KSYM    .tmp_kallsyms2.o
> >   20:34:54  LD      vmlinux
> >   20:35:07  SORTEX  vmlinux
> >   20:35:07  SYSMAP  System.map
> > 
> > I have about 71 seconds for the final link phase.  
> 
> 
> I've tracked down my remaining build time regression to
> a bad binutils snapshot (2.26.51) I had been using, and upgraded
> to the 2.27 release now, which is roughly the same as what
> you have:
> 
> 14:45:41   LINK    vmlinux
> 14:45:41  AR      built-in.o
> 14:45:42  LD      vmlinux.o
> 14:51:49   MODPOST vmlinux.o
> 14:51:51  GEN     .version
> 14:51:51   CHK     include/generated/compile.h
>   UPD     include/generated/compile.h
> 14:51:51   CC      init/version.o
> 14:51:52   AR      init/built-in.o
> 14:52:04  KSYM    .tmp_kallsyms1.o
> 14:52:31  KSYM    .tmp_kallsyms2.o
> 14:52:43  LD      vmlinux
> 14:52:55  SORTEX  vmlinux
> 14:52:55  SYSMAP  System.map
> 14:52:56   OBJCOPY arch/arm/boot/Image
> 
> The long minutes that were spent in "arm-linux-gnueabi-ld 
> -r -o vmlinux.o --whole-archive built-in.o" are all gone now.

Thanks for tracking it down, that's good to hear.


> I still see a problem with big-endian builds failing with
> thinarc/gc-sections, I'll investigate that some other day,
> or you could have a look at that if you want to make sure
> it's an ARM specific problem, not something with your
> patches in general.
> 
> The patch that I sent for enabling the two on ARM blocks
> out CONFIG_CPU_BIG_ENDIAN, so just revert that hunk to see
> the problem. It's possible that it only breaks when doing
> a big-endian build after a little-endian build without
> a "make clean" inbetween.

I'm able to build big endian ARM allyesconfig with thin
archives and gc-sections, worked fine.

There have been a few bugs in powerpc when failing to notice
the change when endian of builds was swapped, so maybe you
got bitten by something similar.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-26 11:17               ` Nicholas Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nicholas Piggin @ 2016-08-26 11:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 24 Aug 2016 17:01:30 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wednesday, August 24, 2016 8:56:46 PM CEST Nicholas Piggin wrote:
> > On Wed, 24 Aug 2016 19:05:30 +1000
> > Nicholas Piggin <npiggin@gmail.com> wrote:  
> > > Thanks, I'll take a look.  
> > 
> > Okay, I can't reproduce your bad linking times even with gc-sections. It's
> > possible I'm doing something wrong, but with my patches + your patch and
> > standard arm allyesconfig:
> > 
> >   20:33:56  AR      built-in.o
> >   20:33:57  LD      vmlinux.o
> >   MODPOST vmlinux.o
> >   20:34:12  GEN     .version
> >   CHK     include/generated/compile.h
> >   UPD     include/generated/compile.h
> >   CC      init/version.o
> >   AR      init/built-in.o
> >   20:34:24  KSYM    .tmp_kallsyms1.o
> >   20:34:45  KSYM    .tmp_kallsyms2.o
> >   20:34:54  LD      vmlinux
> >   20:35:07  SORTEX  vmlinux
> >   20:35:07  SYSMAP  System.map
> > 
> > I have about 71 seconds for the final link phase.  
> 
> 
> I've tracked down my remaining build time regression to
> a bad binutils snapshot (2.26.51) I had been using, and upgraded
> to the 2.27 release now, which is roughly the same as what
> you have:
> 
> 14:45:41   LINK    vmlinux
> 14:45:41  AR      built-in.o
> 14:45:42  LD      vmlinux.o
> 14:51:49   MODPOST vmlinux.o
> 14:51:51  GEN     .version
> 14:51:51   CHK     include/generated/compile.h
>   UPD     include/generated/compile.h
> 14:51:51   CC      init/version.o
> 14:51:52   AR      init/built-in.o
> 14:52:04  KSYM    .tmp_kallsyms1.o
> 14:52:31  KSYM    .tmp_kallsyms2.o
> 14:52:43  LD      vmlinux
> 14:52:55  SORTEX  vmlinux
> 14:52:55  SYSMAP  System.map
> 14:52:56   OBJCOPY arch/arm/boot/Image
> 
> The long minutes that were spent in "arm-linux-gnueabi-ld 
> -r -o vmlinux.o --whole-archive built-in.o" are all gone now.

Thanks for tracking it down, that's good to hear.


> I still see a problem with big-endian builds failing with
> thinarc/gc-sections, I'll investigate that some other day,
> or you could have a look at that if you want to make sure
> it's an ARM specific problem, not something with your
> patches in general.
> 
> The patch that I sent for enabling the two on ARM blocks
> out CONFIG_CPU_BIG_ENDIAN, so just revert that hunk to see
> the problem. It's possible that it only breaks when doing
> a big-endian build after a little-endian build without
> a "make clean" inbetween.

I'm able to build big endian ARM allyesconfig with thin
archives and gc-sections, worked fine.

There have been a few bugs in powerpc when failing to notice
the change when endian of builds was swapped, so maybe you
got bitten by something similar.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
  2016-08-26 11:17               ` Nicholas Piggin
  (?)
@ 2016-08-26 14:07                 ` Arnd Bergmann
  -1 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2016-08-26 14:07 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Nicholas Piggin, Nicolas Pitre, linux-arch, Segher Boessenkool,
	Alan Modra, linux-kbuild, Russell King, Julian Brown,
	Michal Marek, Stephen Rothwell, Sam Ravnborg

On Friday 26 August 2016, Nicholas Piggin wrote:
> > I still see a problem with big-endian builds failing with
> > thinarc/gc-sections, I'll investigate that some other day,
> > or you could have a look at that if you want to make sure
> > it's an ARM specific problem, not something with your
> > patches in general.
> > 
> > The patch that I sent for enabling the two on ARM blocks
> > out CONFIG_CPU_BIG_ENDIAN, so just revert that hunk to see
> > the problem. It's possible that it only breaks when doing
> > a big-endian build after a little-endian build without
> > a "make clean" inbetween.
> 
> I'm able to build big endian ARM allyesconfig with thin
> archives and gc-sections, worked fine.
> 
> There have been a few bugs in powerpc when failing to notice
> the change when endian of builds was swapped, so maybe you
> got bitten by something similar.

I tracked this down as well now, and it's also a problem on my
local machine, your patches are fine. In order to debug
the other problem, I was building with
"make LD=/home/arnd/..../ld" to try out different
versions of the linker, and that caused the "LD += -EB"
line from arch/arm/Makefile to be ignored.

We should probably override LDFLAGS rather than LD
(and AFLAGS instead of AS) for big-endian builds, but
that is unrelated to your work.

	Arnd

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] arm: add an option for erratum 657417
@ 2016-08-26 14:07                 ` Arnd Bergmann
  0 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2016-08-26 14:07 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Nicolas Pitre, linux-arch, Segher Boessenkool, linux-kbuild,
	Alan Modra, Russell King, Nicholas Piggin, Julian Brown,
	Michal Marek, Stephen Rothwell, Sam Ravnborg

On Friday 26 August 2016, Nicholas Piggin wrote:
> > I still see a problem with big-endian builds failing with
> > thinarc/gc-sections, I'll investigate that some other day,
> > or you could have a look at that if you want to make sure
> > it's an ARM specific problem, not something with your
> > patches in general.
> > 
> > The patch that I sent for enabling the two on ARM blocks
> > out CONFIG_CPU_BIG_ENDIAN, so just revert that hunk to see
> > the problem. It's possible that it only breaks when doing
> > a big-endian build after a little-endian build without
> > a "make clean" inbetween.
> 
> I'm able to build big endian ARM allyesconfig with thin
> archives and gc-sections, worked fine.
> 
> There have been a few bugs in powerpc when failing to notice
> the change when endian of builds was swapped, so maybe you
> got bitten by something similar.

I tracked this down as well now, and it's also a problem on my
local machine, your patches are fine. In order to debug
the other problem, I was building with
"make LD=/home/arnd/..../ld" to try out different
versions of the linker, and that caused the "LD += -EB"
line from arch/arm/Makefile to be ignored.

We should probably override LDFLAGS rather than LD
(and AFLAGS instead of AS) for big-endian builds, but
that is unrelated to your work.

	Arnd

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] arm: add an option for erratum 657417
@ 2016-08-26 14:07                 ` Arnd Bergmann
  0 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2016-08-26 14:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 26 August 2016, Nicholas Piggin wrote:
> > I still see a problem with big-endian builds failing with
> > thinarc/gc-sections, I'll investigate that some other day,
> > or you could have a look at that if you want to make sure
> > it's an ARM specific problem, not something with your
> > patches in general.
> > 
> > The patch that I sent for enabling the two on ARM blocks
> > out CONFIG_CPU_BIG_ENDIAN, so just revert that hunk to see
> > the problem. It's possible that it only breaks when doing
> > a big-endian build after a little-endian build without
> > a "make clean" inbetween.
> 
> I'm able to build big endian ARM allyesconfig with thin
> archives and gc-sections, worked fine.
> 
> There have been a few bugs in powerpc when failing to notice
> the change when endian of builds was swapped, so maybe you
> got bitten by something similar.

I tracked this down as well now, and it's also a problem on my
local machine, your patches are fine. In order to debug
the other problem, I was building with
"make LD=/home/arnd/..../ld" to try out different
versions of the linker, and that caused the "LD += -EB"
line from arch/arm/Makefile to be ignored.

We should probably override LDFLAGS rather than LD
(and AFLAGS instead of AS) for big-endian builds, but
that is unrelated to your work.

	Arnd

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2016-08-26 14:09 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-12  8:19 [PATCH] arm: add an option for erratum 657417 Nicholas Piggin
2016-08-12  8:19 ` Nicholas Piggin
2016-08-12 12:33 ` Russell King - ARM Linux
2016-08-12 12:33   ` Russell King - ARM Linux
2016-08-12 13:15   ` Nicholas Piggin
2016-08-12 13:15     ` Nicholas Piggin
2016-08-12 13:49     ` Ard Biesheuvel
2016-08-12 13:49       ` Ard Biesheuvel
2016-08-12 13:50       ` Ard Biesheuvel
2016-08-12 13:50         ` Ard Biesheuvel
2016-08-12 13:50         ` Ard Biesheuvel
2016-08-12 13:52         ` Russell King - ARM Linux
2016-08-12 13:52           ` Russell King - ARM Linux
2016-08-12 13:54           ` Ard Biesheuvel
2016-08-12 13:54             ` Ard Biesheuvel
2016-08-12 13:51       ` Russell King - ARM Linux
2016-08-12 13:51         ` Russell King - ARM Linux
2016-08-12 14:17   ` Robin Murphy
2016-08-12 14:17     ` Robin Murphy
2016-08-12 14:23     ` Russell King - ARM Linux
2016-08-12 14:23       ` Russell King - ARM Linux
2016-08-12 14:32       ` Russell King - ARM Linux
2016-08-12 14:32         ` Russell King - ARM Linux
2016-08-12 14:32         ` Russell King - ARM Linux
2016-08-23 12:01 ` Arnd Bergmann
2016-08-23 12:01   ` Arnd Bergmann
2016-08-24  4:00   ` Nicholas Piggin
2016-08-24  4:00     ` Nicholas Piggin
2016-08-24  7:41     ` Arnd Bergmann
2016-08-24  7:41       ` Arnd Bergmann
2016-08-24  7:41       ` Arnd Bergmann
2016-08-24  9:05       ` Nicholas Piggin
2016-08-24  9:05         ` Nicholas Piggin
2016-08-24 10:56         ` Nicholas Piggin
2016-08-24 10:56           ` Nicholas Piggin
2016-08-24 10:56           ` Nicholas Piggin
2016-08-24 12:06           ` Nicholas Piggin
2016-08-24 12:06             ` Nicholas Piggin
2016-08-24 15:01           ` Arnd Bergmann
2016-08-24 15:01             ` Arnd Bergmann
2016-08-26 11:17             ` Nicholas Piggin
2016-08-26 11:17               ` Nicholas Piggin
2016-08-26 11:17               ` Nicholas Piggin
2016-08-26 14:07               ` Arnd Bergmann
2016-08-26 14:07                 ` Arnd Bergmann
2016-08-26 14:07                 ` Arnd Bergmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.