All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Clean up ARM compressed loader
@ 2010-02-24  2:23 Hector Martin
  2010-02-24  8:51 ` Uwe Kleine-König
  2010-02-24 22:34 ` Russell King - ARM Linux
  0 siblings, 2 replies; 41+ messages in thread
From: Hector Martin @ 2010-02-24  2:23 UTC (permalink / raw)
  To: linux-arm-kernel

The -Dstatic= stuff in the Makefile is a hack and subtly breaks the
current inflate implementation, because the fixed inflate tables are
included into a function and removing static places them in the stack,
which bloats the stack and breaks references after the function returns.
So get rid of the hack.

Instead, clean up the stub loader and unify it. The loader is now
exactly the same regardless of whether you've enabled CONFIG_ZBOOT_ROM
or not, and runs from RAM in PIC mode. CONFIG_ZBOOT_ROM simply enables
an extra section of stub code that lets the zImage boot from ROM by
copying itself to RAM first.

The old TEXT/BSS settings for the ROM mode are replaced by
CONFIG_ZBOOT_ROM_START and CONFIG_ZBOOT_ROM_END. These simply define a
ROM address space area that lets the loader know when it should copy
itself to RAM. The zImage can be burned into ROM/Flash anywhere within
this space - it's still relocatable. It will also run from RAM.

When running from ROM, the copy is done to the kernel base address. This
gets overwritten with the decompressed kernel later by the existing
relocation code that's already used for RAM zImages loaded where the
final kernel should be. This isn't fully optimal, but the code needs to
be there anyway for RAM use, and this way we don't have to worry about
finding a suitable non-overlapping region of RAM to copy the zImage.

The disadvantage of copying zImage to RAM is requiring enough RAM to fit
both the compressed and uncompressed kernel image. This shouldn't be a
problem in practical systems.

This is untested on a real ROM platform, but has been tested using RAM
as fake ROM.

Signed-off-by: Hector Martin <hector@marcansoft.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
---
 arch/arm/Kconfig                        |   35 +++---
 arch/arm/boot/Makefile                  |    2 +-
 arch/arm/boot/compressed/.gitignore     |    1 -
 arch/arm/boot/compressed/Makefile       |   27 +-----
 arch/arm/boot/compressed/head.S         |  173 ++++++++++++++++++-------------
 arch/arm/boot/compressed/misc.c         |    3 +-
 arch/arm/boot/compressed/vmlinux.lds    |   71 +++++++++++++
 arch/arm/boot/compressed/vmlinux.lds.in |   61 -----------
 8 files changed, 195 insertions(+), 178 deletions(-)
 create mode 100644 arch/arm/boot/compressed/vmlinux.lds
 delete mode 100644 arch/arm/boot/compressed/vmlinux.lds.in

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 184a6bd..183aa0e 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1259,38 +1259,39 @@ endmenu
 
 menu "Boot options"
 
-# Compressed boot loader in ROM.  Yes, we really want to ask about
-# TEXT and BSS so we preserve their values in the config files.
-config ZBOOT_ROM_TEXT
-	hex "Compressed ROM boot loader base address"
+config ZBOOT_ROM_START
+	hex "ROM/flash start address (for compressed ROM boot)"
 	default "0"
 	help
-	  The physical address at which the ROM-able zImage is to be
-	  placed in the target.  Platforms which normally make use of
-	  ROM-able zImage formats normally set this to a suitable
+	  The physical address of the ROM in the target. This setting is
+	  used by the compressed loader to determine whether it is running
+	  from ROM or RAM. This does not have to be the address of the zImage in
+	  ROM; it can be the start of the ROM chip. Platforms which normally
+	  make use of ROM-able zImage formats normally set this to a suitable
 	  value in their defconfig file.
 
 	  If ZBOOT_ROM is not enabled, this has no effect.
 
-config ZBOOT_ROM_BSS
-	hex "Compressed ROM boot loader BSS address"
+config ZBOOT_ROM_SIZE
+	hex "ROM/flash size (for compressed ROM boot)"
 	default "0"
 	help
-	  The base address of an area of read/write memory in the target
-	  for the ROM-able zImage which must be available while the
-	  decompressor is running. It must be large enough to hold the
-	  entire decompressed kernel plus an additional 128 KiB.
-	  Platforms which normally make use of ROM-able zImage formats
-	  normally set this to a suitable value in their defconfig file.
+	  The size of the ROM chip in your target. You may place the zImage in
+	  ROM anywhere within the span defined by ZBOOT_ROM_START and
+	  ZBOOT_ROM_SIZE. Platforms which normally make use of ROM-able zImage
+	  formats normally set this to a suitable value in their defconfig file.
 
 	  If ZBOOT_ROM is not enabled, this has no effect.
 
 config ZBOOT_ROM
 	bool "Compressed boot loader in ROM/flash"
-	depends on ZBOOT_ROM_TEXT != ZBOOT_ROM_BSS
+	depends on ZBOOT_ROM_SIZE != 0
 	help
 	  Say Y here if you intend to execute your compressed kernel image
-	  (zImage) directly from ROM or flash.  If unsure, say N.
+	  (zImage) directly from ROM or flash. You need to specify the ROM start
+	  address and size. The resulting zImage will still boot from RAM.
+
+	  If unsure, say N.
 
 config CMDLINE
 	string "Default kernel command string"
diff --git a/arch/arm/boot/Makefile b/arch/arm/boot/Makefile
index 4a590f4..f2a598a 100644
--- a/arch/arm/boot/Makefile
+++ b/arch/arm/boot/Makefile
@@ -65,7 +65,7 @@ quiet_cmd_uimage = UIMAGE  $@
 		   -n 'Linux-$(KERNELRELEASE)' -d $< $@
 
 ifeq ($(CONFIG_ZBOOT_ROM),y)
-$(obj)/uImage: LOADADDR=$(CONFIG_ZBOOT_ROM_TEXT)
+$(obj)/uImage: LOADADDR=$(CONFIG_ZBOOT_ROM_START)
 else
 $(obj)/uImage: LOADADDR=$(ZRELADDR)
 endif
diff --git a/arch/arm/boot/compressed/.gitignore b/arch/arm/boot/compressed/.gitignore
index ab204db..3bc9fc2 100644
--- a/arch/arm/boot/compressed/.gitignore
+++ b/arch/arm/boot/compressed/.gitignore
@@ -1,3 +1,2 @@
 font.c
 piggy.gz
-vmlinux.lds
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 2d4d88b..809ddba 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -48,25 +48,12 @@ else
 endif
 endif
 
-#
-# We now have a PIC decompressor implementation.  Decompressors running
-# from RAM should not define ZTEXTADDR.  Decompressors running directly
-# from ROM or Flash must define ZTEXTADDR (preferably via the config)
-# FIXME: Previous assignment to ztextaddr-y is lost here. See SHARK
-ifeq ($(CONFIG_ZBOOT_ROM),y)
-ZTEXTADDR	:= $(CONFIG_ZBOOT_ROM_TEXT)
-ZBSSADDR	:= $(CONFIG_ZBOOT_ROM_BSS)
-else
-ZTEXTADDR	:= 0
-ZBSSADDR	:= ALIGN(4)
-endif
-
-SEDFLAGS	= s/TEXT_START/$(ZTEXTADDR)/;s/BSS_START/$(ZBSSADDR)/
+EXTRA_CFLAGS	:= -fpic -fno-builtin
 
 suffix_$(CONFIG_KERNEL_GZIP) = gzip
 suffix_$(CONFIG_KERNEL_LZO)  = lzo
 
-targets       := vmlinux vmlinux.lds \
+targets       := vmlinux \
 		 piggy.$(suffix_y) piggy.$(suffix_y).o \
 		 font.o font.c head.o misc.o $(OBJS)
 
@@ -75,7 +62,6 @@ ORIG_CFLAGS := $(KBUILD_CFLAGS)
 KBUILD_CFLAGS = $(subst -pg, , $(ORIG_CFLAGS))
 endif
 
-EXTRA_CFLAGS  := -fpic -fno-builtin
 EXTRA_AFLAGS  := -Wa,-march=all
 
 # Supply ZRELADDR, INITRD_PHYS and PARAMS_PHYS to the decompressor via
@@ -106,10 +92,6 @@ lib1funcs = $(obj)/lib1funcs.o
 $(obj)/lib1funcs.S: $(srctree)/arch/$(SRCARCH)/lib/lib1funcs.S FORCE
 	$(call cmd,shipped)
 
-# Don't allow any static data in misc.o, which
-# would otherwise mess up our GOT table
-CFLAGS_misc.o := -Dstatic=
-
 $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
 	 	$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) FORCE
 	$(call if_changed,ld)
@@ -120,10 +102,5 @@ $(obj)/piggy.$(suffix_y): $(obj)/../Image FORCE
 
 $(obj)/piggy.$(suffix_y).o:  $(obj)/piggy.$(suffix_y) FORCE
 
-CFLAGS_font.o := -Dstatic=
-
 $(obj)/font.c: $(FONTC)
 	$(call cmd,shipped)
-
-$(obj)/vmlinux.lds: $(obj)/vmlinux.lds.in arch/arm/boot/Makefile .config
-	@sed "$(SEDFLAGS)" < $< > $@
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 4fddc50..174955b 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -133,8 +133,8 @@ start:
 
 		b	1f
 		.word	0x016f2818		@ Magic numbers to help the loader
-		.word	start			@ absolute load/run zImage address
-		.word	_edata			@ zImage end address
+		.word	0			@ absolute load/run zImage address
+		.word	__zimage_size		@ zImage end address
 1:		mov	r7, r1			@ save architecture ID
 		mov	r8, r2			@ save atags pointer
 
@@ -169,77 +169,121 @@ not_angel:
 		 */
 
 		.text
-		adr	r0, LC0
- ARM(		ldmia	r0, {r1, r2, r3, r4, r5, r6, ip, sp}	)
- THUMB(		ldmia	r0, {r1, r2, r3, r4, r5, r6, ip}	)
- THUMB(		ldr	sp, [r0, #28]				)
-		subs	r0, r0, r1		@ calculate the delta offset
+		/*
+		 * Set up a stack. This needs to have an offset added later.
+		 */
+		ldr	sp, =__stack_end
 
-						@ if delta is zero, we are
-		beq	not_relocated		@ running at the address we
-						@ were linked at.
+		/*
+		 * This is used throughout the following code, and also in
+		 * the cache_on code to set up the page tables.
+		 */
+		ldr	r4, =zreladdr
 
+here:
+		adr	r0, here		@ calculate the delta offset.
+		ldr	r1, =here		@ this is the load address since
+		sub	r5, r0, r1		@ we're linked at address 0.
+
+#ifdef CONFIG_ZBOOT_ROM
 		/*
-		 * We're running at a different address.  We need to fix
-		 * up various pointers:
-		 *   r5 - zImage base address
-		 *   r6 - GOT start
-		 *   ip - GOT end
+		 * Check to see if we're running from ROM by seeing if our load
+		 * address is within the ROM space
 		 */
-		add	r5, r5, r0
-		add	r6, r6, r0
-		add	ip, ip, r0
+		ldr	r0, =CONFIG_ZBOOT_ROM_START
+		ldr	r1, =CONFIG_ZBOOT_ROM_SIZE
+		cmp	r5, r0
+		blo	in_ram
+		add	r1, r0, r1
+		cmp	r5, r1
+		bhs	in_ram
 
-#ifndef CONFIG_ZBOOT_ROM
 		/*
-		 * If we're running fully PIC === CONFIG_ZBOOT_ROM = n,
-		 * we need to fix up pointers into the BSS region.
-		 *   r2 - BSS start
-		 *   r3 - BSS end
-		 *   sp - stack pointer
+		 * Relocate ourselves to the kernel load address. This is a
+		 * known good RAM address. The overlap code later will perform
+		 * the final relocation of the decompressed kernel on top of us.
 		 */
-		add	r2, r2, r0
-		add	r3, r3, r0
-		add	sp, sp, r0
+		add	sp, sp, r4
 
 		/*
-		 * Relocate all entries in the GOT table.
+		 * Turn on caches to speed up copy
 		 */
-1:		ldr	r1, [r6, #0]		@ relocate entries in the GOT
-		add	r1, r1, r0		@ table.  This fixes up the
-		str	r1, [r6], #4		@ C references.
-		cmp	r6, ip
-		blo	1b
-#else
+		bl	cache_on
 
 		/*
-		 * Relocate entries in the GOT table.  We only relocate
-		 * the entries that are outside the (relocated) BSS region.
+		 * Do the copy
 		 */
-1:		ldr	r1, [r6, #0]		@ relocate entries in the GOT
-		cmp	r1, r2			@ entry < bss_start ||
-		cmphs	r3, r1			@ _end < entry
-		addlo	r1, r1, r0		@ table.  This fixes up the
-		str	r1, [r6], #4		@ C references.
-		cmp	r6, ip
+
+		mov	r0, r5			@ from (ROM)
+		mov	r1, r4			@ to (RAM)
+		ldr	r2, =__zimage_size	@ size
+		add	r2, r0, r2		@ ROM image end
+1:		ldmia	r0!, {r3, r10, r11, r12}
+		stmia	r1!, {r3, r10, r11, r12}
+		ldmia	r0!, {r3, r10, r11, r12}
+		stmia	r1!, {r3, r10, r11, r12}
+		cmp	r0, r2
 		blo	1b
+
+		bl	cache_clean_flush
+
+		/*
+		 * Load our new base (kernel load address) and continue running
+		 * from there.
+		 */
+		mov	r5, r4
+		ldr	r0, =fixup
+		add	r0, r0, r5
+		mov	pc, r0
+
 #endif
 
-not_relocated:	mov	r0, #0
-1:		str	r0, [r2], #4		@ clear bss
-		str	r0, [r2], #4
-		str	r0, [r2], #4
-		str	r0, [r2], #4
-		cmp	r2, r3
-		blo	1b
+in_ram:
 
 		/*
-		 * The C runtime environment should now be setup
-		 * sufficiently.  Turn the cache on, set up some
-		 * pointers, and start decompressing.
+		 * If we're running from RAM then just offset the stack and
+		 * enable caches.
 		 */
+
+		add	sp, sp, r5
 		bl	cache_on
 
+fixup:
+		cmp	r5, #0			@ if delta is zero, we are
+		beq	not_relocated		@ running@the address we
+						@ were linked at.
+
+		/*
+		 * Fix up the GOT address
+		 */
+		ldr	r1, =_got_start
+		add	r1, r1, r5
+		ldr	r2, =_got_end
+		add	r2, r2, r5
+
+		/*
+		 * Relocate all entries in the GOT table.
+		 */
+1:		ldr	r0, [r1]		@ relocate entries in the GOT
+		add	r0, r0, r5		@ table.  This fixes up the
+		str	r0, [r1], #4		@ C references.
+		cmp	r1, r2
+		blo	1b
+
+not_relocated:
+		ldr	r1, =__bss_start
+		add	r1, r1, r5
+		ldr	r2, =__bss_end
+		add	r2, r2, r5
+
+		mov	r0, #0
+1:		str	r0, [r1], #4		@ clear bss
+		str	r0, [r1], #4
+		str	r0, [r1], #4
+		str	r0, [r1], #4
+		cmp	r1, r2
+		blo	1b
+
 		mov	r1, sp			@ malloc space above stack
 		add	r2, sp, #0x10000	@ 64k max
 
@@ -278,7 +322,7 @@ not_relocated:	mov	r0, #0
  */
 		add	r1, r5, r0		@ end of decompressed kernel
 		adr	r2, reloc_start
-		ldr	r3, LC1
+		ldr	r3, reloc_size
 		add	r3, r2, r3
 1:		ldmia	r2!, {r9 - r12, r14}	@ copy relocation code
 		stmia	r1!, {r9 - r12, r14}
@@ -294,6 +338,9 @@ not_relocated:	mov	r0, #0
  THUMB(		add	r12, r5, r0		)
  THUMB(		mov	pc, r12			) @ call relocation code
 
+reloc_size:
+		.word reloc_end - reloc_start
+
 /*
  * We're not in danger of overwriting ourselves.  Do this the simple way.
  *
@@ -305,26 +352,13 @@ wont_overwrite:	mov	r0, r4
 		bl	decompress_kernel
 		b	call_kernel
 
-		.align	2
-		.type	LC0, #object
-LC0:		.word	LC0			@ r1
-		.word	__bss_start		@ r2
-		.word	_end			@ r3
-		.word	zreladdr		@ r4
-		.word	_start			@ r5
-		.word	_got_start		@ r6
-		.word	_got_end		@ ip
-		.word	user_stack+4096		@ sp
-LC1:		.word	reloc_end - reloc_start
-		.size	LC0, . - LC0
-
 #ifdef CONFIG_ARCH_RPC
 		.globl	params
 params:		ldr	r0, =params_phys
 		mov	pc, lr
-		.ltorg
 		.align
 #endif
+		.ltorg
 
 /*
  * Turn on the cache.  We need to setup some page tables so that we
@@ -548,6 +582,7 @@ __common_mmu_cache_on:
  * r9-r12,r14 = corrupted
  */
 		.align	5
+
 reloc_start:	add	r9, r5, r0
 		sub	r9, r9, #128		@ do not copy the stack
 		debug_reloc_start
@@ -1076,7 +1111,3 @@ memdump:	mov	r12, r0
 
 		.ltorg
 reloc_end:
-
-		.align
-		.section ".stack", "w"
-user_stack:	.space	4096
diff --git a/arch/arm/boot/compressed/misc.c b/arch/arm/boot/compressed/misc.c
index 56a0d11..97018f8 100644
--- a/arch/arm/boot/compressed/misc.c
+++ b/arch/arm/boot/compressed/misc.c
@@ -23,7 +23,6 @@ unsigned int __machine_arch_type;
 #include <linux/compiler.h>	/* for inline */
 #include <linux/types.h>	/* for size_t */
 #include <linux/stddef.h>	/* for NULL */
-#include <asm/string.h>
 #include <linux/linkage.h>
 
 #include <asm/unaligned.h>
@@ -166,7 +165,7 @@ void __memzero (__ptr_t s, size_t n)
 		*u.ucp++ = 0;
 }
 
-static inline __ptr_t memcpy(__ptr_t __dest, __const __ptr_t __src,
+static __ptr_t memcpy(__ptr_t __dest, __const __ptr_t __src,
 			    size_t __n)
 {
 	int i = 0;
diff --git a/arch/arm/boot/compressed/vmlinux.lds b/arch/arm/boot/compressed/vmlinux.lds
new file mode 100644
index 0000000..2ea7af5
--- /dev/null
+++ b/arch/arm/boot/compressed/vmlinux.lds
@@ -0,0 +1,71 @@
+/*
+ *  linux/arch/arm/boot/compressed/vmlinux.lds.in
+ *
+ *  Copyright (C) 2000 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+OUTPUT_ARCH(arm)
+ENTRY(_start)
+SECTIONS
+{
+  /DISCARD/ : {
+    *(.ARM.exidx*)
+    *(.ARM.extab*)
+  }
+
+  /* Note: should always build at 0 for relocatable stub */
+  . = 0;
+
+  .text : {
+    _start = .;
+    *(.start)
+    *(.text)
+    *(.text.*)
+    *(.fixup)
+    *(.gnu.warning)
+    *(.rodata)
+    *(.rodata.*)
+    *(.glue_7)
+    *(.glue_7t)
+    *(.piggydata)
+    . = ALIGN(4);
+  }
+
+  /* The GOT is only for PIC builds, so it should not exist in the ROM build */
+  _got_start = .;
+  .got			: { *(.got) }
+  _got_end = .;
+  .got.plt		: { *(.got.plt) }
+
+  . = ALIGN(16);
+
+  .data : {
+    *(.data)
+    . = ALIGN(16);
+  }
+
+  __zimage_size = .;
+
+  __bss_start = .;
+  .bss : { *(.bss) }
+  . = ALIGN(16);
+  __bss_end = .;
+
+  .stack : {
+    __stack_start = .;
+    . += 4096;
+    __stack_end = .;
+  }
+
+  .stab 0		: { *(.stab) }
+  .stabstr 0		: { *(.stabstr) }
+  .stab.excl 0		: { *(.stab.excl) }
+  .stab.exclstr 0	: { *(.stab.exclstr) }
+  .stab.index 0		: { *(.stab.index) }
+  .stab.indexstr 0	: { *(.stab.indexstr) }
+  .comment 0		: { *(.comment) }
+}
+
diff --git a/arch/arm/boot/compressed/vmlinux.lds.in b/arch/arm/boot/compressed/vmlinux.lds.in
deleted file mode 100644
index a5924b9..0000000
--- a/arch/arm/boot/compressed/vmlinux.lds.in
+++ /dev/null
@@ -1,61 +0,0 @@
-/*
- *  linux/arch/arm/boot/compressed/vmlinux.lds.in
- *
- *  Copyright (C) 2000 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-OUTPUT_ARCH(arm)
-ENTRY(_start)
-SECTIONS
-{
-  /DISCARD/ : {
-    *(.ARM.exidx*)
-    *(.ARM.extab*)
-  }
-
-  . = TEXT_START;
-  _text = .;
-
-  .text : {
-    _start = .;
-    *(.start)
-    *(.text)
-    *(.text.*)
-    *(.fixup)
-    *(.gnu.warning)
-    *(.rodata)
-    *(.rodata.*)
-    *(.glue_7)
-    *(.glue_7t)
-    *(.piggydata)
-    . = ALIGN(4);
-  }
-
-  _etext = .;
-
-  _got_start = .;
-  .got			: { *(.got) }
-  _got_end = .;
-  .got.plt		: { *(.got.plt) }
-  .data			: { *(.data) }
-  _edata = .;
-
-  . = BSS_START;
-  __bss_start = .;
-  .bss			: { *(.bss) }
-  _end = .;
-
-  .stack (NOLOAD)	: { *(.stack) }
-
-  .stab 0		: { *(.stab) }
-  .stabstr 0		: { *(.stabstr) }
-  .stab.excl 0		: { *(.stab.excl) }
-  .stab.exclstr 0	: { *(.stab.exclstr) }
-  .stab.index 0		: { *(.stab.index) }
-  .stab.indexstr 0	: { *(.stab.indexstr) }
-  .comment 0		: { *(.comment) }
-}
-
-- 
1.6.4.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24  2:23 [PATCH] Clean up ARM compressed loader Hector Martin
@ 2010-02-24  8:51 ` Uwe Kleine-König
  2010-02-24  9:28   ` Hector Martin
  2010-02-24 22:34 ` Russell King - ARM Linux
  1 sibling, 1 reply; 41+ messages in thread
From: Uwe Kleine-König @ 2010-02-24  8:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On Wed, Feb 24, 2010 at 03:23:37AM +0100, Hector Martin wrote:
> The -Dstatic= stuff in the Makefile is a hack and subtly breaks the
> current inflate implementation, because the fixed inflate tables are
> included into a function and removing static places them in the stack,
> which bloats the stack and breaks references after the function returns.
> So get rid of the hack.
> 
> Instead, clean up the stub loader and unify it. The loader is now
> exactly the same regardless of whether you've enabled CONFIG_ZBOOT_ROM
> or not, and runs from RAM in PIC mode. CONFIG_ZBOOT_ROM simply enables
> an extra section of stub code that lets the zImage boot from ROM by
> copying itself to RAM first.
This is a big regression as this introduces two new copies.  And as
CONFIG_ZBOOT_ROM isn't primary about saving RAM at the early boot stages
but for fast booting this is not acceptable.
 
Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24  8:51 ` Uwe Kleine-König
@ 2010-02-24  9:28   ` Hector Martin
  0 siblings, 0 replies; 41+ messages in thread
From: Hector Martin @ 2010-02-24  9:28 UTC (permalink / raw)
  To: linux-arm-kernel

Uwe Kleine-K?nig wrote:
> This is a big regression as this introduces two new copies.  And as
> CONFIG_ZBOOT_ROM isn't primary about saving RAM at the early boot stages
> but for fast booting this is not acceptable.

I'm not seeing any significant performance decrease due to these
optimized copies, compared to the time spent actually uncompressing the
kernel. In fact, it's quite possible that this will speed things up on
some implementations (particularly simpler/older/slower ones), as doing
a block-based copy from ROM to RAM followed by a RAM->RAM inflate can
very well be faster than uncompressing straight from ROM, if the ROM bus
is slower. Inflate causes many small accesses.

If this can be shown to be a real performance issue, it can be fixed by:
1) not copying the actual payload to RAM, instead passing the ROM
address to the decompressor (you still need a gaping hole in RAM since
you can't relocate the other sections up to fill the hole, but you avoid
the copy)
2) keeping track of the true uncompressed kernel size (not just some 4x
expansion guess), so the loader can be placed directly afterwards,
leaving space for the kernel and avoiding the final relocation.

I'll be happy to make these changes if it really makes a difference, but
I'm not convinced right now.

It's worth noting that while debugging this I found a bug in the inflate
stuff that causes inflate to be *much* slower for certain targets,
including some ARM ones (yet nobody appears to have complained). I'll be
submitting a fix for this to LKML, as it affects the generic deflate
implementation.

-- 
Hector Martin (hector at marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24  2:23 [PATCH] Clean up ARM compressed loader Hector Martin
  2010-02-24  8:51 ` Uwe Kleine-König
@ 2010-02-24 22:34 ` Russell King - ARM Linux
  2010-02-24 23:34   ` Nicolas Pitre
  1 sibling, 1 reply; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-24 22:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 24, 2010 at 03:23:37AM +0100, Hector Martin wrote:
> Instead, clean up the stub loader and unify it. The loader is now
> exactly the same regardless of whether you've enabled CONFIG_ZBOOT_ROM
> or not, and runs from RAM in PIC mode. CONFIG_ZBOOT_ROM simply enables
> an extra section of stub code that lets the zImage boot from ROM by
> copying itself to RAM first.

Unacceptable.

The point of putting the decompressor in ROM is to save the time spent
copying it to RAM first.  Boot loaders are already perfectly capable
of copying the image into RAM and calling it there.

Getting the decompressor to do that step first just moves the copy from
the boot loader into the decompressor.  It doesn't eliminate it, which
is what ROM-based decompression is all about.

Basically, this patch makes the feature entirely pointless.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24 22:34 ` Russell King - ARM Linux
@ 2010-02-24 23:34   ` Nicolas Pitre
  2010-02-24 23:42     ` Russell King - ARM Linux
  0 siblings, 1 reply; 41+ messages in thread
From: Nicolas Pitre @ 2010-02-24 23:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 24 Feb 2010, Russell King - ARM Linux wrote:

> On Wed, Feb 24, 2010 at 03:23:37AM +0100, Hector Martin wrote:
> > Instead, clean up the stub loader and unify it. The loader is now
> > exactly the same regardless of whether you've enabled CONFIG_ZBOOT_ROM
> > or not, and runs from RAM in PIC mode. CONFIG_ZBOOT_ROM simply enables
> > an extra section of stub code that lets the zImage boot from ROM by
> > copying itself to RAM first.
> 
> Unacceptable.
> 
> The point of putting the decompressor in ROM is to save the time spent
> copying it to RAM first.  Boot loaders are already perfectly capable
> of copying the image into RAM and calling it there.
> 
> Getting the decompressor to do that step first just moves the copy from
> the boot loader into the decompressor.  It doesn't eliminate it, which
> is what ROM-based decompression is all about.
> 
> Basically, this patch makes the feature entirely pointless.

What about simply not compiling the decompressor with -fPIC when using 
ZBOOT_ROM=y?  That would certainly solve the problem with the only 
restriction that such kernel images won't be bootable from RAM which is 
probably an acceptable compromize.

The idea with ZBOOT_ROM is also to have an empty .data section so there 
is no data segment to copy. This is achieved by not having any writable 
global variable nor static local variables (unless they're marked const 
of course).  We probably should add an assert statement in the linker 
script to ensure this is the case.


Nicolas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24 23:34   ` Nicolas Pitre
@ 2010-02-24 23:42     ` Russell King - ARM Linux
  2010-02-24 23:57       ` Hector Martin
  0 siblings, 1 reply; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-24 23:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 24, 2010 at 06:34:49PM -0500, Nicolas Pitre wrote:
> What about simply not compiling the decompressor with -fPIC when using 
> ZBOOT_ROM=y?  That would certainly solve the problem with the only 
> restriction that such kernel images won't be bootable from RAM which is 
> probably an acceptable compromize.

Unfortunately, that doesn't solve the stack-bashing with ZBOOT_ROM=n.

I'm afraid that the commit adding LZO support is going to have to be
reverted until a better solution to this is found.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24 23:42     ` Russell King - ARM Linux
@ 2010-02-24 23:57       ` Hector Martin
  2010-02-25  0:01         ` Russell King - ARM Linux
  2010-02-25  4:06         ` Nicolas Pitre
  0 siblings, 2 replies; 41+ messages in thread
From: Hector Martin @ 2010-02-24 23:57 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Wed, Feb 24, 2010 at 06:34:49PM -0500, Nicolas Pitre wrote:
>> What about simply not compiling the decompressor with -fPIC when using 
>> ZBOOT_ROM=y?  That would certainly solve the problem with the only 
>> restriction that such kernel images won't be bootable from RAM which is 
>> probably an acceptable compromize.
> 
> Unfortunately, that doesn't solve the stack-bashing with ZBOOT_ROM=n.

Yes it does, that's exactly what my first version of the patch did. Once
you get rid of the partial relocation used for ROM builds (with split
text/bss) you don't need -Dstatic=, and once you get rid of that you
solve the stack-bashing. The ROM build becomes a bog standard
non-relocatable ROM image (with the usual LMA/VMA linker script stuff to
copy initialized data to RAM), and the RAM build becomes a bog standard
relocatable image (a single contiguous blob including
text/rodata/data/bss) that doesn't suffer from any issues when you move
it around.

-- 
Hector Martin (hector at marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24 23:57       ` Hector Martin
@ 2010-02-25  0:01         ` Russell King - ARM Linux
  2010-02-25  0:30           ` Hector Martin
  2010-02-25  4:06         ` Nicolas Pitre
  1 sibling, 1 reply; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-25  0:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 25, 2010 at 12:57:20AM +0100, Hector Martin wrote:
> Russell King - ARM Linux wrote:
> > On Wed, Feb 24, 2010 at 06:34:49PM -0500, Nicolas Pitre wrote:
> >> What about simply not compiling the decompressor with -fPIC when using 
> >> ZBOOT_ROM=y?  That would certainly solve the problem with the only 
> >> restriction that such kernel images won't be bootable from RAM which is 
> >> probably an acceptable compromize.
> > 
> > Unfortunately, that doesn't solve the stack-bashing with ZBOOT_ROM=n.
> 
> Yes it does, that's exactly what my first version of the patch did. Once
> you get rid of the partial relocation used for ROM builds (with split
> text/bss) you don't need -Dstatic=, and once you get rid of that you
> solve the stack-bashing. The ROM build becomes a bog standard
> non-relocatable ROM image (with the usual LMA/VMA linker script stuff to
> copy initialized data to RAM), and the RAM build becomes a bog standard
> relocatable image (a single contiguous blob including
> text/rodata/data/bss) that doesn't suffer from any issues when you move
> it around.

Did you bother to read my previous reply explaining that this is not
a hack for the toolchain?  It sounds to me like you didn't.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25  0:01         ` Russell King - ARM Linux
@ 2010-02-25  0:30           ` Hector Martin
  2010-02-25  4:28             ` Nicolas Pitre
  2010-02-25  8:23             ` Russell King - ARM Linux
  0 siblings, 2 replies; 41+ messages in thread
From: Hector Martin @ 2010-02-25  0:30 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Thu, Feb 25, 2010 at 12:57:20AM +0100, Hector Martin wrote:
>> Russell King - ARM Linux wrote:
>>> On Wed, Feb 24, 2010 at 06:34:49PM -0500, Nicolas Pitre wrote:
>>>> What about simply not compiling the decompressor with -fPIC when using 
>>>> ZBOOT_ROM=y?  That would certainly solve the problem with the only 
>>>> restriction that such kernel images won't be bootable from RAM which is 
>>>> probably an acceptable compromize.
>>> Unfortunately, that doesn't solve the stack-bashing with ZBOOT_ROM=n.
>> Yes it does, that's exactly what my first version of the patch did. Once
>> you get rid of the partial relocation used for ROM builds (with split
>> text/bss) you don't need -Dstatic=, and once you get rid of that you
>> solve the stack-bashing. The ROM build becomes a bog standard
>> non-relocatable ROM image (with the usual LMA/VMA linker script stuff to
>> copy initialized data to RAM), and the RAM build becomes a bog standard
>> relocatable image (a single contiguous blob including
>> text/rodata/data/bss) that doesn't suffer from any issues when you move
>> it around.
> 
> Did you bother to read my previous reply explaining that this is not
> a hack for the toolchain?  It sounds to me like you didn't.

Are you new to C? It looks like you are seriously still claiming that
-Dstatic= is A-OK under all circumstances. It very obviously isn't.

In your other e-mail, you're missing the point. You're arguing that
compiler behavior regarding access to data (particularly that with
static vs. global visibility) is well-defined enough that you can
guarantee that GOT-relative accesses will not happen as long as you
avoid static data. I'm still not convinced, but that's not my main point.

The problem is that -Dstatic= isn't a magical safe way of making objects
globally visible, because the 'static' keyword in C means different
things in different places. Specifically, it means a totally different
thing inside functions. Instead of turning a static file-global array
into a globally visible array, you're turning a static (as in
non-stack-allocated) array inside a function into a stack-allocated
array. And as no doubt you're aware, stack objects become undefined once
the function exits, and references to them are invalid. Welcome to our
inflate heisenbug:

static void zlib_fixedtables(struct inflate_state *state)

{

#   include "inffixed.h"  /* static blah {len,dist}fix[] = {...}; */
    state->lencode = lenfix;

    state->lenbits = 9;

    state->distcode = distfix;

    state->distbits = 5;

}

Guess what happens when you remove the static keywords there.

Long story short, -Dstatic= isn't accomplishing what you want, because
it's crude and doesn't handle all cases properly. If you want to keep
the current model, you have to be willing to manually watch changes to
the decompressor implementations so they do not violate your assumptions
(i.e. never use static inside functions and rely on it, at least),
because -Dstatic= will break valid C code. Under _all_ circumstances, no
matter what your other compiler flags are or how you link and load
stuff, -Dstatic= is guaranteed to break some valid C code, and that's
what's happening here.

Sure, you could patch up the issue by moving the #include out of the
function above, but then you'd better be willing to watch for anything
like that anywhere in decompressor code used for ARM, because other
kernel developers aren't going to know that ARM silently discards their
static keywords. I'm not even sure this is the only broken code so far,
there could be more.

-- 
Hector Martin (hector at marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24 23:57       ` Hector Martin
  2010-02-25  0:01         ` Russell King - ARM Linux
@ 2010-02-25  4:06         ` Nicolas Pitre
  1 sibling, 0 replies; 41+ messages in thread
From: Nicolas Pitre @ 2010-02-25  4:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 25 Feb 2010, Hector Martin wrote:

> Russell King - ARM Linux wrote:
> > On Wed, Feb 24, 2010 at 06:34:49PM -0500, Nicolas Pitre wrote:
> >> What about simply not compiling the decompressor with -fPIC when using 
> >> ZBOOT_ROM=y?  That would certainly solve the problem with the only 
> >> restriction that such kernel images won't be bootable from RAM which is 
> >> probably an acceptable compromize.
> > 
> > Unfortunately, that doesn't solve the stack-bashing with ZBOOT_ROM=n.
> 
> Yes it does, that's exactly what my first version of the patch did. Once
> you get rid of the partial relocation used for ROM builds (with split
> text/bss) you don't need -Dstatic=, and once you get rid of that you
> solve the stack-bashing. The ROM build becomes a bog standard
> non-relocatable ROM image (with the usual LMA/VMA linker script stuff to
> copy initialized data to RAM), and the RAM build becomes a bog standard
> relocatable image (a single contiguous blob including
> text/rodata/data/bss) that doesn't suffer from any issues when you move
> it around.

OK...  Well, just to put things in perspective: the CONFIG_ZBOOT_ROM 
was invented by myself almost 12 years ago.  No split LMA/VMA was needed 
because the code didn't use any writable global variable.  The idea is 
to have the boot loader branch directly to the decompressor stored in 
flash (those were the days when NOR flash was more popular than NAND) 
and then the decompressor simply get the deflated data directly from ROM 
and stores the inflated kernel directly in RAM at its final address.  No 
copy what so ever was involved, and with the CPU I and D caches enabled 
and burst mode properly configured in the flash you get the fastest boot 
possible.  At least until I added support to do eXecute-In-Place from 
flash with the kernel that is (yes, I'm guilty for that as well).  So 
that has to continue working otherwise it is a regression.

Then, Russell added the ability for the decompressor to be PIC.  And he 
even managed for a ZBOOT_ROM kernel to just work if you decided to load 
it in RAM and execute it from there.  But for that to work the -Dstatic 
hack was needed.  Admitedly, this hack and the ability to run a 
ZBOOT_ROM kernel from anywhere in RAM is nice but not strictly needed.  
But no one could argue that this didn't work.  Claiming that this is an 
abuse of the compiler is beside the point: it has worked for years and 
now _you_ are the one who wants to change the rules, not the GCC people.

Now I apparently missed something in the thread... What is this stack 
bashing about?  Admitedly I didn't look at any of the patches (yet).


Nicolas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25  0:30           ` Hector Martin
@ 2010-02-25  4:28             ` Nicolas Pitre
  2010-02-25  4:33               ` Nicolas Pitre
                                 ` (2 more replies)
  2010-02-25  8:23             ` Russell King - ARM Linux
  1 sibling, 3 replies; 41+ messages in thread
From: Nicolas Pitre @ 2010-02-25  4:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 25 Feb 2010, Hector Martin wrote:

> Russell King - ARM Linux wrote:
> > On Thu, Feb 25, 2010 at 12:57:20AM +0100, Hector Martin wrote:
> >> Russell King - ARM Linux wrote:
> >>> On Wed, Feb 24, 2010 at 06:34:49PM -0500, Nicolas Pitre wrote:
> >>>> What about simply not compiling the decompressor with -fPIC when using 
> >>>> ZBOOT_ROM=y?  That would certainly solve the problem with the only 
> >>>> restriction that such kernel images won't be bootable from RAM which is 
> >>>> probably an acceptable compromize.
> >>> Unfortunately, that doesn't solve the stack-bashing with ZBOOT_ROM=n.
> >> Yes it does, that's exactly what my first version of the patch did. Once
> >> you get rid of the partial relocation used for ROM builds (with split
> >> text/bss) you don't need -Dstatic=, and once you get rid of that you
> >> solve the stack-bashing. The ROM build becomes a bog standard
> >> non-relocatable ROM image (with the usual LMA/VMA linker script stuff to
> >> copy initialized data to RAM), and the RAM build becomes a bog standard
> >> relocatable image (a single contiguous blob including
> >> text/rodata/data/bss) that doesn't suffer from any issues when you move
> >> it around.
> > 
> > Did you bother to read my previous reply explaining that this is not
> > a hack for the toolchain?  It sounds to me like you didn't.
> 
> Are you new to C? It looks like you are seriously still claiming that
> -Dstatic= is A-OK under all circumstances. It very obviously isn't.

I didn't see Russell make any such claim anywhere.  We're talking about 
a very particular case here.

> In your other e-mail, you're missing the point. You're arguing that
> compiler behavior regarding access to data (particularly that with
> static vs. global visibility) is well-defined enough that you can
> guarantee that GOT-relative accesses will not happen as long as you
> avoid static data. I'm still not convinced, but that's not my main point.
> 
> The problem is that -Dstatic= isn't a magical safe way of making objects
> globally visible, because the 'static' keyword in C means different
> things in different places. Specifically, it means a totally different
> thing inside functions.  [...]

Yes we all know that.

> static void zlib_fixedtables(struct inflate_state *state)
> 
> {
> 
> #   include "inffixed.h"  /* static blah {len,dist}fix[] = {...}; */
>     state->lencode = lenfix;
> 
>     state->lenbits = 9;
> 
>     state->distcode = distfix;
> 
>     state->distbits = 5;
> 
> }
> 
> Guess what happens when you remove the static keywords there.
> 
> Long story short, -Dstatic= isn't accomplishing what you want, because
> it's crude and doesn't handle all cases properly. If you want to keep
> the current model, you have to be willing to manually watch changes to
> the decompressor implementations so they do not violate your assumptions
> (i.e. never use static inside functions and rely on it, at least),
> because -Dstatic= will break valid C code. Under _all_ circumstances, no
> matter what your other compiler flags are or how you link and load
> stuff, -Dstatic= is guaranteed to break some valid C code, and that's
> what's happening here.

Obviously.  But that's the core of the argument: we made sure that the 
previous implementation didn't use any static within functions.  Why do 
you need to do that?  This is usually a bad idea anyway as this makes 
the function non reentrant.

> Sure, you could patch up the issue by moving the #include out of the
> function above, but then you'd better be willing to watch for anything
> like that anywhere in decompressor code used for ARM, because other
> kernel developers aren't going to know that ARM silently discards their
> static keywords. I'm not even sure this is the only broken code so far,
> there could be more.

Two possibilities that I can see:

Put this in a common header file for the decompressor code:

/* ARM wants to redefine this sometimes */
#ifndef static_func
#define static_func
#endif

Then declare your functions with static_func, and ARM can use 
-Dstatic_func.

Or, we give up the ability to load in RAM a ROMable zImage.

Personally I'd favor first solution.


Nicolas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25  4:28             ` Nicolas Pitre
@ 2010-02-25  4:33               ` Nicolas Pitre
  2010-02-25  9:38               ` Russell King - ARM Linux
  2010-02-25  9:51               ` Hector Martin
  2 siblings, 0 replies; 41+ messages in thread
From: Nicolas Pitre @ 2010-02-25  4:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 24 Feb 2010, Nicolas Pitre wrote:

> Put this in a common header file for the decompressor code:
> 
> /* ARM wants to redefine this sometimes */
> #ifndef static_func
> #define static_func
> #endif

Obviously I meant

#define static_func static


Nicolas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25  0:30           ` Hector Martin
  2010-02-25  4:28             ` Nicolas Pitre
@ 2010-02-25  8:23             ` Russell King - ARM Linux
  1 sibling, 0 replies; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-25  8:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 25, 2010 at 01:30:28AM +0100, Hector Martin wrote:
> Russell King - ARM Linux wrote:
> > On Thu, Feb 25, 2010 at 12:57:20AM +0100, Hector Martin wrote:
> >> Russell King - ARM Linux wrote:
> >>> On Wed, Feb 24, 2010 at 06:34:49PM -0500, Nicolas Pitre wrote:
> >>>> What about simply not compiling the decompressor with -fPIC when using 
> >>>> ZBOOT_ROM=y?  That would certainly solve the problem with the only 
> >>>> restriction that such kernel images won't be bootable from RAM which is 
> >>>> probably an acceptable compromize.
> >>> Unfortunately, that doesn't solve the stack-bashing with ZBOOT_ROM=n.
> >> Yes it does, that's exactly what my first version of the patch did. Once
> >> you get rid of the partial relocation used for ROM builds (with split
> >> text/bss) you don't need -Dstatic=, and once you get rid of that you
> >> solve the stack-bashing. The ROM build becomes a bog standard
> >> non-relocatable ROM image (with the usual LMA/VMA linker script stuff to
> >> copy initialized data to RAM), and the RAM build becomes a bog standard
> >> relocatable image (a single contiguous blob including
> >> text/rodata/data/bss) that doesn't suffer from any issues when you move
> >> it around.
> > 
> > Did you bother to read my previous reply explaining that this is not
> > a hack for the toolchain?  It sounds to me like you didn't.
> 
> Are you new to C? It looks like you are seriously still claiming that
> -Dstatic= is A-OK under all circumstances. It very obviously isn't.

Clearly you didn't (a) read my mail and (b) understand it.

And if you're going to be that insulting, I see no point discussing
this further with you.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25  4:28             ` Nicolas Pitre
  2010-02-25  4:33               ` Nicolas Pitre
@ 2010-02-25  9:38               ` Russell King - ARM Linux
  2010-02-25 10:05                 ` Hector Martin
  2010-02-25 12:24                 ` Russell King - ARM Linux
  2010-02-25  9:51               ` Hector Martin
  2 siblings, 2 replies; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-25  9:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 24, 2010 at 11:28:45PM -0500, Nicolas Pitre wrote:
> Obviously.  But that's the core of the argument: we made sure that the 
> previous implementation didn't use any static within functions.  Why do 
> you need to do that?  This is usually a bad idea anyway as this makes 
> the function non reentrant.

Indeed - but having static data within a function doesn't make it non-
reentrant - having static data modified by the function can make it
non-reentrant.

I wonder if we can get around this problem a slightly different way -
instead of building the decompressor into misc.c, build it as a separate
PIC object without the -Dstatic= stuff, and verify that it contains no
read-write data.

Since these decompressors are designed to be used within the multi-threaded
environment of the kernel, they shouldn't have any read-write data within
them (or if they do, they'd better use some locking - which would make
them unusable for decompressors.)

Then we can tailor our misc.c not to use static data (which is the real
problem - not the functions being static), and the -Dstatic= hack will
go away.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25  4:28             ` Nicolas Pitre
  2010-02-25  4:33               ` Nicolas Pitre
  2010-02-25  9:38               ` Russell King - ARM Linux
@ 2010-02-25  9:51               ` Hector Martin
  2010-02-25 18:30                 ` Nicolas Pitre
  2 siblings, 1 reply; 41+ messages in thread
From: Hector Martin @ 2010-02-25  9:51 UTC (permalink / raw)
  To: linux-arm-kernel

Nicolas Pitre wrote:
> On Thu, 25 Feb 2010, Hector Martin wrote:
> 
>> Russell King - ARM Linux wrote:
>>> On Thu, Feb 25, 2010 at 12:57:20AM +0100, Hector Martin wrote:
>>>> Russell King - ARM Linux wrote:
>>>>> On Wed, Feb 24, 2010 at 06:34:49PM -0500, Nicolas Pitre wrote:
>>>>>> What about simply not compiling the decompressor with -fPIC when using 
>>>>>> ZBOOT_ROM=y?  That would certainly solve the problem with the only 
>>>>>> restriction that such kernel images won't be bootable from RAM which is 
>>>>>> probably an acceptable compromize.
>>>>> Unfortunately, that doesn't solve the stack-bashing with ZBOOT_ROM=n.
>>>> Yes it does, that's exactly what my first version of the patch did. Once
>>>> you get rid of the partial relocation used for ROM builds (with split
>>>> text/bss) you don't need -Dstatic=, and once you get rid of that you
>>>> solve the stack-bashing. The ROM build becomes a bog standard
>>>> non-relocatable ROM image (with the usual LMA/VMA linker script stuff to
>>>> copy initialized data to RAM), and the RAM build becomes a bog standard
>>>> relocatable image (a single contiguous blob including
>>>> text/rodata/data/bss) that doesn't suffer from any issues when you move
>>>> it around.
>>> Did you bother to read my previous reply explaining that this is not
>>> a hack for the toolchain?  It sounds to me like you didn't.
>> Long story short, -Dstatic= isn't accomplishing what you want, because
>> it's crude and doesn't handle all cases properly. If you want to keep
>> the current model, you have to be willing to manually watch changes to
>> the decompressor implementations so they do not violate your assumptions
>> (i.e. never use static inside functions and rely on it, at least),
>> because -Dstatic= will break valid C code. Under _all_ circumstances, no
>> matter what your other compiler flags are or how you link and load
>> stuff, -Dstatic= is guaranteed to break some valid C code, and that's
>> what's happening here.
> 
> Obviously.  But that's the core of the argument: we made sure that the 
> previous implementation didn't use any static within functions.  Why do 
> you need to do that?  This is usually a bad idea anyway as this makes 
> the function non reentrant.

Static inside functions doesn't imply being non-reentrant (any more than
using global variables does). In this case the data arrays are also
const so it doesn't matter. But even if they weren't it wouldn't imply
non-reentrancy. For example, you could have the function generate the
tables on the fly on the first call, and add a spinlock if you want it
to be reentrant (in fact, I'm pretty sure the tables could optionally be
generated on the fly in the original zlib version, though that part was
removed for the kernel).

> 
> Two possibilities that I can see:
> 
> Put this in a common header file for the decompressor code:
> 
> /* ARM wants to redefine this sometimes */
> #ifndef static_func
> #define static_func
> #endif
> 
> Then declare your functions with static_func, and ARM can use 
> -Dstatic_func.
> 
> Or, we give up the ability to load in RAM a ROMable zImage.

I can also fix the second version of the patch to get rid of the two
extra copies of the kernel.

-- 
Hector Martin (hector at marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25  9:38               ` Russell King - ARM Linux
@ 2010-02-25 10:05                 ` Hector Martin
  2010-02-25 18:35                   ` Nicolas Pitre
  2010-02-25 12:24                 ` Russell King - ARM Linux
  1 sibling, 1 reply; 41+ messages in thread
From: Hector Martin @ 2010-02-25 10:05 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> I wonder if we can get around this problem a slightly different way -
> instead of building the decompressor into misc.c, build it as a separate
> PIC object without the -Dstatic= stuff, and verify that it contains no
> read-write data.

You're still going to have issues if it uses read-only data and accesses
it via an offset from the GOT, though.

> Then we can tailor our misc.c not to use static data (which is the real
> problem - not the functions being static), and the -Dstatic= hack will
> go away.

If you can programmatically ensure that the decompressors do not use
anything but GOT entries to access data, that's certainly a better
solution than what we have now (I don't mean actually "fixing" them as
in -Dstatic=, just something that makes the build fail if this is not
the case so someone else can go and make the required changes). Maybe
building the object and then grepping the relocations for offenders, or
something like that.

However, considering my second patch can be adapted to avoid the two
extra copies, I still think that's a cleaner solution. All I have to do
is move things around in the linker script so the piggy is at the end of
the zImage, avoid copying that the first time around (and instead pass
the old pointer to ROM to the decompressor), and introduce the real
decompressed uncompressed kernel size to the mix so I can relocate the
decompressor to ZRELADDR+kernel_size (and fix the relocation check code
so it uses the real size, not some 4x guess), which would avoid the
second copy.

-- 
Hector Martin (hector at marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25  9:38               ` Russell King - ARM Linux
  2010-02-25 10:05                 ` Hector Martin
@ 2010-02-25 12:24                 ` Russell King - ARM Linux
  2010-02-25 19:24                   ` Nicolas Pitre
  1 sibling, 1 reply; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-25 12:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 25, 2010 at 09:38:59AM +0000, Russell King - ARM Linux wrote:
> Then we can tailor our misc.c not to use static data (which is the real
> problem - not the functions being static), and the -Dstatic= hack will
> go away.

And here's the patch - not run tested yet, but compile-tested for
gzip and LZO in both ROM=y and ROM=n modes.

    ARM: Eliminate decompressor -Dstatic= PIC hack

    We used to build decompressors with -Dstatic= to avoid any local data
    being generated.  The problem is that local data generates GOTOFF
    relocations, which means we can't relocate the data relative to the
    text segment.

    Global data, on the other hand, goes through the GOT, and can be
    relocated anywhere.

    Unfortunately, with the new decompressors, this presents a problem
    since they declare static data within functions, and this leads to
    stack overflow.

    Fix this by separating out the decompressor code into a separate file,
    and removing 'static' from BSS data in misc.c.

    Also, discard the .data section - this means that should we end up
    with read/write initialized data, the decompressor will fail to link
    and the problem will be obvious.

    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

 arch/arm/boot/compressed/Makefile       |    6 +--
 arch/arm/boot/compressed/decompress.c   |   45 +++++++++++++
 arch/arm/boot/compressed/misc.c         |  109 +++---------------------------
 arch/arm/boot/compressed/vmlinux.lds.in |    2 +-
 4 files changed, 58 insertions(+), 104 deletions(-)

diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 2d4d88b..97c89e7 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -5,7 +5,7 @@
 #
 
 HEAD	= head.o
-OBJS	= misc.o
+OBJS	= misc.o decompress.o
 FONTC	= $(srctree)/drivers/video/console/font_acorn_8x8.c
 
 #
@@ -106,10 +106,6 @@ lib1funcs = $(obj)/lib1funcs.o
 $(obj)/lib1funcs.S: $(srctree)/arch/$(SRCARCH)/lib/lib1funcs.S FORCE
 	$(call cmd,shipped)
 
-# Don't allow any static data in misc.o, which
-# would otherwise mess up our GOT table
-CFLAGS_misc.o := -Dstatic=
-
 $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
 	 	$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) FORCE
 	$(call if_changed,ld)
diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c
new file mode 100644
index 0000000..0da382f
--- /dev/null
+++ b/arch/arm/boot/compressed/decompress.c
@@ -0,0 +1,45 @@
+#define _LINUX_STRING_H_
+
+#include <linux/compiler.h>	/* for inline */
+#include <linux/types.h>	/* for size_t */
+#include <linux/stddef.h>	/* for NULL */
+#include <linux/linkage.h>
+#include <asm/string.h>
+
+extern unsigned long free_mem_ptr;
+extern unsigned long free_mem_end_ptr;
+extern void error(char *);
+
+#define STATIC static
+
+#define ARCH_HAS_DECOMP_WDOG
+
+/* Diagnostic functions */
+#ifdef DEBUG
+#  define Assert(cond,msg) {if(!(cond)) error(msg);}
+#  define Trace(x) fprintf x
+#  define Tracev(x) {if (verbose) fprintf x ;}
+#  define Tracevv(x) {if (verbose>1) fprintf x ;}
+#  define Tracec(c,x) {if (verbose && (c)) fprintf x ;}
+#  define Tracecv(c,x) {if (verbose>1 && (c)) fprintf x ;}
+#else
+#  define Assert(cond,msg)
+#  define Trace(x)
+#  define Tracev(x)
+#  define Tracevv(x)
+#  define Tracec(c,x)
+#  define Tracecv(c,x)
+#endif
+
+#ifdef CONFIG_KERNEL_GZIP
+#include "../../../../lib/decompress_inflate.c"
+#endif
+
+#ifdef CONFIG_KERNEL_LZO
+#include "../../../../lib/decompress_unlzo.c"
+#endif
+
+void do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x))
+{
+	decompress(input, len, NULL, NULL, output, NULL, error);
+}
diff --git a/arch/arm/boot/compressed/misc.c b/arch/arm/boot/compressed/misc.c
index 7e0fe4d..4c663fb 100644
--- a/arch/arm/boot/compressed/misc.c
+++ b/arch/arm/boot/compressed/misc.c
@@ -23,8 +23,8 @@ unsigned int __machine_arch_type;
 #include <linux/compiler.h>	/* for inline */
 #include <linux/types.h>	/* for size_t */
 #include <linux/stddef.h>	/* for NULL */
-#include <asm/string.h>
 #include <linux/linkage.h>
+#include <asm/string.h>
 
 #include <asm/unaligned.h>
 
@@ -106,57 +106,7 @@ static void putstr(const char *ptr)
 
 #endif
 
-#define __ptr_t void *
-
-#define memzero(s,n) __memzero(s,n)
-
-/*
- * Optimised C version of memzero for the ARM.
- */
-void __memzero (__ptr_t s, size_t n)
-{
-	union { void *vp; unsigned long *ulp; unsigned char *ucp; } u;
-	int i;
-
-	u.vp = s;
-
-	for (i = n >> 5; i > 0; i--) {
-		*u.ulp++ = 0;
-		*u.ulp++ = 0;
-		*u.ulp++ = 0;
-		*u.ulp++ = 0;
-		*u.ulp++ = 0;
-		*u.ulp++ = 0;
-		*u.ulp++ = 0;
-		*u.ulp++ = 0;
-	}
-
-	if (n & 1 << 4) {
-		*u.ulp++ = 0;
-		*u.ulp++ = 0;
-		*u.ulp++ = 0;
-		*u.ulp++ = 0;
-	}
-
-	if (n & 1 << 3) {
-		*u.ulp++ = 0;
-		*u.ulp++ = 0;
-	}
-
-	if (n & 1 << 2)
-		*u.ulp++ = 0;
-
-	if (n & 1 << 1) {
-		*u.ucp++ = 0;
-		*u.ucp++ = 0;
-	}
-
-	if (n & 1)
-		*u.ucp++ = 0;
-}
-
-static inline __ptr_t memcpy(__ptr_t __dest, __const __ptr_t __src,
-			    size_t __n)
+void *memcpy(void *__dest, __const void *__src, size_t __n)
 {
 	int i = 0;
 	unsigned char *d = (unsigned char *)__dest, *s = (unsigned char *)__src;
@@ -193,59 +143,20 @@ static inline __ptr_t memcpy(__ptr_t __dest, __const __ptr_t __src,
 /*
  * gzip delarations
  */
-#define STATIC static
-
-/* Diagnostic functions */
-#ifdef DEBUG
-#  define Assert(cond,msg) {if(!(cond)) error(msg);}
-#  define Trace(x) fprintf x
-#  define Tracev(x) {if (verbose) fprintf x ;}
-#  define Tracevv(x) {if (verbose>1) fprintf x ;}
-#  define Tracec(c,x) {if (verbose && (c)) fprintf x ;}
-#  define Tracecv(c,x) {if (verbose>1 && (c)) fprintf x ;}
-#else
-#  define Assert(cond,msg)
-#  define Trace(x)
-#  define Tracev(x)
-#  define Tracevv(x)
-#  define Tracec(c,x)
-#  define Tracecv(c,x)
-#endif
-
-static void error(char *m);
-
 extern char input_data[];
 extern char input_data_end[];
 
-static unsigned char *output_data;
-static unsigned long output_ptr;
-
-static void error(char *m);
+unsigned char *output_data;
+unsigned long output_ptr;
 
-static void putstr(const char *);
-
-static unsigned long free_mem_ptr;
-static unsigned long free_mem_end_ptr;
-
-#ifdef STANDALONE_DEBUG
-#define NO_INFLATE_MALLOC
-#endif
-
-#define ARCH_HAS_DECOMP_WDOG
-
-#ifdef CONFIG_KERNEL_GZIP
-#include "../../../../lib/decompress_inflate.c"
-#endif
-
-#ifdef CONFIG_KERNEL_LZO
-#include "../../../../lib/decompress_unlzo.c"
-#endif
+unsigned long free_mem_ptr;
+unsigned long free_mem_end_ptr;
 
 #ifndef arch_error
 #define arch_error(x)
 #endif
 
-static void error(char *x)
+void error(char *x)
 {
 	arch_error(x);
 
@@ -261,6 +172,8 @@ asmlinkage void __div0(void)
 	error("Attempting division by 0!");
 }
 
+extern void do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x));
+
 #ifndef STANDALONE_DEBUG
 
 unsigned long
@@ -281,8 +194,8 @@ decompress_kernel(unsigned long output_start, unsigned long free_mem_ptr_p,
 	output_ptr = get_unaligned_le32(tmp);
 
 	putstr("Uncompressing Linux...");
-	decompress(input_data, input_data_end - input_data,
-			NULL, NULL, output_data, NULL, error);
+	do_decompress(input_data, input_data_end - input_data,
+			output_data, error);
 	putstr(" done, booting the kernel.\n");
 	return output_ptr;
 }
diff --git a/arch/arm/boot/compressed/vmlinux.lds.in b/arch/arm/boot/compressed/vmlinux.lds.in
index a5924b9..d00b544 100644
--- a/arch/arm/boot/compressed/vmlinux.lds.in
+++ b/arch/arm/boot/compressed/vmlinux.lds.in
@@ -14,6 +14,7 @@ SECTIONS
   /DISCARD/ : {
     *(.ARM.exidx*)
     *(.ARM.extab*)
+    *(.data)
   }
 
   . = TEXT_START;
@@ -40,7 +41,6 @@ SECTIONS
   .got			: { *(.got) }
   _got_end = .;
   .got.plt		: { *(.got.plt) }
-  .data			: { *(.data) }
   _edata = .;
 
   . = BSS_START;

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25  9:51               ` Hector Martin
@ 2010-02-25 18:30                 ` Nicolas Pitre
  0 siblings, 0 replies; 41+ messages in thread
From: Nicolas Pitre @ 2010-02-25 18:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 25 Feb 2010, Hector Martin wrote:

> Nicolas Pitre wrote:
> > Obviously.  But that's the core of the argument: we made sure that the 
> > previous implementation didn't use any static within functions.  Why do 
> > you need to do that?  This is usually a bad idea anyway as this makes 
> > the function non reentrant.
> 
> Static inside functions doesn't imply being non-reentrant (any more than
> using global variables does).

Usually it does, unless as you mention those are used read-only.
I think we don't have to teach each other how a C compiler works, right?

But just to make sure we're on the same page: if you need a global 
variable, or a static variable within a function, it is preferable if 
they're truly read-only by marking them const.  This has the advantage 
of not having to bother with .data relocation in RAM when booting a 
kernel directly from ROM as they get allocated to the .rodata section 
which is included in .text by default by the linker.


Nicolas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 10:05                 ` Hector Martin
@ 2010-02-25 18:35                   ` Nicolas Pitre
  2010-02-25 19:21                     ` Hector Martin
  0 siblings, 1 reply; 41+ messages in thread
From: Nicolas Pitre @ 2010-02-25 18:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 25 Feb 2010, Hector Martin wrote:

> If you can programmatically ensure that the decompressors do not use
> anything but GOT entries to access data, that's certainly a better
> solution than what we have now (I don't mean actually "fixing" them as
> in -Dstatic=, just something that makes the build fail if this is not
> the case so someone else can go and make the required changes). Maybe
> building the object and then grepping the relocations for offenders, or
> something like that.

Russell made a patch which simply ends up discarding .data section.

> However, considering my second patch can be adapted to avoid the two
> extra copies, I still think that's a cleaner solution. All I have to do
> is move things around in the linker script so the piggy is at the end of
> the zImage, avoid copying that the first time around (and instead pass
> the old pointer to ROM to the decompressor), and introduce the real
> decompressed uncompressed kernel size to the mix so I can relocate the
> decompressor to ZRELADDR+kernel_size (and fix the relocation check code
> so it uses the real size, not some 4x guess), which would avoid the
> second copy.

You are still copying the actual decompressor code which is less optimal 
than what we already have.  The current solution, as I explained 
already, involves _zero_ copying.


Nicolas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 18:35                   ` Nicolas Pitre
@ 2010-02-25 19:21                     ` Hector Martin
  2010-02-25 19:40                       ` Nicolas Pitre
  0 siblings, 1 reply; 41+ messages in thread
From: Hector Martin @ 2010-02-25 19:21 UTC (permalink / raw)
  To: linux-arm-kernel

Nicolas Pitre wrote:
> On Thu, 25 Feb 2010, Hector Martin wrote:
> 
>> If you can programmatically ensure that the decompressors do not use
>> anything but GOT entries to access data, that's certainly a better
>> solution than what we have now (I don't mean actually "fixing" them as
>> in -Dstatic=, just something that makes the build fail if this is not
>> the case so someone else can go and make the required changes). Maybe
>> building the object and then grepping the relocations for offenders, or
>> something like that.
> 
> Russell made a patch which simply ends up discarding .data section.

You still have the issue of the .bss section. Though I must admit this
would be rare, a decompressor using the .bss section would still cause
silent breakage. And I'm still not convinced that compiler behavior is
defined such that this cannot break in the future.

> You are still copying the actual decompressor code which is less optimal 
> than what we already have.  The current solution, as I explained 
> already, involves _zero_ copying.

The overhead of copying the decompressor is negligible. We're talking
somewhere along the lines of 0.1% of the time spent inside the
decompressor, worst case.

-- 
Hector Martin (hector at marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 12:24                 ` Russell King - ARM Linux
@ 2010-02-25 19:24                   ` Nicolas Pitre
  2010-02-25 19:34                     ` Russell King - ARM Linux
  2010-02-25 23:48                     ` Russell King - ARM Linux
  0 siblings, 2 replies; 41+ messages in thread
From: Nicolas Pitre @ 2010-02-25 19:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 25 Feb 2010, Russell King - ARM Linux wrote:

> And here's the patch - not run tested yet, but compile-tested for
> gzip and LZO in both ROM=y and ROM=n modes.
> 
>     ARM: Eliminate decompressor -Dstatic= PIC hack
> 
>     We used to build decompressors with -Dstatic= to avoid any local data
>     being generated.  The problem is that local data generates GOTOFF
>     relocations, which means we can't relocate the data relative to the
>     text segment.
> 
>     Global data, on the other hand, goes through the GOT, and can be
>     relocated anywhere.
> 
>     Unfortunately, with the new decompressors, this presents a problem
>     since they declare static data within functions, and this leads to
>     stack overflow.
> 
>     Fix this by separating out the decompressor code into a separate file,
>     and removing 'static' from BSS data in misc.c.
> 
>     Also, discard the .data section - this means that should we end up
>     with read/write initialized data, the decompressor will fail to link
>     and the problem will be obvious.
> 
>     Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Acked-by: Nicolas Pitre <nico@fluxnic.net>

However...

> @@ -14,6 +14,7 @@ SECTIONS
>    /DISCARD/ : {
>      *(.ARM.exidx*)
>      *(.ARM.extab*)
> +    *(.data)
>    }

I'd put a comment there, otherwise lots of people will go WTF when 
coming across this.


Nicolas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 19:24                   ` Nicolas Pitre
@ 2010-02-25 19:34                     ` Russell King - ARM Linux
  2010-02-25 23:48                     ` Russell King - ARM Linux
  1 sibling, 0 replies; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-25 19:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 25, 2010 at 02:24:01PM -0500, Nicolas Pitre wrote:
> I'd put a comment there, otherwise lots of people will go WTF when 
> coming across this.

Added, thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 19:21                     ` Hector Martin
@ 2010-02-25 19:40                       ` Nicolas Pitre
  2010-02-25 19:56                         ` Hector Martin
  2010-02-25 20:30                         ` Russell King - ARM Linux
  0 siblings, 2 replies; 41+ messages in thread
From: Nicolas Pitre @ 2010-02-25 19:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 25 Feb 2010, Hector Martin wrote:

> Nicolas Pitre wrote:
> > On Thu, 25 Feb 2010, Hector Martin wrote:
> > 
> >> If you can programmatically ensure that the decompressors do not use
> >> anything but GOT entries to access data, that's certainly a better
> >> solution than what we have now (I don't mean actually "fixing" them as
> >> in -Dstatic=, just something that makes the build fail if this is not
> >> the case so someone else can go and make the required changes). Maybe
> >> building the object and then grepping the relocations for offenders, or
> >> something like that.
> > 
> > Russell made a patch which simply ends up discarding .data section.
> 
> You still have the issue of the .bss section. Though I must admit this
> would be rare, a decompressor using the .bss section would still cause
> silent breakage.

What breakage?  We do use the .bss right now.  But .bss can be fixed to 
some absolute address in RAM with no further handling besides clearing 

I found the following at the top of arch/arm/boot/compressed/misc.c:

 * Nicolas Pitre <nico@visuaide.com>  1999/04/14 :
 *  For this code to run directly from Flash, all constant variables must
 *  be marked with 'const' and all other variables initialized at run-time
 *  only.  This way all non constant variables will end up in the bss segment,
 *  which should point to addresses in RAM and cleared to 0 on start.
 *  This allows for a much quicker boot time.

> And I'm still not convinced that compiler behavior is defined such 
> that this cannot break in the future.

Seems to me that we're relying on compiler conventions that have been 
around for decades already.  What do you expect to break?

> > You are still copying the actual decompressor code which is less optimal 
> > than what we already have.  The current solution, as I explained 
> > already, involves _zero_ copying.
> 
> The overhead of copying the decompressor is negligible. We're talking
> somewhere along the lines of 0.1% of the time spent inside the
> decompressor, worst case.

Possible.  And granted this whole scheme was elaborated in 1999 when 
CPUs were much slower and the kernel much smaller relative to the 
decompressor, etc.  But again that "just works" already, without the 
copy overhead, and the code needed to make the copy (I know it's only 4 
asm instructions), and the complexity needed to care about the actual 
decompressed kernel size, etc. etc.


Nicolas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 19:40                       ` Nicolas Pitre
@ 2010-02-25 19:56                         ` Hector Martin
  2010-02-25 20:29                           ` Nicolas Pitre
  2010-02-25 20:30                         ` Russell King - ARM Linux
  1 sibling, 1 reply; 41+ messages in thread
From: Hector Martin @ 2010-02-25 19:56 UTC (permalink / raw)
  To: linux-arm-kernel

Nicolas Pitre wrote:
> On Thu, 25 Feb 2010, Hector Martin wrote:
>> Nicolas Pitre wrote:
>>> Russell made a patch which simply ends up discarding .data section.
>> You still have the issue of the .bss section. Though I must admit this
>> would be rare, a decompressor using the .bss section would still cause
>> silent breakage.
> 
> What breakage?  We do use the .bss right now.  But .bss can be fixed to 
> some absolute address in RAM with no further handling besides clearing

static data in BSS can end up addressed with a GOT-relative offset (even
if you put it at "some absolute address" - this is still PIC code). That
 will fail once you move around the text/rodata/GOT blob when running
the ROM code from RAM, since it'll push the BSS section somewhere else
(potentially not RAM). So you still need to ensure that all BSS data is
globally visible.

> I found the following at the top of arch/arm/boot/compressed/misc.c:
> 
>  * Nicolas Pitre <nico@visuaide.com>  1999/04/14 :
>  *  For this code to run directly from Flash, all constant variables must
>  *  be marked with 'const' and all other variables initialized at run-time
>  *  only.  This way all non constant variables will end up in the bss segment,
>  *  which should point to addresses in RAM and cleared to 0 on start.
>  *  This allows for a much quicker boot time.

This is a different issue, the part where so far the code hasn't been
copying data. This is unrelated to the issue that -Dstatic= tries to
solve, which has to do with the way data can be located relative to the GOT.

>> And I'm still not convinced that compiler behavior is defined such 
>> that this cannot break in the future.
> 
> Seems to me that we're relying on compiler conventions that have been 
> around for decades already.  What do you expect to break?

Compilers are getting smarter, and code relying on old assumptions that
may no longer be true (because compilers optimize better or do things
differently) is precisely that which is most liable to break. The only
way you can properly guarantee things will work is by sticking to actual
standards and doing what the compiler expects you to do. Moving around
text/rodata without moving around bss/(maybe data) isn't something
allowed by the compiler. Sure, you can hack around it like we're doing
here, but you're relying on your hacks correctly preventing the compiler
from doing something valid yet which breaks your assumptions.

> Possible.  And granted this whole scheme was elaborated in 1999 when 
> CPUs were much slower and the kernel much smaller relative to the 
> decompressor, etc.  But again that "just works" already, without the 
> copy overhead, and the code needed to make the copy (I know it's only 4 
> asm instructions), and the complexity needed to care about the actual 
> decompressed kernel size, etc. etc.

Well, if you're into speed anyway, caring about the actual kernel size
is desirable because it avoids a totally useless copy when the zImage is
loaded into RAM somewhere near the end of where the kernel would be. The
guesstimate in the code will think it'll overlap and move things around
unnecessarily. Worse, if the kernel ends up being highly compressible
for whatever reason, it could overflow the 4x guesstimate and make the
whole compressed loader fail again if it happens to be around there.

But this discussion is rather pointless because at this stage I highly
doubt Russell is going to change his mind.

-- 
Hector Martin (hector at marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 19:56                         ` Hector Martin
@ 2010-02-25 20:29                           ` Nicolas Pitre
  2010-02-25 21:05                             ` Russell King - ARM Linux
  0 siblings, 1 reply; 41+ messages in thread
From: Nicolas Pitre @ 2010-02-25 20:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 25 Feb 2010, Hector Martin wrote:

> Compilers are getting smarter, and code relying on old assumptions that
> may no longer be true (because compilers optimize better or do things
> differently) is precisely that which is most liable to break. The only
> way you can properly guarantee things will work is by sticking to actual
> standards and doing what the compiler expects you to do. Moving around
> text/rodata without moving around bss/(maybe data) isn't something
> allowed by the compiler. Sure, you can hack around it like we're doing
> here, but you're relying on your hacks correctly preventing the compiler
> from doing something valid yet which breaks your assumptions.

Sure.  And so far we've used a naive hack with -Dstatic.  Russell's 
patch is now removing the hack and making sure the restrictions we count 
on are respected wrt .data section.  If we need to ensure no static .bss 
variable are used we might just add this in the Makefile:

	[ "$(nm compressed/vmlinux | grep " b ")" != "" ]

> Well, if you're into speed anyway, caring about the actual kernel size
> is desirable because it avoids a totally useless copy when the zImage is
> loaded into RAM somewhere near the end of where the kernel would be. The
> guesstimate in the code will think it'll overlap and move things around
> unnecessarily. Worse, if the kernel ends up being highly compressible
> for whatever reason, it could overflow the 4x guesstimate and make the
> whole compressed loader fail again if it happens to be around there.

Those are valid points, and rather orthogonal to the ROMable 
decompressor issue.  Better not to mix them together.

And I vaguely remember trying to do just that a couple years ago: 
getting hold of the decompressed kernel size and feeding that into the 
decompressor code.  I somehow didn't come up with a satisfactory 
solution and never revisited it since.

> But this discussion is rather pointless because at this stage I highly
> doubt Russell is going to change his mind.

If you produce a patch for the above, and it is clean and portable, then 
I suspect he would just merge it.  Only at that point we could 
reconsider the ROMable decompressor issue.


Nicolas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 19:40                       ` Nicolas Pitre
  2010-02-25 19:56                         ` Hector Martin
@ 2010-02-25 20:30                         ` Russell King - ARM Linux
  1 sibling, 0 replies; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-25 20:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 25, 2010 at 02:40:43PM -0500, Nicolas Pitre wrote:
> On Thu, 25 Feb 2010, Hector Martin wrote:
> > And I'm still not convinced that compiler behavior is defined such 
> > that this cannot break in the future.
> 
> Seems to me that we're relying on compiler conventions that have been 
> around for decades already.  What do you expect to break?

Sigh.  It's well defined behaviour - it's part of the ELF ABI.  There's
nothing undefined or spurious about it, and it's not going away any time
soon.

For example, my C library has this:

00ca5d60 B __environ

as part of it's BSS segment.  If I do:

#include <stdio.h>
char **__environ;
int main() {
        printf("&__environ = %p\n", &__environ);
        printf("__environ = %p\n", __environ);
        printf("__environ[0] = %s\n", __environ[0]);
        return 0;
}

I get:

080496a0 B __environ

as part of the program.  So, as soon as I run this program, there exists
two '__environ' variables - one in the program and one in the C library.

Let's see what happens when I run the program:

&__environ = 0x80496a0
__environ = 0xbfa01c1c
__environ[0] = HOSTNAME=rmk-PC

That's definitely the one in the application program.  How can the program,
which clearly has this variable as part of its BSS segment independent of
the C library, and which never initializes this BSS variable, end up with
data in there?

It's well defined behaviour brought about by global data and the GOT,
where the shared library is pointed to the version in the user program
rather than its own BSS - independent of the rest of the shared library's
text, data, and bss segments.  When the shared library reads or writes
'__environ', it does it via the GOT which is redirected _at run time_ to
point at the application binary copy by the dynamic linker.

And now try changing that 'char **__environ' to be extern, and see what
effect that has:

080496e0 B __environ@@GLIBC_2.0

Yes, still part of the application binary (this time versioned with the
C library version), and the C library still finds that instead of its
own copy.

But, build an application binary which doesn't reference __environ, and
the version in the C library's BSS segment will be used instead.

This is not undefined compiler behaviour.  This is well defined behaviour,
and it's precisely this behaviour that we're using.  We're directing the
*global* data references to some other part of memory independent of the
rest of the decompressor image.

So, rather than wasting my time with this crap, can we please move on?

I guess I could've pulled several people's trees during the time it's
taken to write this email which shouldn't have been necessary.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 20:29                           ` Nicolas Pitre
@ 2010-02-25 21:05                             ` Russell King - ARM Linux
  2010-02-25 21:25                               ` Nicolas Pitre
  0 siblings, 1 reply; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-25 21:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 25, 2010 at 03:29:07PM -0500, Nicolas Pitre wrote:
> On Thu, 25 Feb 2010, Hector Martin wrote:
> > But this discussion is rather pointless because at this stage I highly
> > doubt Russell is going to change his mind.
> 
> If you produce a patch for the above, and it is clean and portable, then 
> I suspect he would just merge it.  Only at that point we could 
> reconsider the ROMable decompressor issue.

Hector's arguments have been based around denying various well defined
features of the ELF shared library support, and it would be wreckless
of me to accept changes based upon denials.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 21:05                             ` Russell King - ARM Linux
@ 2010-02-25 21:25                               ` Nicolas Pitre
  0 siblings, 0 replies; 41+ messages in thread
From: Nicolas Pitre @ 2010-02-25 21:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 25 Feb 2010, Russell King - ARM Linux wrote:

> On Thu, Feb 25, 2010 at 03:29:07PM -0500, Nicolas Pitre wrote:
> > On Thu, 25 Feb 2010, Hector Martin wrote:
> > > But this discussion is rather pointless because at this stage I highly
> > > doubt Russell is going to change his mind.
> > 
> > If you produce a patch for the above, and it is clean and portable, then 
> > I suspect he would just merge it.  Only at that point we could 
> > reconsider the ROMable decompressor issue.
> 
> Hector's arguments have been based around denying various well defined
> features of the ELF shared library support, and it would be wreckless
> of me to accept changes based upon denials.

I think we've pretty much settled this now.

It is however true that the decompressor code doesn't have the 
decompressed kernel size and that a 4x expansion is assumed which is not 
optimal.  This alone, without mixing it up with other issues, should be 
sufficient for considering an eventual patch from Hector if his fix is 
well done.  That's what I was referring to above, and I think the 
ROMable decompressor structure change can be postponed until real issues 
with it actually arise.


Nicolas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 19:24                   ` Nicolas Pitre
  2010-02-25 19:34                     ` Russell King - ARM Linux
@ 2010-02-25 23:48                     ` Russell King - ARM Linux
  2010-02-25 23:55                       ` Russell King - ARM Linux
  1 sibling, 1 reply; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-25 23:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 25, 2010 at 02:24:01PM -0500, Nicolas Pitre wrote:
> However...
> 
> > @@ -14,6 +14,7 @@ SECTIONS
> >    /DISCARD/ : {
> >      *(.ARM.exidx*)
> >      *(.ARM.extab*)
> > +    *(.data)
> >    }
> 
> I'd put a comment there, otherwise lots of people will go WTF when 
> coming across this.

Unfortunately, the decompressor in ROM mode has been subtly broken for
a while - since 2552fc2.

This assumes 'sp' is the end of the image, which for ROM=n, is fine.
However, for ROM=y, this most definitely is wrong, so this patch is
needed.

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 99b75aa..6b7205a 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -170,8 +170,8 @@ not_angel:
 
 		.text
 		adr	r0, LC0
- ARM(		ldmia	r0, {r1, r2, r3, r4, r5, r6, ip, sp}	)
- THUMB(		ldmia	r0, {r1, r2, r3, r4, r5, r6, ip}	)
+ ARM(		ldmia	r0, {r1, r2, r3, r4, r5, r6, r11, ip, sp})
+ THUMB(		ldmia	r0, {r1, r2, r3, r4, r5, r6, r11, ip}	)
  THUMB(		ldr	sp, [r0, #28]				)
 		subs	r0, r0, r1		@ calculate the delta offset
 
@@ -182,12 +182,14 @@ not_angel:
 		/*
 		 * We're running at a different address.  We need to fix
 		 * up various pointers:
-		 *   r5 - zImage base address
-		 *   r6 - GOT start
+		 *   r5 - zImage base address (_start)
+		 *   r6 - end of zImage text (_etext)
+		 *   r11 - GOT start
 		 *   ip - GOT end
 		 */
 		add	r5, r5, r0
 		add	r6, r6, r0
+		add	r11, r11, r0
 		add	ip, ip, r0
 
 #ifndef CONFIG_ZBOOT_ROM
@@ -205,10 +207,10 @@ not_angel:
 		/*
 		 * Relocate all entries in the GOT table.
 		 */
-1:		ldr	r1, [r6, #0]		@ relocate entries in the GOT
+1:		ldr	r1, [r11, #0]		@ relocate entries in the GOT
 		add	r1, r1, r0		@ table.  This fixes up the
-		str	r1, [r6], #4		@ C references.
-		cmp	r6, ip
+		str	r1, [r11], #4		@ C references.
+		cmp	r11, ip
 		blo	1b
 #else
 
@@ -216,12 +218,12 @@ not_angel:
 		 * Relocate entries in the GOT table.  We only relocate
 		 * the entries that are outside the (relocated) BSS region.
 		 */
-1:		ldr	r1, [r6, #0]		@ relocate entries in the GOT
+1:		ldr	r1, [r11, #0]		@ relocate entries in the GOT
 		cmp	r1, r2			@ entry < bss_start ||
 		cmphs	r3, r1			@ _end < entry
 		addlo	r1, r1, r0		@ table.  This fixes up the
-		str	r1, [r6], #4		@ C references.
-		cmp	r6, ip
+		str	r1, [r11], #4		@ C references.
+		cmp	r11, ip
 		blo	1b
 #endif
 
@@ -247,6 +249,7 @@ not_relocated:	mov	r0, #0
  * Check to see if we will overwrite ourselves.
  *   r4 = final kernel address
  *   r5 = start of this image
+ *   r6 = end of this image's text segment
  *   r2 = end of malloc space (and therefore this image)
  * We basically want:
  *   r4 >= r2 -> OK
@@ -254,8 +257,8 @@ not_relocated:	mov	r0, #0
  */
 		cmp	r4, r2
 		bhs	wont_overwrite
-		sub	r3, sp, r5		@ > compressed kernel size
-		add	r0, r4, r3, lsl #2	@ allow for 4x expansion
+		sub	r3, r6, r5		@ (_etext-_start) =~ compressed
+		add	r0, r4, r3, lsl #2	@ kernel size, allow for 4x expansion
 		cmp	r0, r5
 		bls	wont_overwrite
 
@@ -271,7 +274,6 @@ not_relocated:	mov	r0, #0
  * r1-r3  = unused
  * r4     = kernel execution address
  * r5     = decompressed kernel start
- * r6     = processor ID
  * r7     = architecture ID
  * r8     = atags pointer
  * r9-r12,r14 = corrupted
@@ -312,7 +314,8 @@ LC0:		.word	LC0			@ r1
 		.word	_end			@ r3
 		.word	zreladdr		@ r4
 		.word	_start			@ r5
-		.word	_got_start		@ r6
+		.word	_end			@ r6
+		.word	_got_start		@ r11
 		.word	_got_end		@ ip
 		.word	user_stack+4096		@ sp
 LC1:		.word	reloc_end - reloc_start
@@ -336,7 +339,6 @@ params:		ldr	r0, =params_phys
  *
  * On entry,
  *  r4 = kernel execution address
- *  r6 = processor ID
  *  r7 = architecture number
  *  r8 = atags pointer
  *  r9 = run-time address of "start"  (???)
@@ -542,7 +544,6 @@ __common_mmu_cache_on:
  * r1-r3  = unused
  * r4     = kernel execution address
  * r5     = decompressed kernel start
- * r6     = processor ID
  * r7     = architecture ID
  * r8     = atags pointer
  * r9-r12,r14 = corrupted
@@ -581,19 +582,19 @@ call_kernel:	bl	cache_clean_flush
  *  r1  = corrupted
  *  r2  = corrupted
  *  r3  = block offset
- *  r6  = corrupted
+ *  r9  = corrupted
  *  r12 = corrupted
  */
 
 call_cache_fn:	adr	r12, proc_types
 #ifdef CONFIG_CPU_CP15
-		mrc	p15, 0, r6, c0, c0	@ get processor ID
+		mrc	p15, 0, r9, c0, c0	@ get processor ID
 #else
-		ldr	r6, =CONFIG_PROCESSOR_ID
+		ldr	r9, =CONFIG_PROCESSOR_ID
 #endif
 1:		ldr	r1, [r12, #0]		@ get value
 		ldr	r2, [r12, #4]		@ get mask
-		eor	r1, r1, r6		@ (real ^ match)
+		eor	r1, r1, r9		@ (real ^ match)
 		tst	r1, r2			@       & mask
  ARM(		addeq	pc, r12, r3		) @ call cache function
  THUMB(		addeq	r12, r3			)
@@ -778,8 +779,7 @@ proc_types:
  * Turn off the Cache and MMU.  ARMv3 does not support
  * reading the control register, but ARMv4 does.
  *
- * On entry,  r6 = processor ID
- * On exit,   r0, r1, r2, r3, r12 corrupted
+ * On exit, r0, r1, r2, r3, r9, r12 corrupted
  * This routine must preserve: r4, r6, r7
  */
 		.align	5
@@ -852,10 +852,8 @@ __armv3_mmu_cache_off:
 /*
  * Clean and flush the cache to maintain consistency.
  *
- * On entry,
- *  r6 = processor ID
  * On exit,
- *  r1, r2, r3, r11, r12 corrupted
+ *  r1, r2, r3, r9, r11, r12 corrupted
  * This routine must preserve:
  *  r0, r4, r5, r6, r7
  */
@@ -967,7 +965,7 @@ __armv4_mmu_cache_flush:
 		mov	r2, #64*1024		@ default: 32K dcache size (*2)
 		mov	r11, #32		@ default: 32 byte line size
 		mrc	p15, 0, r3, c0, c0, 1	@ read cache type
-		teq	r3, r6			@ cache ID register present?
+		teq	r3, r9			@ cache ID register present?
 		beq	no_cache_id
 		mov	r1, r3, lsr #18
 		and	r1, r1, #7

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-25 23:48                     ` Russell King - ARM Linux
@ 2010-02-25 23:55                       ` Russell King - ARM Linux
  0 siblings, 0 replies; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-25 23:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 25, 2010 at 11:48:45PM +0000, Russell King - ARM Linux wrote:
> -		.word	_got_start		@ r6
> +		.word	_end			@ r6

Grr, that should be _etext.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24 22:30             ` Russell King - ARM Linux
@ 2010-02-25  0:01               ` Hector Martin
  0 siblings, 0 replies; 41+ messages in thread
From: Hector Martin @ 2010-02-25  0:01 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Wed, Feb 24, 2010 at 11:09:13PM +0100, Hector Martin wrote:
>> Uwe Kleine-K?nig wrote:
>>> Independant of the outcome of this discussion I suggest reverting the
>>> switch to the new decompressor for .33.  Hectors patch definitely isn't
>>> .33 material and broken or not the old decompressor was more reliable.
>> That's reasonable, as far as .33 goes. Obviously I'm not expecting my
>> patch to make it to this release.
> 
> Could you put together a description for the commit log for this revert
> please?

Something like this?

Revert "arm: add support for LZO-compressed kernels"

This commit switches to a different inflate implementation. This
implementation is subtly broken when used within the current ARM
compressed loader due to some build hacks. Revert the switch until those
hacks can be removed.

-- 
Hector Martin (hector at marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24 22:09           ` Hector Martin
@ 2010-02-24 22:30             ` Russell King - ARM Linux
  2010-02-25  0:01               ` Hector Martin
  0 siblings, 1 reply; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-24 22:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 24, 2010 at 11:09:13PM +0100, Hector Martin wrote:
> Uwe Kleine-K?nig wrote:
> > Independant of the outcome of this discussion I suggest reverting the
> > switch to the new decompressor for .33.  Hectors patch definitely isn't
> > .33 material and broken or not the old decompressor was more reliable.
> 
> That's reasonable, as far as .33 goes. Obviously I'm not expecting my
> patch to make it to this release.

Could you put together a description for the commit log for this revert
please?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24 15:30         ` Uwe Kleine-König
@ 2010-02-24 22:09           ` Hector Martin
  2010-02-24 22:30             ` Russell King - ARM Linux
  0 siblings, 1 reply; 41+ messages in thread
From: Hector Martin @ 2010-02-24 22:09 UTC (permalink / raw)
  To: linux-arm-kernel

Uwe Kleine-K?nig wrote:
> Independant of the outcome of this discussion I suggest reverting the
> switch to the new decompressor for .33.  Hectors patch definitely isn't
> .33 material and broken or not the old decompressor was more reliable.

That's reasonable, as far as .33 goes. Obviously I'm not expecting my
patch to make it to this release.

-- 
Hector Martin (hector at marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24 15:20       ` Hector Martin
  2010-02-24 15:30         ` Uwe Kleine-König
@ 2010-02-24 16:29         ` Russell King - ARM Linux
  1 sibling, 0 replies; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-24 16:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 24, 2010 at 04:20:18PM +0100, Hector Martin wrote:
> Russell King - ARM Linux wrote:
> > It does work with the previous version of the decompressor.
> 
> Sure, at this point in time. The broken code is still the loader, not
> the new decompressor. If a GCC update had been the culprit, it wouldn't
> be GCC's fault. The fact remains that the behavior is undefined and
> (with current GCC versions) requires carefully crafting the resulting C
> code in order to get reasonable behavior. I don't think you can
> reasonably require that a generic descompressor be maintained by others
> with care for compatibility with this hack; someone would have to police
> changes for potential issues, and you still have a decent chance of
> getting breakage if GCC decides to change its behavior some day. I'd say
> fixing the loader to not require this undefined behavior is a
> considerably better long-term solution.

For starters, do you actually understand how this stuff works, or are
you just viewing the "-Dstatic=" as a hack you don't like?

Let's examine what's going on.  If you build a file with -fpic, then
you're asking the compiler to generate position independent code - eg
for shared libraries.

How this is achieved depends on the architecture.  On ARM, it is achieved
as follows:

1. references to global data are indirected via the GOT since they can
   be placed anywhere - in the executable or in another shared library.
   This is well defined and the user environment relies upon this
   behaviour.

2. references to local 'static' data are private to the shared library,
   and are addressed relative to the position of the GOT using the
   address of the GOT and an offset from the GOT written into the code.
   This is well defined and the user environment relies upon this
   behaviour.

3. function references to local 'static' functions are made using
   standard branch instructions since these are already pc relative

4. function references to global functions are via the PLT.

Now, with the kernel decompressor, we don't care about (3) or (4) -
indeed, all functions could be static.

However, we do care about (1) vs (2).  For read-only static data, (2)
is entirely acceptable - static read-only data can be placed in the
.rodata section and move with the text segment.

However, for writable local data, even if we tell the linker to locate
the data segment in RAM, the relative offset between the data and text
is fixed at built time.  What this means is that if you try and relocate
the image, any read-write data using (2) will most probably end up
outside RAM.

In order to be able to relocate the data independently of the text
segment, we need the read-write data to be built with global visibility,
thereby causing (1) to be used.

We could investigate whether there's a better solution to "-Dstatic="
to ensure that we end up with (1) for all read/write data.

So, to sum up, contrary to your belief, the behaviour we're invoking
from the toolchain is not some spurious toolchain behaviour, but
well-defined behaviour which, if it changes, results in userspace
breakage.

What is the hack is defining 'static' as a way to get rid of all
local data, thereby avoiding writable data being addressed relative
to the GOT.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24 15:20       ` Hector Martin
@ 2010-02-24 15:30         ` Uwe Kleine-König
  2010-02-24 22:09           ` Hector Martin
  2010-02-24 16:29         ` Russell King - ARM Linux
  1 sibling, 1 reply; 41+ messages in thread
From: Uwe Kleine-König @ 2010-02-24 15:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 24, 2010 at 04:20:18PM +0100, Hector Martin wrote:
> Russell King - ARM Linux wrote:
> > It does work with the previous version of the decompressor.
> 
> Sure, at this point in time. The broken code is still the loader, not
> the new decompressor. If a GCC update had been the culprit, it wouldn't
> be GCC's fault. The fact remains that the behavior is undefined and
> (with current GCC versions) requires carefully crafting the resulting C
> code in order to get reasonable behavior. I don't think you can
> reasonably require that a generic descompressor be maintained by others
> with care for compatibility with this hack; someone would have to police
> changes for potential issues, and you still have a decent chance of
> getting breakage if GCC decides to change its behavior some day. I'd say
> fixing the loader to not require this undefined behavior is a
> considerably better long-term solution.
> 
> > I noticed that you did not reply to my previous email on this subject,
> > are you intentionally ignoring my responses?
> 
> You had a valid point which I hadn't considered (running ROM images from
> RAM), so I figured I'd give addressing that a shot before going into
> discussion mode.
Independant of the outcome of this discussion I suggest reverting the
switch to the new decompressor for .33.  Hectors patch definitely isn't
.33 material and broken or not the old decompressor was more reliable.

In general I would welcome to get rid of -Dstatic=, but it needs
careful consideration.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24 11:03     ` Russell King - ARM Linux
@ 2010-02-24 15:20       ` Hector Martin
  2010-02-24 15:30         ` Uwe Kleine-König
  2010-02-24 16:29         ` Russell King - ARM Linux
  0 siblings, 2 replies; 41+ messages in thread
From: Hector Martin @ 2010-02-24 15:20 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> It does work with the previous version of the decompressor.

Sure, at this point in time. The broken code is still the loader, not
the new decompressor. If a GCC update had been the culprit, it wouldn't
be GCC's fault. The fact remains that the behavior is undefined and
(with current GCC versions) requires carefully crafting the resulting C
code in order to get reasonable behavior. I don't think you can
reasonably require that a generic descompressor be maintained by others
with care for compatibility with this hack; someone would have to police
changes for potential issues, and you still have a decent chance of
getting breakage if GCC decides to change its behavior some day. I'd say
fixing the loader to not require this undefined behavior is a
considerably better long-term solution.

> I noticed that you did not reply to my previous email on this subject,
> are you intentionally ignoring my responses?

You had a valid point which I hadn't considered (running ROM images from
RAM), so I figured I'd give addressing that a shot before going into
discussion mode.

-- 
Hector Martin (hector at marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24  9:27   ` Hector Martin
@ 2010-02-24 11:03     ` Russell King - ARM Linux
  2010-02-24 15:20       ` Hector Martin
  0 siblings, 1 reply; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-24 11:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 24, 2010 at 10:27:18AM +0100, Hector Martin wrote:
> But it doesn't work, and breaks legitimate code.

It does work with the previous version of the decompressor.

I noticed that you did not reply to my previous email on this subject,
are you intentionally ignoring my responses?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-24  8:42 ` Uwe Kleine-König
@ 2010-02-24  9:27   ` Hector Martin
  2010-02-24 11:03     ` Russell King - ARM Linux
  0 siblings, 1 reply; 41+ messages in thread
From: Hector Martin @ 2010-02-24  9:27 UTC (permalink / raw)
  To: linux-arm-kernel

Uwe Kleine-K?nig wrote:
> Hello
> 
> [expanded Cc:]
> 
> On Tue, Feb 23, 2010 at 02:57:17PM +0100, hector at marcansoft.com wrote:
>> From: Hector Martin <hector@marcansoft.com>
>>
>> The -Dstatic= stuff in the Makefile is a hack and subtly breaks the
>> current inflate implementation, because the fixed inflate tables are
>> included into a function and removing static places them in the stack,
> Maybe this is the problem that people see after e7db7b4270?  Is this the
> final hint that we should revert that one?

Quite likely it is this same issue. The switch to the other inflate
implementation is probably what introduced the breakage. However,
e7db7b4270 itself is perfectly fine. The problem has been lurking all
along in the -Dstatic= thing, and that would've caused issues with
several perfectly valid C constructs in the kernel lib/inflate stuff.

I can also 'fix' the issue with a two-line patch to the new deflate
code, but that's not a proper solution, just an ugly workaround. I don't
think restricting the inflate code to a weird subset of C due to ARM
peculiarities is a good idea.

This can also break with GCC updates - the current code depends on
unspecified behavior of the compiler. Essentially, the split relocation
is something that is not allowed nor expected by the compiler, and
-Dstatic= is a crude hack to attempt to modify code so it doesn't
trigger the cases where the hack breaks down. But it doesn't work, and
breaks legitimate code. Even worse, the bugs it introduces can be subtle
heisenbugs, as was the case here - I spent two afternoons tracking it
down through the bowels of inflate and comparing the output vs. an x86
build. In a worst-case scenario, the bug could've caused corrupted
kernel output (not just a flat-out hang or crash while uncompressing),
which would've been so much fun to debug (*cough*).

-- 
Hector Martin (hector at marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-23 13:57 hector at marcansoft.com
  2010-02-23 14:07 ` Russell King - ARM Linux
@ 2010-02-24  8:42 ` Uwe Kleine-König
  2010-02-24  9:27   ` Hector Martin
  1 sibling, 1 reply; 41+ messages in thread
From: Uwe Kleine-König @ 2010-02-24  8:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hello

[expanded Cc:]

On Tue, Feb 23, 2010 at 02:57:17PM +0100, hector at marcansoft.com wrote:
> From: Hector Martin <hector@marcansoft.com>
> 
> The -Dstatic= stuff in the Makefile is a hack and subtly breaks the
> current inflate implementation, because the fixed inflate tables are
> included into a function and removing static places them in the stack,
Maybe this is the problem that people see after e7db7b4270?  Is this the
final hint that we should revert that one?

> which bloats the stack and breaks references after the function returns.
> So get rid of the hack.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
  2010-02-23 13:57 hector at marcansoft.com
@ 2010-02-23 14:07 ` Russell King - ARM Linux
  2010-02-24  8:42 ` Uwe Kleine-König
  1 sibling, 0 replies; 41+ messages in thread
From: Russell King - ARM Linux @ 2010-02-23 14:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 23, 2010 at 02:57:17PM +0100, hector at marcansoft.com wrote:
> As far as I can tell, ROM PIC would've never worked with the existing
> code (because, among other things, the GOT ends up in ROM), so this
> shouldn't break anything.

Clearly then you don't understand the implementation.

With ROM mode, you build it for the address you want to run it from,
and if you place it into ROM, you *must* run it from that address
because the GOT table can't be updated.

However, the intention is that you can still run the image from RAM
and have it work correctly - and it does, because that's a situation
I've tested when this was initially written.

Maybe the recent switch to the new inflate code broke this - in which
case that's a problem with the new inflate code which should be fixed.
Putting static data in functions is not a particularly nice thing to
do.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Clean up ARM compressed loader
@ 2010-02-23 13:57 hector at marcansoft.com
  2010-02-23 14:07 ` Russell King - ARM Linux
  2010-02-24  8:42 ` Uwe Kleine-König
  0 siblings, 2 replies; 41+ messages in thread
From: hector at marcansoft.com @ 2010-02-23 13:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Hector Martin <hector@marcansoft.com>

The -Dstatic= stuff in the Makefile is a hack and subtly breaks the
current inflate implementation, because the fixed inflate tables are
included into a function and removing static places them in the stack,
which bloats the stack and breaks references after the function returns.
So get rid of the hack.

Instead, clean up the stub loader so PIC vs ROM mode is handled sanely
now. You can build two notably different versions now:

CONFIG_ZBOOT_ROM=y
 - Non-PIC: loader will halt if it finds itself at the wrong address.
 - Separate ROM and RAM bases (CONFIG_ZBOOT_ROM_TEXT and
   CONFIG_ZBOOT_ROM_BSS, the latter being a slight misnomer now).
 - Proper initialized data handling. Initialized data is copied from ROM
   to RAM on boot (VMA != LMA in linker script).
 - Assumes that the decompressed kernel will not overlap with the
   RAM used by the decompressor itself (CONFIG_ZBOOT_ROM_BSS should be
   somewhere after the loaded kernel address, e.g. near the end of RAM).
   No post-decompression relocation is done.

CONFIG_ZBOOT_ROM=n
 - PIC code: you can load this anywhere
 - Single blob, data follows code.
 - The loaded kernel may overlap with the loader, in which case it is
   relocated after compression as usual.

As far as I can tell, ROM PIC would've never worked with the existing
code (because, among other things, the GOT ends up in ROM), so this
shouldn't break anything.

This is untested on a real ROM platform, but has been tested using RAM
as fake ROM.

Cc: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Hector Martin <hector@marcansoft.com>
---
 arch/arm/Kconfig                        |   10 ++-
 arch/arm/boot/compressed/Makefile       |   25 +++----
 arch/arm/boot/compressed/head.S         |  129 ++++++++++++++++--------------
 arch/arm/boot/compressed/misc.c         |    3 +-
 arch/arm/boot/compressed/vmlinux.lds.in |   29 ++++++--
 5 files changed, 110 insertions(+), 86 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 184a6bd..deb9eef 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1278,8 +1278,8 @@ config ZBOOT_ROM_BSS
 	help
 	  The base address of an area of read/write memory in the target
 	  for the ROM-able zImage which must be available while the
-	  decompressor is running. It must be large enough to hold the
-	  entire decompressed kernel plus an additional 128 KiB.
+	  decompressor is running. It must be able to hold at least 128 KiB
+	  of data and it must not overlap with the loaded kernel.
 	  Platforms which normally make use of ROM-able zImage formats
 	  normally set this to a suitable value in their defconfig file.
 
@@ -1290,7 +1290,11 @@ config ZBOOT_ROM
 	depends on ZBOOT_ROM_TEXT != ZBOOT_ROM_BSS
 	help
 	  Say Y here if you intend to execute your compressed kernel image
-	  (zImage) directly from ROM or flash.  If unsure, say N.
+	  (zImage) directly from ROM or flash. If you say Y, then the zImage
+	  will only run from the ZBOOT_ROM_TEXT address (it will not be
+	  position-independent).
+
+	  If unsure, say N.
 
 config CMDLINE
 	string "Default kernel command string"
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 2d4d88b..acfd324 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -48,20 +48,22 @@ else
 endif
 endif
 
-#
-# We now have a PIC decompressor implementation.  Decompressors running
-# from RAM should not define ZTEXTADDR.  Decompressors running directly
-# from ROM or Flash must define ZTEXTADDR (preferably via the config)
-# FIXME: Previous assignment to ztextaddr-y is lost here. See SHARK
+# RAM builds should disable CONFIG_ZBOOT_ROM. The decompressor will be
+# built as position independent code and can be loaded at any address.
+# ROM builds should enable CONFIG_ZBOOT_ROM and set CONFIG_ZBOOT_ROM_TEXT
+# and CONFIG_ZBOOT_ROM_BSS. They will be built as position dependent code
+# and will only run when loaded at CONFIG_ZBOOT_ROM_TEXT.
 ifeq ($(CONFIG_ZBOOT_ROM),y)
 ZTEXTADDR	:= $(CONFIG_ZBOOT_ROM_TEXT)
-ZBSSADDR	:= $(CONFIG_ZBOOT_ROM_BSS)
+ZDATAADDR	:= $(CONFIG_ZBOOT_ROM_BSS)
+EXTRA_CFLAGS	:= -fno-builtin
 else
 ZTEXTADDR	:= 0
-ZBSSADDR	:= ALIGN(4)
+ZDATAADDR	:= ALIGN(4)
+EXTRA_CFLAGS	:= -fpic -fno-builtin
 endif
 
-SEDFLAGS	= s/TEXT_START/$(ZTEXTADDR)/;s/BSS_START/$(ZBSSADDR)/
+SEDFLAGS	= s/TEXT_START/$(ZTEXTADDR)/;s/DATA_START/$(ZDATAADDR)/
 
 suffix_$(CONFIG_KERNEL_GZIP) = gzip
 suffix_$(CONFIG_KERNEL_LZO)  = lzo
@@ -75,7 +77,6 @@ ORIG_CFLAGS := $(KBUILD_CFLAGS)
 KBUILD_CFLAGS = $(subst -pg, , $(ORIG_CFLAGS))
 endif
 
-EXTRA_CFLAGS  := -fpic -fno-builtin
 EXTRA_AFLAGS  := -Wa,-march=all
 
 # Supply ZRELADDR, INITRD_PHYS and PARAMS_PHYS to the decompressor via
@@ -106,10 +107,6 @@ lib1funcs = $(obj)/lib1funcs.o
 $(obj)/lib1funcs.S: $(srctree)/arch/$(SRCARCH)/lib/lib1funcs.S FORCE
 	$(call cmd,shipped)
 
-# Don't allow any static data in misc.o, which
-# would otherwise mess up our GOT table
-CFLAGS_misc.o := -Dstatic=
-
 $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
 	 	$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) FORCE
 	$(call if_changed,ld)
@@ -120,8 +117,6 @@ $(obj)/piggy.$(suffix_y): $(obj)/../Image FORCE
 
 $(obj)/piggy.$(suffix_y).o:  $(obj)/piggy.$(suffix_y) FORCE
 
-CFLAGS_font.o := -Dstatic=
-
 $(obj)/font.c: $(FONTC)
 	$(call cmd,shipped)
 
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 4fddc50..dd7050b 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -169,69 +169,79 @@ not_angel:
 		 */
 
 		.text
-		adr	r0, LC0
- ARM(		ldmia	r0, {r1, r2, r3, r4, r5, r6, ip, sp}	)
- THUMB(		ldmia	r0, {r1, r2, r3, r4, r5, r6, ip}	)
- THUMB(		ldr	sp, [r0, #28]				)
-		subs	r0, r0, r1		@ calculate the delta offset
+		ldr	sp, =__stack_end
+
+here:
+		adr	r0, here
+		ldr	r1, =here
+		subs	r12, r0, r1		@ calculate the delta offset
 
 						@ if delta is zero, we are
 		beq	not_relocated		@ running at the address we
 						@ were linked at.
 
+#ifndef CONFIG_ZBOOT_ROM
+
 		/*
-		 * We're running at a different address.  We need to fix
-		 * up various pointers:
-		 *   r5 - zImage base address
-		 *   r6 - GOT start
-		 *   ip - GOT end
+		 * We're running at a different address. Fix up the stack.
 		 */
-		add	r5, r5, r0
-		add	r6, r6, r0
-		add	ip, ip, r0
+		add	sp, sp, r12
 
-#ifndef CONFIG_ZBOOT_ROM
 		/*
-		 * If we're running fully PIC === CONFIG_ZBOOT_ROM = n,
-		 * we need to fix up pointers into the BSS region.
-		 *   r2 - BSS start
-		 *   r3 - BSS end
-		 *   sp - stack pointer
+		 * Fix up the GOT address
 		 */
-		add	r2, r2, r0
-		add	r3, r3, r0
-		add	sp, sp, r0
+		ldr	r1, =_got_start
+		add	r1, r1, r12
+		ldr	r2, =_got_end
+		add	r2, r2, r12
 
 		/*
 		 * Relocate all entries in the GOT table.
 		 */
-1:		ldr	r1, [r6, #0]		@ relocate entries in the GOT
-		add	r1, r1, r0		@ table.  This fixes up the
-		str	r1, [r6], #4		@ C references.
-		cmp	r6, ip
+1:		ldr	r0, [r1]		@ relocate entries in the GOT
+		add	r0, r0, r12		@ table.  This fixes up the
+		str	r0, [r1], #4		@ C references.
+		cmp	r1, r2
 		blo	1b
+
 #else
+		/*
+		 * This ROM build is running from the wrong address, panic
+		 */
+		b	.
+#endif
+
+not_relocated:
+		ldr	r1, =__bss_start
+		add	r1, r1, r12
+		ldr	r2, =__bss_end
+		add	r2, r2, r12
 
+		mov	r0, #0
+1:		str	r0, [r1], #4		@ clear bss
+		str	r0, [r1], #4
+		str	r0, [r1], #4
+		str	r0, [r1], #4
+		cmp	r1, r2
+		blo	1b
+
+#ifdef CONFIG_ZBOOT_ROM
 		/*
-		 * Relocate entries in the GOT table.  We only relocate
-		 * the entries that are outside the (relocated) BSS region.
+		 * Copy the initialized data from ROM to RAM (LMA to VMA)
 		 */
-1:		ldr	r1, [r6, #0]		@ relocate entries in the GOT
-		cmp	r1, r2			@ entry < bss_start ||
-		cmphs	r3, r1			@ _end < entry
-		addlo	r1, r1, r0		@ table.  This fixes up the
-		str	r1, [r6], #4		@ C references.
-		cmp	r6, ip
+		ldr	r1, =__data_start
+		ldr	r2, =__data_end
+		ldr	r3, =__data_lma
+
+1:
+		ldmia	r3!, {r0, r4, r5, r6}
+		stmia	r1!, {r0, r4, r5, r6}
+		cmp	r1, r2
 		blo	1b
+
 #endif
 
-not_relocated:	mov	r0, #0
-1:		str	r0, [r2], #4		@ clear bss
-		str	r0, [r2], #4
-		str	r0, [r2], #4
-		str	r0, [r2], #4
-		cmp	r2, r3
-		blo	1b
+		ldr	r4, =zreladdr
 
 		/*
 		 * The C runtime environment should now be setup
@@ -244,6 +254,12 @@ not_relocated:	mov	r0, #0
 		add	r2, sp, #0x10000	@ 64k max
 
 /*
+ * It is assumed that in ROM builds the decompressed kernel will not overwrite
+ * our RAM .data/.bss/stack/malloc area
+ */
+#ifndef CONFIG_ZBOOT_ROM
+
+/*
  * Check to see if we will overwrite ourselves.
  *   r4 = final kernel address
  *   r5 = start of this image
@@ -252,6 +268,8 @@ not_relocated:	mov	r0, #0
  *   r4 >= r2 -> OK
  *   r4 + image length <= r5 -> OK
  */
+		ldr	r5, =_start
+		add	r5, r5, r12
 		cmp	r4, r2
 		bhs	wont_overwrite
 		sub	r3, sp, r5		@ > compressed kernel size
@@ -278,7 +296,7 @@ not_relocated:	mov	r0, #0
  */
 		add	r1, r5, r0		@ end of decompressed kernel
 		adr	r2, reloc_start
-		ldr	r3, LC1
+		ldr	r3, reloc_size
 		add	r3, r2, r3
 1:		ldmia	r2!, {r9 - r12, r14}	@ copy relocation code
 		stmia	r1!, {r9 - r12, r14}
@@ -294,6 +312,11 @@ not_relocated:	mov	r0, #0
  THUMB(		add	r12, r5, r0		)
  THUMB(		mov	pc, r12			) @ call relocation code
 
+reloc_size:
+		.word reloc_end - reloc_start
+
+#endif
+
 /*
  * We're not in danger of overwriting ourselves.  Do this the simple way.
  *
@@ -305,26 +328,13 @@ wont_overwrite:	mov	r0, r4
 		bl	decompress_kernel
 		b	call_kernel
 
-		.align	2
-		.type	LC0, #object
-LC0:		.word	LC0			@ r1
-		.word	__bss_start		@ r2
-		.word	_end			@ r3
-		.word	zreladdr		@ r4
-		.word	_start			@ r5
-		.word	_got_start		@ r6
-		.word	_got_end		@ ip
-		.word	user_stack+4096		@ sp
-LC1:		.word	reloc_end - reloc_start
-		.size	LC0, . - LC0
-
 #ifdef CONFIG_ARCH_RPC
 		.globl	params
 params:		ldr	r0, =params_phys
 		mov	pc, lr
-		.ltorg
 		.align
 #endif
+		.ltorg
 
 /*
  * Turn on the cache.  We need to setup some page tables so that we
@@ -548,6 +558,8 @@ __common_mmu_cache_on:
  * r9-r12,r14 = corrupted
  */
 		.align	5
+
+#ifndef CONFIG_ZBOOT_ROM
 reloc_start:	add	r9, r5, r0
 		sub	r9, r9, #128		@ do not copy the stack
 		debug_reloc_start
@@ -563,6 +575,7 @@ reloc_start:	add	r9, r5, r0
 		mov	sp, r1
 		add	sp, sp, #128		@ relocate the stack
 		debug_reloc_end
+#endif
 
 call_kernel:	bl	cache_clean_flush
 		bl	cache_off
@@ -1076,7 +1089,3 @@ memdump:	mov	r12, r0
 
 		.ltorg
 reloc_end:
-
-		.align
-		.section ".stack", "w"
-user_stack:	.space	4096
diff --git a/arch/arm/boot/compressed/misc.c b/arch/arm/boot/compressed/misc.c
index 56a0d11..97018f8 100644
--- a/arch/arm/boot/compressed/misc.c
+++ b/arch/arm/boot/compressed/misc.c
@@ -23,7 +23,6 @@ unsigned int __machine_arch_type;
 #include <linux/compiler.h>	/* for inline */
 #include <linux/types.h>	/* for size_t */
 #include <linux/stddef.h>	/* for NULL */
-#include <asm/string.h>
 #include <linux/linkage.h>
 
 #include <asm/unaligned.h>
@@ -166,7 +165,7 @@ void __memzero (__ptr_t s, size_t n)
 		*u.ucp++ = 0;
 }
 
-static inline __ptr_t memcpy(__ptr_t __dest, __const __ptr_t __src,
+static __ptr_t memcpy(__ptr_t __dest, __const __ptr_t __src,
 			    size_t __n)
 {
 	int i = 0;
diff --git a/arch/arm/boot/compressed/vmlinux.lds.in b/arch/arm/boot/compressed/vmlinux.lds.in
index a5924b9..46efdf4 100644
--- a/arch/arm/boot/compressed/vmlinux.lds.in
+++ b/arch/arm/boot/compressed/vmlinux.lds.in
@@ -36,19 +36,36 @@ SECTIONS
 
   _etext = .;
 
+  /* The GOT is only for PIC builds, so it should not exist in the ROM build */
   _got_start = .;
   .got			: { *(.got) }
   _got_end = .;
   .got.plt		: { *(.got.plt) }
-  .data			: { *(.data) }
-  _edata = .;
 
-  . = BSS_START;
+  . = ALIGN(16);
+
+  __data_lma = .;
+
+  . = DATA_START;
+
+  __data_start = .;
+  .data : AT(__data_lma) {
+    *(.data)
+    . = ALIGN(16);
+  }
+  __data_end = .;
+  _edata = __data_lma + __data_end - __data_start;
+
   __bss_start = .;
-  .bss			: { *(.bss) }
-  _end = .;
+  .bss : { *(.bss) }
+  . = ALIGN(16);
+  __bss_end = .;
 
-  .stack (NOLOAD)	: { *(.stack) }
+  .stack : {
+    __stack_start = .;
+    . += 4096;
+    __stack_end = .;
+  }
 
   .stab 0		: { *(.stab) }
   .stabstr 0		: { *(.stabstr) }
-- 
1.6.4.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2010-02-25 23:55 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-24  2:23 [PATCH] Clean up ARM compressed loader Hector Martin
2010-02-24  8:51 ` Uwe Kleine-König
2010-02-24  9:28   ` Hector Martin
2010-02-24 22:34 ` Russell King - ARM Linux
2010-02-24 23:34   ` Nicolas Pitre
2010-02-24 23:42     ` Russell King - ARM Linux
2010-02-24 23:57       ` Hector Martin
2010-02-25  0:01         ` Russell King - ARM Linux
2010-02-25  0:30           ` Hector Martin
2010-02-25  4:28             ` Nicolas Pitre
2010-02-25  4:33               ` Nicolas Pitre
2010-02-25  9:38               ` Russell King - ARM Linux
2010-02-25 10:05                 ` Hector Martin
2010-02-25 18:35                   ` Nicolas Pitre
2010-02-25 19:21                     ` Hector Martin
2010-02-25 19:40                       ` Nicolas Pitre
2010-02-25 19:56                         ` Hector Martin
2010-02-25 20:29                           ` Nicolas Pitre
2010-02-25 21:05                             ` Russell King - ARM Linux
2010-02-25 21:25                               ` Nicolas Pitre
2010-02-25 20:30                         ` Russell King - ARM Linux
2010-02-25 12:24                 ` Russell King - ARM Linux
2010-02-25 19:24                   ` Nicolas Pitre
2010-02-25 19:34                     ` Russell King - ARM Linux
2010-02-25 23:48                     ` Russell King - ARM Linux
2010-02-25 23:55                       ` Russell King - ARM Linux
2010-02-25  9:51               ` Hector Martin
2010-02-25 18:30                 ` Nicolas Pitre
2010-02-25  8:23             ` Russell King - ARM Linux
2010-02-25  4:06         ` Nicolas Pitre
  -- strict thread matches above, loose matches on Subject: below --
2010-02-23 13:57 hector at marcansoft.com
2010-02-23 14:07 ` Russell King - ARM Linux
2010-02-24  8:42 ` Uwe Kleine-König
2010-02-24  9:27   ` Hector Martin
2010-02-24 11:03     ` Russell King - ARM Linux
2010-02-24 15:20       ` Hector Martin
2010-02-24 15:30         ` Uwe Kleine-König
2010-02-24 22:09           ` Hector Martin
2010-02-24 22:30             ` Russell King - ARM Linux
2010-02-25  0:01               ` Hector Martin
2010-02-24 16:29         ` Russell King - ARM Linux

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.