* updates for be8 patch series
@ 2013-07-22 16:33 Ben Dooks
  2013-07-22 16:33 ` [PATCH 1/4] ARM: alignment: correctly decode instructions in BE8 mode Ben Dooks
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Ben Dooks @ 2013-07-22 16:33 UTC (permalink / raw)
  To: linux-arm-kernel

I've updated the BE8 patch series after running further tests on the
kernel. I found an issue with the alignment handler (and I believe the
same issue with the trap handler).

I also ran tests with the module loader, which showed that if we are
in BE8 mode it needs to fix up the relocations. The modules also need
to be built with --be8, otherwise they cannot be easily relocated when
loaded.

Should patches 3 and 4 be together (ie, alter the LDFLAGS and
module.c) and have I got the correct LDFLAGS?

Comments and testing welcome. I will push out a new series later with
these in and possibly rebased on 3.11-rc2.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/4] ARM: alignment: correctly decode instructions in BE8 mode.
  2013-07-22 16:33 updates for be8 patch series Ben Dooks
@ 2013-07-22 16:33 ` Ben Dooks
  2013-07-24 17:16   ` Dave Martin
  2013-07-22 16:33 ` [PATCH 2/4] ARM: traps: use <asm/opcodes.h> to get correct instruction order Ben Dooks
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Ben Dooks @ 2013-07-22 16:33 UTC (permalink / raw)
  To: linux-arm-kernel

If we are in BE8 mode, we must deal with the instruction stream being
in LE order when data is being loaded in BE order. Ensure the data is
swapped before processing to avoid the following:

Alignment trap: not handling instruction 030091e8 at [<80333e8c>]
Unhandled fault: alignment exception (0x001) at 0xbfa09567

Change to using <asm/opcodes.h> to provide the necessary conversion
functions to change the byte ordering.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
 arch/arm/mm/alignment.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c
index db26e2e..295d615 100644
--- a/arch/arm/mm/alignment.c
+++ b/arch/arm/mm/alignment.c
@@ -25,6 +25,7 @@
 #include <asm/cp15.h>
 #include <asm/system_info.h>
 #include <asm/unaligned.h>
+#include <asm/opcodes.h>
 
 #include "fault.h"
 
@@ -762,21 +763,24 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	if (thumb_mode(regs)) {
 		u16 *ptr = (u16 *)(instrptr & ~1);
 		fault = probe_kernel_address(ptr, tinstr);
+		tinstr = __mem_to_opcode_thumb16(tinstr);
 		if (!fault) {
 			if (cpu_architecture() >= CPU_ARCH_ARMv7 &&
 			    IS_T32(tinstr)) {
 				/* Thumb-2 32-bit */
 				u16 tinst2 = 0;
 				fault = probe_kernel_address(ptr + 1, tinst2);
+				tinst2 = __mem_to_opcode_thumb16(tinst2);
 				instr = (tinstr << 16) | tinst2;
 				thumb2_32b = 1;
 			} else {
 				isize = 2;
-				instr = thumb2arm(tinstr);
 			}
 		}
-	} else
+	} else {
 		fault = probe_kernel_address(instrptr, instr);
+		instr = __mem_to_opcode_arm(instr);
+	}
 
 	if (fault) {
 		type = TYPE_FAULT;
-- 
1.7.10.4


* [PATCH 2/4] ARM: traps: use <asm/opcodes.h> to get correct instruction order
  2013-07-22 16:33 updates for be8 patch series Ben Dooks
  2013-07-22 16:33 ` [PATCH 1/4] ARM: alignment: correctly decode instructions in BE8 mode Ben Dooks
@ 2013-07-22 16:33 ` Ben Dooks
  2013-07-24 17:34   ` Dave Martin
  2013-07-22 16:33 ` [PATCH 3/4] ARM: module: correctly relocate instructions in BE8 Ben Dooks
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Ben Dooks @ 2013-07-22 16:33 UTC (permalink / raw)
  To: linux-arm-kernel

The trap handler needs to take into account the endian configuration of
the system when loading instructions. Use <asm/opcodes.h> to provide the
necessary conversion functions.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
 arch/arm/kernel/traps.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1c08911..0a56e13 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -34,6 +34,7 @@
 #include <asm/unwind.h>
 #include <asm/tls.h>
 #include <asm/system_misc.h>
+#include <asm/opcodes.h>
 
 #include "signal.h"
 
@@ -411,23 +412,24 @@ asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
 	if (processor_mode(regs) == SVC_MODE) {
 #ifdef CONFIG_THUMB2_KERNEL
 		if (thumb_mode(regs)) {
-			instr = ((u16 *)pc)[0];
+			instr = __mem_to_opcode_thumb16(((u16 *)pc)[0]);
 			if (is_wide_instruction(instr)) {
 				instr <<= 16;
-				instr |= ((u16 *)pc)[1];
+				instr |= __mem_to_opcode_thumb16(((u16 *)pc)[1]);
 			}
 		} else
 #endif
-			instr = *(u32 *) pc;
+			instr = __mem_to_opcode_arm(*(u32 *) pc);
 	} else if (thumb_mode(regs)) {
 		if (get_user(instr, (u16 __user *)pc))
 			goto die_sig;
+		instr = __mem_to_opcode_thumb16(instr);
 		if (is_wide_instruction(instr)) {
 			unsigned int instr2;
 			if (get_user(instr2, (u16 __user *)pc+1))
 				goto die_sig;
 			instr <<= 16;
-			instr |= instr2;
+			instr |= __mem_to_opcode_thumb16(instr2);
 		}
 	} else if (get_user(instr, (u32 __user *)pc)) {
 		goto die_sig;
-- 
1.7.10.4


* [PATCH 3/4] ARM: module: correctly relocate instructions in BE8
  2013-07-22 16:33 updates for be8 patch series Ben Dooks
  2013-07-22 16:33 ` [PATCH 1/4] ARM: alignment: correctly decode instructions in BE8 mode Ben Dooks
  2013-07-22 16:33 ` [PATCH 2/4] ARM: traps: use <asm/opcodes.h> to get correct instruction order Ben Dooks
@ 2013-07-22 16:33 ` Ben Dooks
  2013-07-24 17:52   ` Dave Martin
  2013-07-22 16:33 ` [PATCH 4/4] ARM: set --be8 when linking modules Ben Dooks
  2013-07-22 16:49 ` updates for be8 patch series Ben Dooks
  4 siblings, 1 reply; 29+ messages in thread
From: Ben Dooks @ 2013-07-22 16:33 UTC (permalink / raw)
  To: linux-arm-kernel

When in BE8 mode, our instructions are not in the same ordering as the
data, so use <asm/opcodes.h> to take this into account.

Note: this also requires modules to be linked with --be8.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
 arch/arm/kernel/module.c |   57 +++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 23 deletions(-)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index 1e9be5d..7e13787 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -24,6 +24,7 @@
 #include <asm/sections.h>
 #include <asm/smp_plat.h>
 #include <asm/unwind.h>
+#include <asm/opcodes.h>
 
 #ifdef CONFIG_XIP_KERNEL
 /*
@@ -60,6 +61,7 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 		Elf32_Sym *sym;
 		const char *symname;
 		s32 offset;
+		u32 tmp;
 #ifdef CONFIG_THUMB2_KERNEL
 		u32 upper, lower, sign, j1, j2;
 #endif
@@ -95,7 +97,8 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 		case R_ARM_PC24:
 		case R_ARM_CALL:
 		case R_ARM_JUMP24:
-			offset = (*(u32 *)loc & 0x00ffffff) << 2;
+			offset = __mem_to_opcode_arm(*(u32 *)loc);
+			offset = (offset & 0x00ffffff) << 2;
 			if (offset & 0x02000000)
 				offset -= 0x04000000;
 
@@ -111,9 +114,10 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 			}
 
 			offset >>= 2;
+			offset &= 0x00ffffff;
 
-			*(u32 *)loc &= 0xff000000;
-			*(u32 *)loc |= offset & 0x00ffffff;
+			*(u32 *)loc &= __opcode_to_mem_arm(0xff000000);
+			*(u32 *)loc |= __opcode_to_mem_arm(offset);
 			break;
 
 	       case R_ARM_V4BX:
@@ -121,8 +125,8 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 			* other bits to re-code instruction as
 			* MOV PC,Rm.
 			*/
-		       *(u32 *)loc &= 0xf000000f;
-		       *(u32 *)loc |= 0x01a0f000;
+		       *(u32 *)loc &= __opcode_to_mem_arm(0xf000000f);
+		       *(u32 *)loc |= __opcode_to_mem_arm(0x01a0f000);
 		       break;
 
 		case R_ARM_PREL31:
@@ -132,7 +136,7 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 
 		case R_ARM_MOVW_ABS_NC:
 		case R_ARM_MOVT_ABS:
-			offset = *(u32 *)loc;
+			offset = tmp = __mem_to_opcode_arm(*(u32 *)loc);
 			offset = ((offset & 0xf0000) >> 4) | (offset & 0xfff);
 			offset = (offset ^ 0x8000) - 0x8000;
 
@@ -140,16 +144,18 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 			if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS)
 				offset >>= 16;
 
-			*(u32 *)loc &= 0xfff0f000;
-			*(u32 *)loc |= ((offset & 0xf000) << 4) |
-					(offset & 0x0fff);
+			tmp &= 0xfff0f000;
+			tmp |= ((offset & 0xf000) << 4) |
+				(offset & 0x0fff);
+
+			*(u32 *)loc = __opcode_to_mem_arm(tmp);
 			break;
 
 #ifdef CONFIG_THUMB2_KERNEL
 		case R_ARM_THM_CALL:
 		case R_ARM_THM_JUMP24:
-			upper = *(u16 *)loc;
-			lower = *(u16 *)(loc + 2);
+			upper = __mem_to_opcode_thumb16(*(u16 *)loc);
+			lower = __mem_to_opcode_thumb16(*(u16 *)(loc + 2));
 
 			/*
 			 * 25 bit signed address range (Thumb-2 BL and B.W
@@ -198,17 +204,20 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 			sign = (offset >> 24) & 1;
 			j1 = sign ^ (~(offset >> 23) & 1);
 			j2 = sign ^ (~(offset >> 22) & 1);
-			*(u16 *)loc = (u16)((upper & 0xf800) | (sign << 10) |
+			upper = (u16)((upper & 0xf800) | (sign << 10) |
 					    ((offset >> 12) & 0x03ff));
-			*(u16 *)(loc + 2) = (u16)((lower & 0xd000) |
-						  (j1 << 13) | (j2 << 11) |
-						  ((offset >> 1) & 0x07ff));
+			lower = (u16)((lower & 0xd000) |
+				      (j1 << 13) | (j2 << 11) |
+				      ((offset >> 1) & 0x07ff));
+
+			*(u16 *)loc = __opcode_to_mem_thumb16(upper);
+			*(u16 *)(loc + 2) = __opcode_to_mem_thumb16(lower);
 			break;
 
 		case R_ARM_THM_MOVW_ABS_NC:
 		case R_ARM_THM_MOVT_ABS:
-			upper = *(u16 *)loc;
-			lower = *(u16 *)(loc + 2);
+			upper = __mem_to_opcode_thumb16(*(u16 *)loc);
+			lower = __mem_to_opcode_thumb16(*(u16 *)(loc + 2));
 
 			/*
 			 * MOVT/MOVW instructions encoding in Thumb-2:
@@ -229,12 +238,14 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 			if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS)
 				offset >>= 16;
 
-			*(u16 *)loc = (u16)((upper & 0xfbf0) |
-					    ((offset & 0xf000) >> 12) |
-					    ((offset & 0x0800) >> 1));
-			*(u16 *)(loc + 2) = (u16)((lower & 0x8f00) |
-						  ((offset & 0x0700) << 4) |
-						  (offset & 0x00ff));
+			upper = (u16)((upper & 0xfbf0) |
+				      ((offset & 0xf000) >> 12) |
+				      ((offset & 0x0800) >> 1));
+			lower = (u16)((lower & 0x8f00) |
+				      ((offset & 0x0700) << 4) |
+				      (offset & 0x00ff));
+			*(u16 *)loc = __opcode_to_mem_thumb16(upper);
+			*(u16 *)(loc + 2) = __opcode_to_mem_thumb16(lower);
 			break;
 #endif
 
-- 
1.7.10.4


* [PATCH 4/4] ARM: set --be8 when linking modules
  2013-07-22 16:33 updates for be8 patch series Ben Dooks
                   ` (2 preceding siblings ...)
  2013-07-22 16:33 ` [PATCH 3/4] ARM: module: correctly relocate instructions in BE8 Ben Dooks
@ 2013-07-22 16:33 ` Ben Dooks
  2013-07-22 17:05   ` Stephen Boyd
  2013-07-22 16:49 ` updates for be8 patch series Ben Dooks
  4 siblings, 1 reply; 29+ messages in thread
From: Ben Dooks @ 2013-07-22 16:33 UTC (permalink / raw)
  To: linux-arm-kernel

To avoid having to make every text section swap the instruction order
of all instructions, make sure modules are also built with --be8
(as is the current kernel final link).

If we do not do this, we would end up having to swap all instructions
when loading a module, instead of just the instructions that we are
applying ELF relocations to.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
 arch/arm/Makefile |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index ee4605f..8173883 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -16,6 +16,7 @@ LDFLAGS		:=
 LDFLAGS_vmlinux	:=-p --no-undefined -X
 ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
 LDFLAGS_vmlinux	+= --be8
+LDFLAGS_MODULE	+= --be8
 endif
 
 OBJCOPYFLAGS	:=-O binary -R .comment -S
-- 
1.7.10.4


* updates for be8 patch series
  2013-07-22 16:33 updates for be8 patch series Ben Dooks
                   ` (3 preceding siblings ...)
  2013-07-22 16:33 ` [PATCH 4/4] ARM: set --be8 when linking modules Ben Dooks
@ 2013-07-22 16:49 ` Ben Dooks
  4 siblings, 0 replies; 29+ messages in thread
From: Ben Dooks @ 2013-07-22 16:49 UTC (permalink / raw)
  To: linux-arm-kernel

On 22/07/13 17:33, Ben Dooks wrote:
> I've updated the BE8 patch series after running further tests on the
> kernel. I found an issue with the alignment handler (and I believe the
> same issue with the trap handler).
>
> I also ran tests with the module loader, which shows that if we are
> in BE8 that it needs to fix up the relocations. Also the modules need
> to be built --be8 otherwise they cannot be easily relocated when loaded.
>
> Should patches 3 and 4 be together (ie, alter the LDFLAGS and the
> module.c) and have I got the correct LDFLAGS?
>
> Comments and testing welcome. I will push out a new series later with
> these in and possibly rebased on 3.11-rc2.

New series based on 3.10 at:

	git://git.baserock.org/delta/linux.git
		baserock/311/be/core-v3
		baserock/311/be/atags-v3

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* [PATCH 4/4] ARM: set --be8 when linking modules
  2013-07-22 16:33 ` [PATCH 4/4] ARM: set --be8 when linking modules Ben Dooks
@ 2013-07-22 17:05   ` Stephen Boyd
  2013-07-22 17:20     ` Ben Dooks
  0 siblings, 1 reply; 29+ messages in thread
From: Stephen Boyd @ 2013-07-22 17:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/22, Ben Dooks wrote:
> To avoid having to make every text section swap the instruction order
> of all instructions, make sure modules are also built with --be8
> (as is the current kernel final link).
> 
> If we do not do this, we would end up having to swap all instructions
> when loading a module, instead of just the instructions that we are
> applying ELF relocations to.
> 

If someone tries to load a be8 module on a non-be8 kernel will it
still work? Or should we add an extra version magic string in
asm/module.h to prevent that?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* [PATCH 4/4] ARM: set --be8 when linking modules
  2013-07-22 17:05   ` Stephen Boyd
@ 2013-07-22 17:20     ` Ben Dooks
  2013-07-22 18:53       ` Nicolas Pitre
  0 siblings, 1 reply; 29+ messages in thread
From: Ben Dooks @ 2013-07-22 17:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 22/07/13 18:05, Stephen Boyd wrote:
> On 07/22, Ben Dooks wrote:
>> To avoid having to make every text section swap the instruction order
>> of all instructions, make sure modules are also built with --be8
>> (as is the current kernel final link).
>>
>> If we do not do this, we would end up having to swap all instructions
>> when loading a module, instead of just the instructions that we are
>> applying ELF relocations to.
>>
>
> If someone tries to load a be8 module on a non-be8 kernel will it
> still work? Or should we add an extra version magic string in
> asm/module.h to prevent that?

When compiling for big-endian, the ELF header's EI_DATA field in
e_ident changes from ELFDATA2LSB to ELFDATA2MSB, so we should be able
to detect these modules when loading.

I have not checked whether the kernel correctly checks for this.

I do not think it currently checks the e_flags field for EF_ARM_BE8
in EABI versions 4 and 5. I am not sure if this is really important.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* [PATCH 4/4] ARM: set --be8 when linking modules
  2013-07-22 17:20     ` Ben Dooks
@ 2013-07-22 18:53       ` Nicolas Pitre
  2013-07-22 18:56         ` Ben Dooks
  2013-07-22 20:26         ` Ben Dooks
  0 siblings, 2 replies; 29+ messages in thread
From: Nicolas Pitre @ 2013-07-22 18:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 22 Jul 2013, Ben Dooks wrote:

> On 22/07/13 18:05, Stephen Boyd wrote:
> > On 07/22, Ben Dooks wrote:
> > > To avoid having to make every text section swap the instruction order
> > > of all instructions, make sure modules are also built with --be8
> > > (as is the current kernel final link).
> > > 
> > > If we do not do this, we would end up having to swap all instructions
> > > when loading a module, instead of just the instructions that we are
> > > applying ELF relocations to.
> > > 
> > 
> > If someone tries to load a be8 module on a non-be8 kernel will it
> > still work? Or should we add an extra version magic string in
> > asm/module.h to prevent that?
> 
> The ELF header changes the EI_DATA field in the ei_ident from
> ELFDATA2LSB to ELFDATA2MSB when compiling so we should be able
> to detect these when loading.
> 
> I have not checked to see if the kernel correctly checks for this.
> 
> I do not think it currently checks the ei_flags field for the
> EF_ARM_BE8 in ABI 4 and 5. I am not sure if this is really important?

If the information is already there and easily accessible, then it 
should be used.


Nicolas


* [PATCH 4/4] ARM: set --be8 when linking modules
  2013-07-22 18:53       ` Nicolas Pitre
@ 2013-07-22 18:56         ` Ben Dooks
  2013-07-23  8:16           ` Ben Dooks
  2013-07-22 20:26         ` Ben Dooks
  1 sibling, 1 reply; 29+ messages in thread
From: Ben Dooks @ 2013-07-22 18:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 22/07/13 19:53, Nicolas Pitre wrote:
> On Mon, 22 Jul 2013, Ben Dooks wrote:
>
>> On 22/07/13 18:05, Stephen Boyd wrote:
>>> On 07/22, Ben Dooks wrote:
>>>> To avoid having to make every text section swap the instruction order
>>>> of all instructions, make sure modules are also built with --be8
>>>> (as is the current kernel final link).
>>>>
>>>> If we do not do this, we would end up having to swap all instructions
>>>> when loading a module, instead of just the instructions that we are
>>>> applying ELF relocations to.
>>>>
>>>
>>> If someone tries to load a be8 module on a non-be8 kernel will it
>>> still work? Or should we add an extra version magic string in
>>> asm/module.h to prevent that?
>>
>> The ELF header changes the EI_DATA field in the ei_ident from
>> ELFDATA2LSB to ELFDATA2MSB when compiling so we should be able
>> to detect these when loading.
>>
>> I have not checked to see if the kernel correctly checks for this.
>>
>> I do not think it currently checks the ei_flags field for the
>> EF_ARM_BE8 in ABI 4 and 5. I am not sure if this is really important?
>
> If the information is already there and easily accessible, then it
> should be used.
>
>
> Nicolas

I have something like this; it does not mess things up when loading
BE8 binaries on my current system. I think we're building stuff
for EABI4 but haven't checked.

diff --git a/arch/arm/kernel/elf.c b/arch/arm/kernel/elf.c
index d0d1e83..3b0351c 100644
--- a/arch/arm/kernel/elf.c
+++ b/arch/arm/kernel/elf.c
@@ -34,6 +34,15 @@ int elf_check_arch(const struct elf32_hdr *x)
                 if (flt_fmt == EF_ARM_VFP_FLOAT && !(elf_hwcap & HWCAP_VFP))
                         return 0;
         }
+
+       if ((eflags & EF_ARM_EABI_MASK) >= EF_ARM_EABI_VER4) {
+               if (eflags & EF_ARM_BE8) {
+                       if (!IS_ENABLED(CONFIG_ARM_CPU_BE8))
+                               return 1;
+               } else if (IS_ENABLED(CONFIG_ARM_CPU_BE8))
+                       return 1;
+       }
+

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* [PATCH 4/4] ARM: set --be8 when linking modules
  2013-07-22 18:53       ` Nicolas Pitre
  2013-07-22 18:56         ` Ben Dooks
@ 2013-07-22 20:26         ` Ben Dooks
  1 sibling, 0 replies; 29+ messages in thread
From: Ben Dooks @ 2013-07-22 20:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 22/07/13 19:53, Nicolas Pitre wrote:
> On Mon, 22 Jul 2013, Ben Dooks wrote:
>
>> On 22/07/13 18:05, Stephen Boyd wrote:
>>> On 07/22, Ben Dooks wrote:
>>>> To avoid having to make every text section swap the instruction order
>>>> of all instructions, make sure modules are also built with --be8
>>>> (as is the current kernel final link).
>>>>
>>>> If we do not do this, we would end up having to swap all instructions
>>>> when loading a module, instead of just the instructions that we are
>>>> applying ELF relocations to.
>>>>
>>>
>>> If someone tries to load a be8 module on a non-be8 kernel will it
>>> still work? Or should we add an extra version magic string in
>>> asm/module.h to prevent that?
>>
>> The ELF header changes the EI_DATA field in the ei_ident from
>> ELFDATA2LSB to ELFDATA2MSB when compiling so we should be able
>> to detect these when loading.
>>
>> I have not checked to see if the kernel correctly checks for this.
>>
>> I do not think it currently checks the ei_flags field for the
>> EF_ARM_BE8 in ABI 4 and 5. I am not sure if this is really important?
>
> If the information is already there and easily accessible, then it
> should be used.

I added this, which seems to actually work, unlike my last effort.

I do not think it actually gets triggered: the ELF header records the
endianness of the system it was built for, so a mismatched module
fails somewhere else first.


diff --git a/arch/arm/kernel/elf.c b/arch/arm/kernel/elf.c
index d0d1e83..37c8e66 100644
--- a/arch/arm/kernel/elf.c
+++ b/arch/arm/kernel/elf.c
@@ -34,6 +34,17 @@ int elf_check_arch(const struct elf32_hdr *x)
                 if (flt_fmt == EF_ARM_VFP_FLOAT && !(elf_hwcap & HWCAP_VFP))
                         return 0;
         }
+
+       if ((eflags & EF_ARM_EABI_MASK) >= EF_ARM_EABI_VER4) {
+               bool is_be8 = IS_ENABLED(CONFIG_CPU_ENDIAN_BE8);
+
+               /* do some simple endian-ness verifications */
+               if (eflags & EF_ARM_BE8 && !is_be8)
+                       return 0;
+               if (eflags & EF_ARM_LE8 && is_be8)
+                       return 0;
+       }
+

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* [PATCH 4/4] ARM: set --be8 when linking modules
  2013-07-22 18:56         ` Ben Dooks
@ 2013-07-23  8:16           ` Ben Dooks
  0 siblings, 0 replies; 29+ messages in thread
From: Ben Dooks @ 2013-07-23  8:16 UTC (permalink / raw)
  To: linux-arm-kernel

On 22/07/13 19:56, Ben Dooks wrote:
> On 22/07/13 19:53, Nicolas Pitre wrote:
>> On Mon, 22 Jul 2013, Ben Dooks wrote:
>>
>>> On 22/07/13 18:05, Stephen Boyd wrote:
>>>> On 07/22, Ben Dooks wrote:
>>>>> To avoid having to make every text section swap the instruction order
>>>>> of all instructions, make sure modules are also built with --be8
>>>>> (as is the current kernel final link).
>>>>>
>>>>> If we do not do this, we would end up having to swap all instructions
>>>>> when loading a module, instead of just the instructions that we are
>>>>> applying ELF relocations to.
>>>>>
>>>>
>>>> If someone tries to load a be8 module on a non-be8 kernel will it
>>>> still work? Or should we add an extra version magic string in
>>>> asm/module.h to prevent that?
>>>
>>> The ELF header changes the EI_DATA field in the ei_ident from
>>> ELFDATA2LSB to ELFDATA2MSB when compiling so we should be able
>>> to detect these when loading.
>>>
>>> I have not checked to see if the kernel correctly checks for this.
>>>
>>> I do not think it currently checks the ei_flags field for the
>>> EF_ARM_BE8 in ABI 4 and 5. I am not sure if this is really important?
>>
>> If the information is already there and easily accessible, then it
>> should be used.
>>
>>
>> Nicolas
>
> I have something like this, it does not mess things up loading
> BE8 binaries on my current system. I think we're building stuff
> for EABI4 but haven't checked.
>
> diff --git a/arch/arm/kernel/elf.c b/arch/arm/kernel/elf.c
> index d0d1e83..3b0351c 100644
> --- a/arch/arm/kernel/elf.c
> +++ b/arch/arm/kernel/elf.c
> @@ -34,6 +34,15 @@ int elf_check_arch(const struct elf32_hdr *x)
>                 if (flt_fmt == EF_ARM_VFP_FLOAT && !(elf_hwcap & HWCAP_VFP))
>                         return 0;
>         }
> +
> + if ((eflags & EF_ARM_EABI_MASK) >= EF_ARM_EABI_VER4) {
> + if (eflags & EF_ARM_BE8) {
> + if (!IS_ENABLED(CONFIG_ARM_CPU_BE8))
> + return 1;
> + } else if (IS_ENABLED(CONFIG_ARM_CPU_BE8))
> + return 1;
> + }

Disregard this, it does not work. I blame the heat.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* [PATCH 1/4] ARM: alignment: correctly decode instructions in BE8 mode.
  2013-07-22 16:33 ` [PATCH 1/4] ARM: alignment: correctly decode instructions in BE8 mode Ben Dooks
@ 2013-07-24 17:16   ` Dave Martin
  2013-07-25 11:48     ` Ben Dooks
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Martin @ 2013-07-24 17:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 24, 2013 at 11:27:59AM +0100, Ben Dooks wrote:
> If we are in BE8 mode, we must deal with the instruction stream being
> in LE order when data is being loaded in BE order. Ensure the data is
> swapped before processing to avoid the following:
> 
> Change to using <asm/opcodes.h> to provide the necessary conversion
> functions to change the byte ordering.
> 
> Alignment trap: not handling instruction 030091e8 at [<80333e8c>]
> Unhandled fault: alignment exception (0x001) at 0xbfa09567
> 
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> ---
>  arch/arm/mm/alignment.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c
> index 6f4585b..f38145a 100644
> --- a/arch/arm/mm/alignment.c
> +++ b/arch/arm/mm/alignment.c
> @@ -25,6 +25,7 @@
>  #include <asm/cp15.h>
>  #include <asm/system_info.h>
>  #include <asm/unaligned.h>
> +#include <asm/opcodes.h>
>  
>  #include "fault.h"
>  
> @@ -762,21 +763,24 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
>  	if (thumb_mode(regs)) {
>  		u16 *ptr = (u16 *)(instrptr & ~1);
>  		fault = probe_kernel_address(ptr, tinstr);
> +		tinstr = __mem_to_opcode_thumb16(tinstr);
>  		if (!fault) {
>  			if (cpu_architecture() >= CPU_ARCH_ARMv7 &&
>  			    IS_T32(tinstr)) {

If patching this file anyway, we could take the opportunity to retire
the local IS_T32() macro and use !__opcode_is_thumb16() instead.

>  				/* Thumb-2 32-bit */
>  				u16 tinst2 = 0;
>  				fault = probe_kernel_address(ptr + 1, tinst2);
> +				tinst2 = __mem_to_opcode_thumb16(tinst2);
>  				instr = (tinstr << 16) | tinst2;

Similarly, this could be __opcode_thumb32_compose(tinstr, tinst2).

>  				thumb2_32b = 1;
>  			} else {
>  				isize = 2;
> -				instr = thumb2arm(tinstr);

eh?  Is this is mis-edit?

Cheers
---Dave

>  			}
>  		}
> -	} else
> +	} else {
>  		fault = probe_kernel_address(instrptr, instr);
> +		instr = __mem_to_opcode_arm(instr);
> +	}
>  
>  	if (fault) {
>  		type = TYPE_FAULT;
> -- 
> 1.7.10.4
> 
> 


* [PATCH 2/4] ARM: traps: use <asm/opcodes.h> to get correct instruction order
  2013-07-22 16:33 ` [PATCH 2/4] ARM: traps: use <asm/opcodes.h> to get correct instruction order Ben Dooks
@ 2013-07-24 17:34   ` Dave Martin
  2013-07-25 11:17     ` Ben Dooks
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Martin @ 2013-07-24 17:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 22, 2013 at 05:33:51PM +0100, Ben Dooks wrote:
> The trap handler needs to take into account the endian configuration of
> the system when loading instructions. Use <asm/opcodes.h> to provide the
> necessary conversion functions.
> 
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> ---
>  arch/arm/kernel/traps.c |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index 1c08911..0a56e13 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -34,6 +34,7 @@
>  #include <asm/unwind.h>
>  #include <asm/tls.h>
>  #include <asm/system_misc.h>
> +#include <asm/opcodes.h>
>  
>  #include "signal.h"

Looks like is_valid_bugaddr() might need fixing too?

That will need __mem_to_opcode_thumb16() or __mem_to_opcode_arm()
depending on CONFIG_THUMB2_KERNEL.

>  
> @@ -411,23 +412,24 @@ asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
>  	if (processor_mode(regs) == SVC_MODE) {
>  #ifdef CONFIG_THUMB2_KERNEL
>  		if (thumb_mode(regs)) {
> -			instr = ((u16 *)pc)[0];
> +			instr = __mem_to_opcode_thumb16(((u16 *)pc)[0]);
>  			if (is_wide_instruction(instr)) {
>  				instr <<= 16;
> -				instr |= ((u16 *)pc)[1];
> +				instr |= __mem_to_opcode_thumb16(((u16 *)pc)[1]);

Could use __opcode_thumb32_compose(), in which case it might look like

			if (is_wide_instruction(instr)) {
				unsigned int instr2 = __mem_to_opcode_thumb16(
					((u16 *)pc)[1]);
				instr = __opcode_thumb32_compose(instr, instr2);

(Hmmm, I just noticed that is_wide_instruction and __opcode_is_thumb16()
kinda duplicate each other.  Could change that too, but it would look
more like pointless churn...  is_wide_instruction() is used in a fair
few places already.)

>  			}
>  		} else
>  #endif
> -			instr = *(u32 *) pc;
> +			instr = __mem_to_opcode_arm(*(u32 *) pc);
>  	} else if (thumb_mode(regs)) {
>  		if (get_user(instr, (u16 __user *)pc))
>  			goto die_sig;
> +		instr = __mem_to_opcode_thumb16(instr);
>  		if (is_wide_instruction(instr)) {
>  			unsigned int instr2;
>  			if (get_user(instr2, (u16 __user *)pc+1))
>  				goto die_sig;
>  			instr <<= 16;
> -			instr |= instr2;
> +			instr |= __mem_to_opcode_thumb16(instr2);
>  		}
>  	} else if (get_user(instr, (u32 __user *)pc)) {

This also needs mem-to-opcode'ing.

Cheers
---Dave


* [PATCH 3/4] ARM: module: correctly relocate instructions in BE8
  2013-07-22 16:33 ` [PATCH 3/4] ARM: module: correctly relocate instructions in BE8 Ben Dooks
@ 2013-07-24 17:52   ` Dave Martin
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Martin @ 2013-07-24 17:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 22, 2013 at 05:33:52PM +0100, Ben Dooks wrote:
> When in BE8 mode, our instructions are not in the same ordering as the
> data, so use <asm/opcodes.h> to take this into account.
> 
> Note, also requires modules to be built --be8
> 
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> ---
>  arch/arm/kernel/module.c |   57 +++++++++++++++++++++++++++-------------------
>  1 file changed, 34 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
> index 1e9be5d..7e13787 100644
> --- a/arch/arm/kernel/module.c
> +++ b/arch/arm/kernel/module.c
> @@ -24,6 +24,7 @@
>  #include <asm/sections.h>
>  #include <asm/smp_plat.h>
>  #include <asm/unwind.h>
> +#include <asm/opcodes.h>
>  
>  #ifdef CONFIG_XIP_KERNEL
>  /*
> @@ -60,6 +61,7 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
>  		Elf32_Sym *sym;
>  		const char *symname;
>  		s32 offset;
> +		u32 tmp;
>  #ifdef CONFIG_THUMB2_KERNEL
>  		u32 upper, lower, sign, j1, j2;
>  #endif
> @@ -95,7 +97,8 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
>  		case R_ARM_PC24:
>  		case R_ARM_CALL:
>  		case R_ARM_JUMP24:
> -			offset = (*(u32 *)loc & 0x00ffffff) << 2;
> +			offset = __mem_to_opcode_arm(*(u32 *)loc);
> +			offset = (offset & 0x00ffffff) << 2;
>  			if (offset & 0x02000000)
>  				offset -= 0x04000000;
>  
> @@ -111,9 +114,10 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
>  			}
>  
>  			offset >>= 2;
> +			offset &= 0x00ffffff;
>  
> -			*(u32 *)loc &= 0xff000000;
> -			*(u32 *)loc |= offset & 0x00ffffff;
> +			*(u32 *)loc &= __opcode_to_mem_arm(0xff000000);
> +			*(u32 *)loc |= __opcode_to_mem_arm(offset);

Here it's assumed that

__opcode_to_mem_arm(x | y) == __opcode_to_mem_arm(x) | __opcode_to_mem_arm(y)

(and related rules)

Since the macros really do just shuffle bits, that does work, but it may
be a bit cleaner to build on the instruction in canonical representation
and use __opcode_to_mem_arm() just once to avoid such subtleties.

Then we can simply treat instructions in memory as opaque data, which all
flows into the code through __mem_to_opcode*() and back out through
__opcode_to_mem*().

(analogous to ntohs() ... do something useful ... htons() etc.)


So:
			tmp = __mem_to_opcode_arm(*(u32 *)loc);
			tmp = (tmp & 0xff000000) | offset;
			*(u32 *)loc = __opcode_to_mem_arm(tmp);

I think that's cleaner, but it's open to debate.  GCC might generate
slightly inferior code, depending on how clever it is ... but this
isn't a fast path.


The same style could be applied to all the relocations here.  It
could help to avoid confusion about which byte order a particular
variable or literal uses -- i.e., ARM ARM order is used for
everything except the contents of memory.


I have to abort this review now ... I'll try to look at the rest
tomorrow.

Cheers
---Dave

>  			break;
>  
>  	       case R_ARM_V4BX:
> @@ -121,8 +125,8 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
>  			* other bits to re-code instruction as
>  			* MOV PC,Rm.
>  			*/
> -		       *(u32 *)loc &= 0xf000000f;
> -		       *(u32 *)loc |= 0x01a0f000;
> +		       *(u32 *)loc &= __opcode_to_mem_arm(0xf000000f);
> +		       *(u32 *)loc |= __opcode_to_mem_arm(0x01a0f000);
>  		       break;
>  
>  		case R_ARM_PREL31:
> @@ -132,7 +136,7 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
>  
>  		case R_ARM_MOVW_ABS_NC:
>  		case R_ARM_MOVT_ABS:
> -			offset = *(u32 *)loc;
> +			offset = tmp = __mem_to_opcode_arm(*(u32 *)loc);
>  			offset = ((offset & 0xf0000) >> 4) | (offset & 0xfff);
>  			offset = (offset ^ 0x8000) - 0x8000;
>  
> @@ -140,16 +144,18 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
>  			if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS)
>  				offset >>= 16;
>  
> -			*(u32 *)loc &= 0xfff0f000;
> -			*(u32 *)loc |= ((offset & 0xf000) << 4) |
> -					(offset & 0x0fff);
> +			tmp &= 0xfff0f000;
> +			tmp |= ((offset & 0xf000) << 4) |
> +				(offset & 0x0fff);
> +
> +			*(u32 *)loc = __opcode_to_mem_arm(tmp);
>  			break;
>  
>  #ifdef CONFIG_THUMB2_KERNEL
>  		case R_ARM_THM_CALL:
>  		case R_ARM_THM_JUMP24:
> -			upper = *(u16 *)loc;
> -			lower = *(u16 *)(loc + 2);
> +			upper = __mem_to_opcode_thumb16(*(u16 *)loc);
> +			lower = __mem_to_opcode_thumb16(*(u16 *)(loc + 2));
>  
>  			/*
>  			 * 25 bit signed address range (Thumb-2 BL and B.W
> @@ -198,17 +204,20 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
>  			sign = (offset >> 24) & 1;
>  			j1 = sign ^ (~(offset >> 23) & 1);
>  			j2 = sign ^ (~(offset >> 22) & 1);
> -			*(u16 *)loc = (u16)((upper & 0xf800) | (sign << 10) |
> +			upper = (u16)((upper & 0xf800) | (sign << 10) |
>  					    ((offset >> 12) & 0x03ff));
> -			*(u16 *)(loc + 2) = (u16)((lower & 0xd000) |
> -						  (j1 << 13) | (j2 << 11) |
> -						  ((offset >> 1) & 0x07ff));
> +			lower = (u16)((lower & 0xd000) |
> +				      (j1 << 13) | (j2 << 11) |
> +				      ((offset >> 1) & 0x07ff));
> +
> +			*(u16 *)loc = __opcode_to_mem_thumb16(upper);
> +			*(u16 *)(loc + 2) = __opcode_to_mem_thumb16(lower);
>  			break;
>  
>  		case R_ARM_THM_MOVW_ABS_NC:
>  		case R_ARM_THM_MOVT_ABS:
> -			upper = *(u16 *)loc;
> -			lower = *(u16 *)(loc + 2);
> +			upper = __mem_to_opcode_thumb16(*(u16 *)loc);
> +			lower = __mem_to_opcode_thumb16(*(u16 *)(loc + 2));
>  
>  			/*
>  			 * MOVT/MOVW instructions encoding in Thumb-2:
> @@ -229,12 +238,14 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
>  			if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS)
>  				offset >>= 16;
>  
> -			*(u16 *)loc = (u16)((upper & 0xfbf0) |
> -					    ((offset & 0xf000) >> 12) |
> -					    ((offset & 0x0800) >> 1));
> -			*(u16 *)(loc + 2) = (u16)((lower & 0x8f00) |
> -						  ((offset & 0x0700) << 4) |
> -						  (offset & 0x00ff));
> +			upper = (u16)((upper & 0xfbf0) |
> +				      ((offset & 0xf000) >> 12) |
> +				      ((offset & 0x0800) >> 1));
> +			lower = (u16)((lower & 0x8f00) |
> +				      ((offset & 0x0700) << 4) |
> +				      (offset & 0x00ff));
> +			*(u16 *)loc = __opcode_to_mem_thumb16(upper);
> +			*(u16 *)(loc + 2) = __opcode_to_mem_thumb16(lower);
>  			break;
>  #endif
>  
> -- 
> 1.7.10.4
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH 2/4] ARM: traps: use <asm/opcodes.h> to get correct instruction order
  2013-07-24 17:34   ` Dave Martin
@ 2013-07-25 11:17     ` Ben Dooks
  2013-07-26 16:04       ` Dave Martin
  0 siblings, 1 reply; 29+ messages in thread
From: Ben Dooks @ 2013-07-25 11:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 24/07/13 18:34, Dave Martin wrote:
> On Mon, Jul 22, 2013 at 05:33:51PM +0100, Ben Dooks wrote:
>> The trap handler needs to take into account the endian configuration of
>> the system when loading instructions. Use <asm/opcodes.h> to provide the
>> necessary conversion functions.
>>
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>> ---
>>   arch/arm/kernel/traps.c |   10 ++++++----
>>   1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
>> index 1c08911..0a56e13 100644
>> --- a/arch/arm/kernel/traps.c
>> +++ b/arch/arm/kernel/traps.c
>> @@ -34,6 +34,7 @@
>>   #include<asm/unwind.h>
>>   #include<asm/tls.h>
>>   #include<asm/system_misc.h>
>> +#include<asm/opcodes.h>
>>
>>   #include "signal.h"
>
> Looks like is_valid_bugaddr() might need fixing too?
>
> That will need __mem_to_opcode_thumb16() or __mem_to_opcode_arm()
> depending on CONFIG_THUMB2_KERNEL.

I was looking at that, and was considering whether this is the right
thing to do, or whether to change the value of BUG_INSTR_VALUE
depending on this.

It is possible the assembler/linker may change the ordering of the
BUG() instruction, as it is placed in the output with either
a .hword or .word value. I will check this as soon as possible.

>> @@ -411,23 +412,24 @@ asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
>>   	if (processor_mode(regs) == SVC_MODE) {
>>   #ifdef CONFIG_THUMB2_KERNEL
>>   		if (thumb_mode(regs)) {
>> -			instr = ((u16 *)pc)[0];
>> +			instr = __mem_to_opcode_thumb16(((u16 *)pc)[0]);
>>   			if (is_wide_instruction(instr)) {
>>   				instr<<= 16;
>> -				instr |= ((u16 *)pc)[1];
>> +				instr |= __mem_to_opcode_thumb16(((u16 *)pc)[1]);
>
> Could use __opcode_thumb32_compose(), in which case it might look like
>
> 			if (is_wide_instruction(instr)) {
> 				unsigned int instr2 = __mem_to_opcode_thumb16(
> 					((u16 *)pc)[1]);
> 				instr = __opcode_thumb32_compose(instr, instr2);
>
> (Hmmm, I just noticed that is_wide_instruction and __opcode_is_thumb16()
> kinda duplicate each other.  Could change that too, but it would look
> more like pointless churn...  is_wide_instruction() is used in a fair
> few places already.)

It is possible that it could be changed, but I will avoid that for this
particular patch.

>>   			}
>>   		} else
>>   #endif
>> -			instr = *(u32 *) pc;
>> +			instr = __mem_to_opcode_arm(*(u32 *) pc);
>>   	} else if (thumb_mode(regs)) {
>>   		if (get_user(instr, (u16 __user *)pc))
>>   			goto die_sig;
>> +		instr = __mem_to_opcode_thumb16(instr);
>>   		if (is_wide_instruction(instr)) {
>>   			unsigned int instr2;
>>   			if (get_user(instr2, (u16 __user *)pc+1))
>>   				goto die_sig;
>>   			instr<<= 16;
>> -			instr |= instr2;
>> +			instr |= __mem_to_opcode_thumb16(instr2);
>>   		}
>>   	} else if (get_user(instr, (u32 __user *)pc)) {
>
> This also needs mem-to-opcode'ing.

Thanks, will check.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* [PATCH 1/4] ARM: alignment: correctly decode instructions in BE8 mode.
  2013-07-24 17:16   ` Dave Martin
@ 2013-07-25 11:48     ` Ben Dooks
  0 siblings, 0 replies; 29+ messages in thread
From: Ben Dooks @ 2013-07-25 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

On 24/07/13 18:16, Dave Martin wrote:
> instr = thumb2arm(tinstr);

oops, removed it by accident. thanks for spotting.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* [PATCH 2/4] ARM: traps: use <asm/opcodes.h> to get correct instruction order
  2013-07-25 11:17     ` Ben Dooks
@ 2013-07-26 16:04       ` Dave Martin
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Martin @ 2013-07-26 16:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 25, 2013 at 12:17:59PM +0100, Ben Dooks wrote:
> On 24/07/13 18:34, Dave Martin wrote:
> >On Mon, Jul 22, 2013 at 05:33:51PM +0100, Ben Dooks wrote:
> >>The trap handler needs to take into account the endian configuration of
> >>the system when loading instructions. Use <asm/opcodes.h> to provide the
> >>necessary conversion functions.
> >>
> >>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> >>---
> >>  arch/arm/kernel/traps.c |   10 ++++++----
> >>  1 file changed, 6 insertions(+), 4 deletions(-)
> >>
> >>diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> >>index 1c08911..0a56e13 100644
> >>--- a/arch/arm/kernel/traps.c
> >>+++ b/arch/arm/kernel/traps.c
> >>@@ -34,6 +34,7 @@
> >>  #include<asm/unwind.h>
> >>  #include<asm/tls.h>
> >>  #include<asm/system_misc.h>
> >>+#include<asm/opcodes.h>
> >>
> >>  #include "signal.h"
> >
> >Looks like is_valid_bugaddr() might need fixing too?
> >
> >That will need __mem_to_opcode_thumb16() or __mem_to_opcode_arm()
> >depending on CONFIG_THUMB2_KERNEL.
> 
> I was looking at that, and was considering whether this is the right
> thing to do, or whether to change the value of BUG_INSTR_VALUE
> depending on this.
> 
> It is possible the assembler/linker may change the ordering of the
> BUG() instruction, as it is placed in the output with either
> a .hword or .word value. I will check this as soon as possible.

I guess the way would be to leave BUG_INSTR_VALUE as-is, and
inject it into the code using the appropriate __inst (similarly to the
KGDB case).

> 
> >>@@ -411,23 +412,24 @@ asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
> >>  	if (processor_mode(regs) == SVC_MODE) {
> >>  #ifdef CONFIG_THUMB2_KERNEL
> >>  		if (thumb_mode(regs)) {
> >>-			instr = ((u16 *)pc)[0];
> >>+			instr = __mem_to_opcode_thumb16(((u16 *)pc)[0]);
> >>  			if (is_wide_instruction(instr)) {
> >>  				instr<<= 16;
> >>-				instr |= ((u16 *)pc)[1];
> >>+				instr |= __mem_to_opcode_thumb16(((u16 *)pc)[1]);
> >
> >Could use __opcode_thumb32_compose(), in which case it might look like
> >
> >			if (is_wide_instruction(instr)) {
> >				unsigned int instr2 = __mem_to_opcode_thumb16(
> >					((u16 *)pc)[1]);
> >				instr = __opcode_thumb32_compose(instr, instr2);
> >
> >(Hmmm, I just noticed that is_wide_instruction and __opcode_is_thumb16()
> >kinda duplicate each other.  Could change that too, but it would look
> >more like pointless churn...  is_wide_instruction() is used in a fair
> >few places already.)
> 
> It is possible that it could be changed, but I will avoid that for this
> particular patch

No problem, that was just cosmetic.

> 
> >>  			}
> >>  		} else
> >>  #endif
> >>-			instr = *(u32 *) pc;
> >>+			instr = __mem_to_opcode_arm(*(u32 *) pc);
> >>  	} else if (thumb_mode(regs)) {
> >>  		if (get_user(instr, (u16 __user *)pc))
> >>  			goto die_sig;
> >>+		instr = __mem_to_opcode_thumb16(instr);
> >>  		if (is_wide_instruction(instr)) {
> >>  			unsigned int instr2;
> >>  			if (get_user(instr2, (u16 __user *)pc+1))
> >>  				goto die_sig;
> >>  			instr<<= 16;
> >>-			instr |= instr2;
> >>+			instr |= __mem_to_opcode_thumb16(instr2);
> >>  		}
> >>  	} else if (get_user(instr, (u32 __user *)pc)) {
> >
> >This also needs mem-to-opcode'ing.
> 
> Thanks, will check.

OK.

Cheers
---Dave


* updates for be8 patch series
  2013-07-24  7:31         ` Victor Kamensky
@ 2013-07-24 21:24           ` Victor Kamensky
  0 siblings, 0 replies; 29+ messages in thread
From: Victor Kamensky @ 2013-07-24 21:24 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ben,

> I believe your series largely covers the same things, in a more accurate
> way, and should be continued. I plan to review your series and will try to
> integrate it and run it on a Pandaboard ES. Note that for this testing I would
> use a big patch making the TI OMAP4 drivers endian neutral. It will take me
> several days. If I find anything across these days I will let you know.

Here is what I got so far:

setend be under ARM_BE8 or CPU_ENDIAN_BE8 usage
-----------------------------------------------

In many places I see 'setend be' instruction under ARM_BE8

ARM_BE8(    setend    be )

and ARM_BE8 defined as

+/* Select code for any configuration running in BE8 mode */
+#ifdef CONFIG_CPU_ENDIAN_BE8
+#define ARM_BE8(code...) code
+#else
+#define ARM_BE8(code...)
+#endif

but strictly speaking you need to do that only if the CPU_BE8_BOOT_LE config
is set. I.e., if the loader is not in little endian mode, you do not need those
instructions; the CPU will already be in BE mode.

IMHO ARM_BE8 should be used only if it is really a BE8-specific issue. Some
issues are plain big endian issues, and those should be handled under the
CPU_BIG_ENDIAN config. There can be ARM systems that are CPU_BIG_ENDIAN but
not CPU_ENDIAN_BE8; ARCH_IXP4xx is an example of that.

For example, the 'ARM: pl01x debug code endian fix' patch IMHO should
use CPU_BIG_ENDIAN rather than CONFIG_CPU_ENDIAN_BE8 (as it does now through
ARM_BE8). I.e., if the same serial console output function were run on a
CPU_BIG_ENDIAN but not CONFIG_CPU_ENDIAN_BE8 system, it would still need such
a byteswap. I understand that this is a bit of a contrived example, since the
code is really BSP specific and this BSP would never have CONFIG_CPU_ENDIAN_BE8
missing, but I am concerned that the meaning of the configs gets blurred if
they are used like this.

Note you do have BE8_LE defined in a later patch ...

Few places miss atags swaps on the read
---------------------------------------

I've tried to compare your atags reading changes vs my byteswap-in-place code.
I agree that byte swapping on reading is better. It seems to me that you've
missed a few places where conversion should happen (unless I am missing
something). Also I think you need to add atag16_to_cpu and cpu_to_atag16 so
that short atag values can be handled.

1) arch/arm/kernel/atags_parse.c <global>
            87 __tagtable(ATAG_VIDEOTEXT, parse_tag_videotext);

It needs to swap (only relevant fields listed)

struct tag_videotext {
    __u16        video_page;
    __u16        video_ega_bx;
    __u16        video_points;
};


2) arch/arm/kernel/atags_parse.c <global>
            140 __tagtable(ATAG_CMDLINE, parse_tag_cmdline);

It needs to swap cmdline; as far as I read the code it is a pointer ...
wondering what is going on in the 64-bit case ...

struct tag_cmdline {
    char    cmdline[1];    /* this is the minimum size */
};

3) arch/arm/mach-footbridge/common.c <global>
            51 __tagtable(ATAG_MEMCLK, parse_tag_memclk);

For consistency's sake it is a good idea to convert this too. It needs to swap
fmemclk on the read:

struct tag_memclk {
    __u32 fmemclk;
};

4) arch/arm/mach-rpc/riscpc.c <global>
            65 __tagtable(ATAG_ACORN, parse_tag_acorn);

For consistency's sake it is a good idea to convert this too. It needs to swap
the following fields when reading:

struct tag_acorn {
    __u32 memc_control_reg;
    __u32 vram_pages;
};

5) arch/arm/mm/init.c <global>
            68 __tagtable(ATAG_INITRD, parse_tag_initrd);
6) arch/arm/mm/init.c <global>
            77 __tagtable(ATAG_INITRD2, parse_tag_initrd2);

Need to swap

struct tag_initrd {
    __u32 start;    /* physical start address */
    __u32 size;    /* size of compressed ramdisk image in bytes */
};

Other than these notes, it looks very very good!

Thanks,
Victor


* updates for be8 patch series
  2013-07-24  1:06         ` Victor Kamensky
  2013-07-24  9:36           ` Will Deacon
@ 2013-07-24 10:46           ` Ben Dooks
  1 sibling, 0 replies; 29+ messages in thread
From: Ben Dooks @ 2013-07-24 10:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 24/07/13 02:06, Victor Kamensky wrote:
> Hi Ben,
>
> Let me split off atomic64 from other issues, so we have focused topic here.
>
> Please see atomic64 issue test case below. Please try it in your setup. I've
> run on Pandaboard ES with my set of patches, but I believe it should be the
> same in your case.
>
> <snip>
>
>>> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
>>> ===================================================================
>>> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
>>> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
>>> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a
>>>
>>>        __asm__ __volatile__("@ atomic64_add\n"
>>>    "1:    ldrexd    %0, %H0, [%3]\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    adds    %0, %0, %4\n"
>>>    "    adc    %H0, %H0, %H4\n"
>>> +#else
>>> +"    adds    %H0, %H0, %H4\n"
>>> +"    adc    %0, %0, %4\n"
>>> +#endif
>>>    "    strexd    %1, %0, %H0, [%3]\n"
>>>    "    teq    %1, #0\n"
>>>    "    bne    1b"
>>> @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6
>>>
>>>        __asm__ __volatile__("@ atomic64_add_return\n"
>>>    "1:    ldrexd    %0, %H0, [%3]\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    adds    %0, %0, %4\n"
>>>    "    adc    %H0, %H0, %H4\n"
>>> +#else
>>> +"    adds    %H0, %H0, %H4\n"
>>> +"    adc    %0, %0, %4\n"
>>> +#endif
>>>    "    strexd    %1, %0, %H0, [%3]\n"
>>>    "    teq    %1, #0\n"
>>>    "    bne    1b"
>>> @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a
>>>
>>>        __asm__ __volatile__("@ atomic64_sub\n"
>>>    "1:    ldrexd    %0, %H0, [%3]\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    subs    %0, %0, %4\n"
>>>    "    sbc    %H0, %H0, %H4\n"
>>> +#else
>>> +"    subs    %H0, %H0, %H4\n"
>>> +"    sbc    %0, %0, %4\n"
>>> +#endif
>>>    "    strexd    %1, %0, %H0, [%3]\n"
>>>    "    teq    %1, #0\n"
>>>    "    bne    1b"
>>> @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6
>>>
>>>        __asm__ __volatile__("@ atomic64_sub_return\n"
>>>    "1:    ldrexd    %0, %H0, [%3]\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    subs    %0, %0, %4\n"
>>>    "    sbc    %H0, %H0, %H4\n"
>>> +#else
>>> +"    subs    %H0, %H0, %H4\n"
>>> +"    sbc    %0, %0, %4\n"
>>> +#endif
>>>    "    strexd    %1, %0, %H0, [%3]\n"
>>>    "    teq    %1, #0\n"
>>>    "    bne    1b"
>>> @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi
>>>
>>>        __asm__ __volatile__("@ atomic64_dec_if_positive\n"
>>>    "1:    ldrexd    %0, %H0, [%3]\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    subs    %0, %0, #1\n"
>>>    "    sbc    %H0, %H0, #0\n"
>>>    "    teq    %H0, #0\n"
>>> +#else
>>> +"    subs    %H0, %H0, #1\n"
>>> +"    sbc    %0, %0, #0\n"
>>> +"    teq    %0, #0\n"
>>> +#endif
>>>    "    bmi    2f\n"
>>>    "    strexd    %1, %0, %H0, [%3]\n"
>>>    "    teq    %1, #0\n"
>>> @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at
>>>    "    teqeq    %H0, %H5\n"
>>>    "    moveq    %1, #0\n"
>>>    "    beq    2f\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    adds    %0, %0, %6\n"
>>>    "    adc    %H0, %H0, %H6\n"
>>> +#else
>>> +"    adds    %H0, %H0, %H6\n"
>>> +"    adc    %0, %0, %6\n"
>>> +#endif
>>>    "    strexd    %2, %0, %H0, [%4]\n"
>>>    "    teq    %2, #0\n"
>>>    "    bne    1b\n"
>>
>>
>> I still believe that you're doing the wrong thing here
>> and that these can be left well enough alone. Please give
>> a concrete piece of code that can show they're actually
>> doing the wrong thing.
>
> Disable CONFIG_GENERIC_ATOMIC64
> -------------------------------
>
> Make sure that CONFIG_GENERIC_ATOMIC64 is not set. Otherwise atomic64_add from
> lib/atomic64.c will be used; that one works correctly, but it does not use the
> atomic64_xxx inline functions from arch/arm/include/asm/atomic.h.
>
> Personally, with my current kernel (close to 3.10) I failed to do that and
> still get a live kernel, so to illustrate the issue I will use my own copy of
> the inline atomic64_add function. Effectively it will show the same problem
> that I tried to report.

the kconfig has:
select GENERIC_ATOMIC64 if (CPU_V7M || CPU_V6 || !CPU_32v6K || !AEABI)

which means that it is possible that the generic one is being used.


> Compiler used
> -------------
>
> gcc-linaro-arm-linux-gnueabihf-4.7-2013.01-20130125_linux
>
> Test module source
> ------------------
>
> #include<linux/module.h>
> #include<linux/moduleparam.h>
> #include<linux/atomic.h>
>
> static inline void my_atomic64_add_original(u64 i, atomic64_t *v)
> {
>      u64 result;
>      unsigned long tmp;
>
>      __asm__ __volatile__("@ atomic64_add\n"
> "1:    ldrexd    %0, %H0, [%3]\n"
> "    adds    %0, %0, %4\n"
> "    adc    %H0, %H0, %H4\n"
> "    strexd    %1, %0, %H0, [%3]\n"
> "    teq    %1, #0\n"
> "    bne    1b"
>      : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>      : "r" (&v->counter), "r" (i)
>      : "cc");
> }
>
> static inline void my_atomic64_add_fixed(u64 i, atomic64_t *v)
> {
>      u64 result;
>      unsigned long tmp;
>
>      __asm__ __volatile__("@ atomic64_add\n"
> "1:    ldrexd    %0, %H0, [%3]\n"
> #ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
> "    adds    %0, %0, %4\n"
> "    adc    %H0, %H0, %H4\n"
> #else
> "    adds    %H0, %H0, %H4\n"
> "    adc    %0, %0, %4\n"
> #endif
> "    strexd    %1, %0, %H0, [%3]\n"
> "    teq    %1, #0\n"
> "    bne    1b"
>      : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>      : "r" (&v->counter), "r" (i)
>      : "cc");
> }
>
>
> atomic64_t a = {0};
>
> #define INCREMENT 0x40000000
>
> void atomic_update_original (void)
> {
>      my_atomic64_add_original(INCREMENT,&a);
> }
>
> void atomic_update_fixed (void)
> {
>      my_atomic64_add_fixed(INCREMENT,&a);
> }
>
> void test_atomic64_original(void)
> {
>      int i;
>
>      printk("original atomic64_add\n");
>
>      atomic64_set(&a, 0);
>
>      printk("a = %llx\n", a.counter);
>
>      for (i = 0; i<  10; i++) {
>          atomic_update_original();
>          printk("i = %2d: a =  %llx\n", i, a.counter);
>      }
> }
>
> void test_atomic64_fixed(void)
> {
>      int i;
>
>      printk("fixed atomic64_add\n");
>
>      atomic64_set(&a, 0);
>
>      printk("a = %llx\n", a.counter);
>
>      for (i = 0; i<  10; i++) {
>          atomic_update_fixed();
>          printk("i = %2d: a =  %llx\n", i, a.counter);
>      }
> }
>
> int
> init_module (void)
> {
>      test_atomic64_original();
>      test_atomic64_fixed();
>      return 0;
> }
>
> void
> cleanup_module(void)
> {
> }
>
> MODULE_LICENSE("GPL");
>
>
> Test run
> --------
>
> Compile and run the module; observe that with the original code the counter is
> not incremented correctly.
>
> root at genericarmv7ab:~# insmod atomic64.ko
> [   73.041107] original atomic64_add
> [   73.044647] a = 0
> [   73.046691] i =  0: a =  40000000
> [   73.050659] i =  1: a =  80000000
> [   73.054199] i =  2: a =  c0000000
> [   73.057983] i =  3: a =  0
> [   73.060852] i =  4: a =  40000000
> [   73.064361] i =  5: a =  80000000
> [   73.067932] i =  6: a =  c0000000
> [   73.071441] i =  7: a =  0
> [   73.074310] i =  8: a =  40000000
> [   73.077850] i =  9: a =  80000000
> [   73.081390] fixed atomic64_add
> [   73.084625] a = 0
> [   73.086669] i =  0: a =  40000000
> [   73.090209] i =  1: a =  80000000
> [   73.093749] i =  2: a =  c0000000
> [   73.097290] i =  3: a =  100000000
> [   73.100891] i =  4: a =  140000000
> [   73.104522] i =  5: a =  180000000
> [   73.108154] i =  6: a =  1c0000000
> [   73.111755] i =  7: a =  200000000
> [   73.115386] i =  8: a =  240000000
> [   73.119018] i =  9: a =  280000000
>
>
> Disassemble of original code
> ----------------------------
>
> (gdb) disassemble atomic_update_original
> Dump of assembler code for function atomic_update_original:
>     0x00000000<+0>:    push    {r4}        ; (str r4, [sp, #-4]!)
>     0x00000004<+4>:    mov    r3, #1073741824    ; 0x40000000
>     0x00000008<+8>:    ldr    r12, [pc, #32]    ; 0x30
> <atomic_update_original+48>
>     0x0000000c<+12>:    mov    r2, #0
>     0x00000010<+16>:    ldrexd    r0, [r12]
>     0x00000014<+20>:    adds    r0, r0, r2
>     0x00000018<+24>:    adc    r1, r1, r3
>     0x0000001c<+28>:    strexd    r4, r0, [r12]
>     0x00000020<+32>:    teq    r4, #0
>     0x00000024<+36>:    bne    0x10<atomic_update_original+16>
>     0x00000028<+40>:    ldmfd    sp!, {r4}
>     0x0000002c<+44>:    bx    lr
>     0x00000030<+48>:    andeq    r0, r0, r0
> End of assembler dump.
>
> Disassemble of fixed code
> -------------------------
>
> (gdb) disassemble atomic_update_fixed
> Dump of assembler code for function atomic_update_fixed:
>     0x00000034<+0>:    push    {r4}        ; (str r4, [sp, #-4]!)
>     0x00000038<+4>:    mov    r3, #1073741824    ; 0x40000000
>     0x0000003c<+8>:    ldr    r12, [pc, #32]    ; 0x64<atomic_update_fixed+48>
>     0x00000040<+12>:    mov    r2, #0
>     0x00000044<+16>:    ldrexd    r0, [r12]
>     0x00000048<+20>:    adds    r1, r1, r3
>     0x0000004c<+24>:    adc    r0, r0, r2
>     0x00000050<+28>:    strexd    r4, r0, [r12]
>     0x00000054<+32>:    teq    r4, #0
>     0x00000058<+36>:    bne    0x44<atomic_update_fixed+16>
>     0x0000005c<+40>:    ldmfd    sp!, {r4}
>     0x00000060<+44>:    bx    lr
>     0x00000064<+48>:    andeq    r0, r0, r0
> End of assembler dump.
>
> Fixed code explanation
> ----------------------
>
> - $r2 has the 4 most significant bytes of the increment (0)
> - $r3 has the 4 least significant bytes of the increment (0x40000000)
>
> - Load the 'long long' at address $r12 into r0 and r1
>    $r0 will get the 4 most significant bytes (i.e. the 4 bytes at $r12)
>    $r1 will get the 4 least significant bytes (i.e. the 4 bytes at $r12+4)
>    loads are big endian, so $r0 and $r1 will be read with the proper BE swap
>
>     0x00000044<+16>:    ldrexd    r0, [r12]
>
> - add $r3 to $r1, store the result into $r1; the carry flag could be set
> - this adds the 4 least significant bytes of the 'long long'
>
>     0x00000048<+20>:    adds    r1, r1, r3
>
> - adding with carry the most significant parts of the increment and counter
>
>     0x0000004c<+24>:    adc    r0, r0, r2
>
> - Store the result back
>    $r0 goes to address $r12
>    $r1 goes to address $r12+4
>
>     0x00000050<+28>:    strexd    r4, r0, [r12]
>
> Ldrd/strd
> ---------
>
> The issue boils down to the fact that the ldrexd/ldrd and strexd/strd
> instructions in big endian mode will byteswap only within each 4-byte
> boundary, but will not swap the register numbers themselves. I.e., ldrexd
> rN, <addr> puts the swapped 4 bytes at address <addr> into rN, and the
> swapped 4 bytes at <addr+4> into rN+1

Looking at the ARM ARM, the following is LDRD.

if ConditionPassed() then
     EncodingSpecificOperations(); NullCheckIfThumbEE(15);
     address = if add then (Align(PC,4) + imm32) else (Align(PC,4) - imm32);
     if HaveLPAE() && address<2:0> == '000' then
         data = MemA[Address,8];
         if BigEndian() then
              R[t] = data<63:32>;
              R[t2] = data<31:0>;
         else
              R[t] = data<31:0>;
              R[t2] = data<63:32>;
     else
         R[t] = MemA[address,4];
         R[t2] = MemA[address+4,4];

I thought it always had R[t] as lowest and R[t2] as the highest.

> Compiler?
> ---------
>
> One may think we have a compiler bug here. And after all, what is the
> meaning of %0 vs %H0? I asked my local gcc expert for help. Here is what
> I got. Please look at
>
> https://github.com/mirrors/gcc/blob/master/gcc/config/arm/arm.c
>
> and search for 'Q', 'R' and 'H'
>
> according to this, '%HN' is always the reg+1 operand, regardless of whether
> it is a big endian system or not. It is the opposite of 'Q'.
>
> Better fix
> ----------
>
> In fact I now believe the correct fix should use 'Q' for the least
> significant 4 bytes, 'R' for the most significant 4 bytes, and 'H' in the
> load/store instructions.
>
> It should look something like this:
>
> static inline void my_atomic64_add_fixed1(u64 i, atomic64_t *v)
> {
>      u64 result;
>      unsigned long tmp;
>
>      __asm__ __volatile__("@ atomic64_add\n"
> "1:    ldrexd    %0, %H0, [%3]\n"
> "    adds    %Q0, %Q0, %Q4\n"
> "    adc    %R0, %R0, %R4\n"
> "    strexd    %1, %0, %H0, [%3]\n"
> "    teq    %1, #0\n"
> "    bne    1b"
>      : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>      : "r" (&v->counter), "r" (i)
>      : "cc");
> }
>
> It is better than my original fix because it does not need conditional
> compilation. I've tested it in the module; it works on both LE and BE kernel
> variants.

If you want to make a patch with this in, I can add it to my series
ready to get merged.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* updates for be8 patch series
  2013-07-24  1:06         ` Victor Kamensky
@ 2013-07-24  9:36           ` Will Deacon
  2013-07-24 10:46           ` Ben Dooks
  1 sibling, 0 replies; 29+ messages in thread
From: Will Deacon @ 2013-07-24  9:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 24, 2013 at 02:06:05AM +0100, Victor Kamensky wrote:
> Hi Ben,
> 
> Let me split off atomic64 from other issues, so we have focused topic here.
> 
> Please see atomic64 issue test case below. Please try it in your setup. I've
> run on Pandaboard ES with my set of patches, but I believe it should be the
> same in your case.

[...]


> <snip>
> 
> >> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
> >> ===================================================================
> >> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
> >> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
> >> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a
> >>
> >>       __asm__ __volatile__("@ atomic64_add\n"
> >>   "1:    ldrexd    %0, %H0, [%3]\n"
> >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
> >>   "    adds    %0, %0, %4\n"
> >>   "    adc    %H0, %H0, %H4\n"
> >> +#else
> >> +"    adds    %H0, %H0, %H4\n"
> >> +"    adc    %0, %0, %4\n"
> >> +#endif

I thought you could just use the %Q and %R output modifiers to get the
appropriate halves of the 64-bit operand, then you can avoid the #ifdef.

Will

^ permalink raw reply	[flat|nested] 29+ messages in thread

* updates for be8 patch series
  2013-07-23 18:30       ` Ben Dooks
  2013-07-24  1:06         ` Victor Kamensky
@ 2013-07-24  7:31         ` Victor Kamensky
  2013-07-24 21:24           ` Victor Kamensky
  1 sibling, 1 reply; 29+ messages in thread
From: Victor Kamensky @ 2013-07-24  7:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ben,

Please see inline

On 23 July 2013 11:30, Ben Dooks <ben.dooks@codethink.co.uk> wrote:
> On 23/07/13 19:03, Victor Kamensky wrote:
>>
>> Hi Ben,
>>
>> Sorry for the multiple posts. I am new to this, and I could not get
>> the mailing list to accept my original email. For the sake of others,
>> so people can see the discussed patch, here it goes again with the
>> patch file inlined. Hopefully it will make it through this time.
>>
>> Wrt BE8 little-endian instructions, you will need to fix a couple
>> more places:
>>      ftrace arch/arm/kernel/ftrace.c
>>      kprobes arch/arm/kernel/kprobes.c
>> Also, in big-endian mode the atomic64_xxx functions will need fixes,
>> otherwise perf counters will be truncated at the maximum 32-bit value:
>>      atomic64 arch/arm/include/asm/atomic.h
>>
>> I've attached the board-independent (core) patch from my work that
>> made the Pandaboard ES kernel run in BE mode. You can check my version
>> of the fixes for the above issues in the patch; look for the
>> corresponding file changes. Please rework them according to the style
>> and rules of your patch series. Note the patch itself contains fixes
>> for a few other issues that are already fixed in your series. I'll
>> cross-check and compare the individual areas later. I think you may
>> find the attached patch interesting as well: it was developed
>> independently of yours but matches it very well.
>>
>> Wrt testing: ftrace was tested on the Pandaboard ES; the kprobes
>> changes were tested with SystemTap against some version of the patch
>> on the Pandaboard ES (not Thumb mode though), and may have
>> deteriorated while being ported across kernel versions, so it would
>> be a good idea to retest the latest patch. atomic64 was tested on the
>> Pandaboard ES and Marvell Armada XP.
>>
>> Thanks,
>> Victor
>
>
> First note, please always format your emails (where possible) to
> under 80 characters per line so that they do not get mangled when
> viewed on limited screens (for example, I often read email via a
> 24x80 terminal)

Yes, I understand. Sorry about this. Please bear with me a bit: I am new
to Linaro, just starting my second week. I am still learning the tools
used here and still struggling with the default email client.

> Second note, when producing patches, it is better to try and split
> them into a series of patches to reach a goal than just one large
> patch. It keeps a lot of useful information (see patch comments)
> and it means that if there is a problem, then bits can be lifted
> out to check.

Yes, I understand. Indeed, a series of multiple patches is the best
form. But I did not really mean to contribute these patches as-is;
rather, I wanted to throw the patch over the wall for reference.

I believe your series largely covers the same things, in a more accurate
way, and should be continued. I plan to review your series and will try
to integrate it and run it on the Pandaboard ES. Note that for this
testing I would use a big patch that makes the TI OMAP4 drivers
endian-neutral. It will take me several days; if I find anything I will
let you know along the way.

Note my original experiment was done against the TI 3.1 kernel and was
carried forward without great care after that, which may explain a few
discrepancies below and my blunder in the ftrace comment.

>
>
>> Index: linux-linaro-tracking/arch/arm/Makefile
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/Makefile
>> +++ linux-linaro-tracking/arch/arm/Makefile
>> @@ -16,6 +16,7 @@ LDFLAGS        :=
>>   LDFLAGS_vmlinux    :=-p --no-undefined -X
>>   ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>>   LDFLAGS_vmlinux    += --be8
>> +KBUILD_LDFLAGS_MODULE += --be8
>>   endif
>>
>>   OBJCOPYFLAGS    :=-O binary -R .comment -S
>> @@ -43,12 +44,19 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y)
>>   KBUILD_CFLAGS    +=-fstack-protector
>>   endif
>>
>> +# Pass the endian compilation flag in both CPPFLAGS and CFLAGS. There
>> +# are places where CFLAGS alone is used (cmd_record_mcount), and it
>> +# has to be in CPPFLAGS because it affects which macros (__ARMEB__,
>> +# __BYTE_ORDER__, __FLOAT_WORD_ORDER__) are defined. So replicating
>> +# the flag seems to be a good compromise.
>>   ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
>>   KBUILD_CPPFLAGS    += -mbig-endian
>> +KBUILD_CFLAGS    += -mbig-endian
>>   AS        += -EB
>>   LD        += -EB
>>   else
>>   KBUILD_CPPFLAGS    += -mlittle-endian
>> +KBUILD_CFLAGS    += -mlittle-endian
>>   AS        += -EL
>>   LD        += -EL
>>   endif
>
>
> See the comment below.
>
>
>> Index: linux-linaro-tracking/arch/arm/boot/compressed/Makefile
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/boot/compressed/Makefile
>> +++ linux-linaro-tracking/arch/arm/boot/compressed/Makefile
>> @@ -138,6 +138,24 @@ endif
>>   ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>>   LDFLAGS_vmlinux += --be8
>>   endif
>> +
>> +# Pass the endian compilation flag in both CPPFLAGS and CFLAGS. There
>> +# are places where CFLAGS alone is used (cmd_record_mcount), and it
>> +# has to be in CPPFLAGS because it affects which macros (__ARMEB__,
>> +# __BYTE_ORDER__, __FLOAT_WORD_ORDER__) are defined. So replicating
>> +# the flag seems to be a good compromise.
>> +ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
>> +KBUILD_CPPFLAGS    += -mbig-endian
>> +KBUILD_CFLAGS    += -mbig-endian
>> +AS        += -EB
>> +LD        += -EB
>> +else
>> +KBUILD_CPPFLAGS    += -mlittle-endian
>> +KBUILD_CFLAGS    += -mlittle-endian
>> +AS        += -EL
>> +LD        += -EL
>> +endif
>> +
>>   # ?
>>   LDFLAGS_vmlinux += -p
>>   # Report unresolved symbol references
>
>
> I've not been bitten by this one. Is there anyone else who can comment
> on KBUILD_CFLAGS vs KBUILD_CPPFLAGS?
>
> I had assumed that arch/arm/boot/compressed/Makefile would have
> inherited the flags from the arch/arm/Makefile. Someone else would
> be better qualified to comment on this.

Let's ignore this; I don't think it is relevant now. There was probably
some odd situation in the old kernel.

>
>> Index: linux-linaro-tracking/arch/arm/boot/compressed/head.S
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/boot/compressed/head.S
>> +++ linux-linaro-tracking/arch/arm/boot/compressed/head.S
>> @@ -141,6 +141,12 @@ start:
>>   #endif
>>           mov    r7, r1            @ save architecture ID
>>           mov    r8, r2            @ save atags pointer
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +                /*
>> +                 * Switch to bigendian mode
>> +                 */
>> +                setend  be
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>>
>>   #ifndef __ARM_ARCH_2__
>>           /*
>> Index: linux-linaro-tracking/arch/arm/include/asm/setup.h
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/include/asm/setup.h
>> +++ linux-linaro-tracking/arch/arm/include/asm/setup.h
>> @@ -53,4 +53,8 @@ extern int arm_add_memory(phys_addr_t st
>>   extern void early_print(const char *str, ...);
>>   extern void dump_machine_table(void);
>>
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +void byteswap_tags(struct tag *t);
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>> +
>>   #endif
>
>
> You could have done
>
> #ifdef CONFIG_LITTLE_ENDIAN_LOADER
> void byteswap_tags(struct tag *t);
> #else
> static inline void byteswap_tags(struct tag *t) { }
> #endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/Makefile
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/Makefile
>> +++ linux-linaro-tracking/arch/arm/kernel/Makefile
>> @@ -22,6 +22,7 @@ obj-y        := elf.o entry-armv.o entry-commo
>>   obj-$(CONFIG_ATAGS)        += atags_parse.o
>>   obj-$(CONFIG_ATAGS_PROC)    += atags_proc.o
>>   obj-$(CONFIG_DEPRECATED_PARAM_STRUCT) += atags_compat.o
>> +obj-$(CONFIG_LITTLE_ENDIAN_LOADER) += atags_byteswap.o
>>
>>   obj-$(CONFIG_OC_ETM)        += etm.o
>>   obj-$(CONFIG_CPU_IDLE)        += cpuidle.o
>> Index: linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
>> ===================================================================
>> --- /dev/null
>> +++ linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
>> @@ -0,0 +1,119 @@
>> +#include<linux/slab.h>
>> +#include<linux/proc_fs.h>
>> +#include<linux/swab.h>
>> +#include<asm/setup.h>
>> +#include<asm/types.h>
>> +#include<asm/page.h>
>> +
>> +#define byteswap32(x) (x) = __swab32((x))
>> +#define byteswap16(x) (x) = __swab16((x))
>> +
>> +static void __init byteswap_tag_core(struct tag *tag)
>> +{
>> +  byteswap32(tag->u.core.flags);
>> +  byteswap32(tag->u.core.pagesize);
>> +  byteswap32(tag->u.core.rootdev);
>> +}
>> +
>> +static void __init byteswap_tag_mem32(struct tag *tag)
>> +{
>> +  byteswap32(tag->u.mem.size);
>> +  byteswap32(tag->u.mem.start);
>> +}
>> +
>> +static void __init byteswap_tag_videotext(struct tag *tag)
>> +{
>> +  byteswap16(tag->u.videotext.video_page);
>> +  byteswap16(tag->u.videotext.video_ega_bx);
>> +  byteswap16(tag->u.videotext.video_points);
>> +}
>> +
>> +static void __init byteswap_tag_ramdisk(struct tag *tag)
>> +{
>> +  byteswap32(tag->u.ramdisk.flags);
>> +  byteswap32(tag->u.ramdisk.size);
>> +  byteswap32(tag->u.ramdisk.start);
>> +}
>> +
>> +static void __init byteswap_tag_initrd(struct tag *tag)
>> +{
>> +  byteswap32(tag->u.initrd.start);
>> +  byteswap32(tag->u.initrd.size);
>> +}
>> +
>> +static void __init byteswap_tag_serialnr(struct tag *tag)
>> +{
>> +  byteswap32(tag->u.serialnr.low);
>> +  byteswap32(tag->u.serialnr.high);
>> +}
>> +
>> +static void __init byteswap_tag_revision(struct tag *tag)
>> +{
>> +    byteswap32(tag->u.revision.rev);
>> +}
>> +
>> +static void __init byteswap_tag_videolfb(struct tag *tag)
>> +{
>> +    byteswap16(tag->u.videolfb.lfb_width);
>> +    byteswap16(tag->u.videolfb.lfb_height);
>> +    byteswap16(tag->u.videolfb.lfb_depth);
>> +    byteswap16(tag->u.videolfb.lfb_linelength);
>> +    byteswap32(tag->u.videolfb.lfb_base);
>> +    byteswap32(tag->u.videolfb.lfb_size);
>> +}
>> +
>> +static void __init byteswap_tag_acorn(struct tag *tag)
>> +{
>> +    byteswap32(tag->u.acorn.memc_control_reg);
>> +    byteswap32(tag->u.acorn.vram_pages);
>> +}
>> +
>> +static void __init byteswap_tag_memclk(struct tag *tag)
>> +{
>> +    byteswap32(tag->u.memclk.fmemclk);
>> +}
>> +
>> +void __init byteswap_tags(struct tag *t)
>> +{
>> +  for (; t->hdr.size; t = tag_next(t)) {
>> +    byteswap32(t->hdr.size);
>> +    byteswap32(t->hdr.tag);
>> +    switch(t->hdr.tag) {
>> +    case ATAG_CORE:
>> +      byteswap_tag_core(t);
>> +      break;
>> +    case ATAG_MEM:
>> +      byteswap_tag_mem32(t);
>> +      break;
>> +    case ATAG_VIDEOTEXT:
>> +      byteswap_tag_videotext(t);
>> +      break;
>> +    case ATAG_RAMDISK:
>> +      byteswap_tag_ramdisk(t);
>> +      break;
>> +    case ATAG_INITRD2:
>> +      byteswap_tag_initrd(t);
>> +      break;
>> +    case ATAG_SERIAL:
>> +      byteswap_tag_serialnr(t);
>> +      break;
>> +    case ATAG_REVISION:
>> +      byteswap_tag_revision(t);
>> +      break;
>> +    case ATAG_VIDEOLFB:
>> +      byteswap_tag_videolfb(t);
>> +      break;
>> +    case ATAG_CMDLINE:
>> +      break;
>> +    case ATAG_ACORN:
>> +      byteswap_tag_acorn(t);
>> +      break;
>> +    case ATAG_MEMCLK:
>> +      byteswap_tag_memclk(t);
>> +      break;
>> +    default:
>> +      /* Need to complain? */
>> +      break;
>> +    }
>> +  }
>> +}
>
>
> I think swapping on read is a better idea, as it leaves
> these in their original format.
>
> PS, looks like you either have a formatting issue with
> your email client or your code. Did this pass checkpatch?
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/head-common.S
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/head-common.S
>> +++ linux-linaro-tracking/arch/arm/kernel/head-common.S
>> @@ -48,6 +48,9 @@ __vet_atags:
>>       bne    1f
>>
>>       ldr    r5, [r2, #0]
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +        rev     r5, r5
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>>   #ifdef CONFIG_OF_FLATTREE
>>       ldr    r6, =OF_DT_MAGIC        @ is it a DTB?
>>       cmp    r5, r6
>> @@ -57,6 +60,9 @@ __vet_atags:
>>       cmpne    r5, #ATAG_CORE_SIZE_EMPTY
>>       bne    1f
>>       ldr    r5, [r2, #4]
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +        rev     r5, r5
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>>       ldr    r6, =ATAG_CORE
>>       cmp    r5, r6
>>       bne    1f
>
>
> A bit untidy. Also, does OF_DT_MAGIC get changed by the endianness, or
> is that handled somewhere else in the head code?
>
>> Index: linux-linaro-tracking/arch/arm/kernel/module.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/module.c
>> +++ linux-linaro-tracking/arch/arm/kernel/module.c
>> @@ -45,6 +45,65 @@ void *module_alloc(unsigned long size)
>>   }
>>   #endif
>>
>> +/*
>> + * For relocations in executable sections, if we are configured with
>> + * CONFIG_CPU_ENDIAN_BE8 we assume that the code is already in
>> + * little-endian byte-swapped form, and we need to byteswap the value
>> + * when we read it and when we write it. If the relocation is of
>> + * R_ARM_ABS32 type we assume that it is a relocation for a piece of
>> + * data, and we do not perform byteswaps when such a relocation is
>> + * applied. This heuristic-based approach may break in the future.
>> + *
>> + * It seems that a more correct way to handle be8 in kernel modules is
>> + * to build the module without the --be8 option, but to perform the
>> + * instruction byteswap when module sections are loaded, after all
>> + * relocations are applied. In order to do such a byteswap we need to
>> + * read all the mapping symbols ($a, $d, $t), sort them, and apply
>> + * byteswaps to the $a and $t ranges.
>> + */
>> +static inline u32 read_section_u32(Elf32_Shdr *dstsec, unsigned long loc)
>> +{
>> +        u32 retval = *(u32 *)loc;
>> +
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        if (dstsec->sh_flags & SHF_EXECINSTR) {
>> +                retval = __swab32(retval);
>> +        }
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +        return retval;
>> +}
>> +
>> +static inline void write_section_u32(Elf32_Shdr *dstsec,
>> +                                     unsigned long loc, u32 value)
>> +{
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        if (dstsec->sh_flags & SHF_EXECINSTR) {
>> +                value = __swab32(value);
>> +        }
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +        *(u32 *)loc = value;
>> +}
>> +
>> +#ifdef CONFIG_THUMB2_KERNEL
>> +static inline u16 read_section_u16(Elf32_Shdr *dstsec, unsigned long loc)
>> +{
>> +        u16 retval = *(u16 *)loc;
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        if (dstsec->sh_flags & SHF_EXECINSTR) {
>> +                retval = __swab16(retval);
>> +        }
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +        return retval;
>> +}
>> +
>> +static inline void write_section_u16(Elf32_Shdr *dstsec,
>> +                                     unsigned long loc, u16 value)
>> +{
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        if (dstsec->sh_flags & SHF_EXECINSTR) {
>> +                value = __swab16(value);
>> +        }
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +        *(u16 *)loc = value;
>> +}
>> +#endif /* CONFIG_THUMB2_KERNEL */
>> +
>>   int
>>   apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int
>> symindex,
>>              unsigned int relindex, struct module *module)
>> @@ -60,6 +119,7 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>           Elf32_Sym *sym;
>>           const char *symname;
>>           s32 offset;
>> +                u32 loc_u32;
>>   #ifdef CONFIG_THUMB2_KERNEL
>>           u32 upper, lower, sign, j1, j2;
>>   #endif
>> @@ -95,7 +155,8 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>           case R_ARM_PC24:
>>           case R_ARM_CALL:
>>           case R_ARM_JUMP24:
>> -            offset = (*(u32 *)loc&  0x00ffffff)<<  2;
>>
>> +                        loc_u32 = read_section_u32(dstsec, loc);
>> +            offset = (loc_u32&  0x00ffffff)<<  2;
>>               if (offset&  0x02000000)
>>
>>                   offset -= 0x04000000;
>>
>> @@ -112,27 +173,33 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>
>>               offset>>= 2;
>>
>> -            *(u32 *)loc&= 0xff000000;
>>
>> -            *(u32 *)loc |= offset&  0x00ffffff;
>> +            loc_u32&= 0xff000000;
>> +            loc_u32 |= offset&  0x00ffffff;
>>
>> +                        write_section_u32(dstsec, loc, loc_u32);
>>               break;
>>
>> -           case R_ARM_V4BX:
>> +                case R_ARM_V4BX:
>> +                        loc_u32 = read_section_u32(dstsec, loc);
>>                  /* Preserve Rm and the condition code. Alter
>>               * other bits to re-code instruction as
>>               * MOV PC,Rm.
>>               */
>> -               *(u32 *)loc&= 0xf000000f;
>>
>> -               *(u32 *)loc |= 0x01a0f000;
>> +               loc_u32&= 0xf000000f;
>>
>> +               loc_u32 |= 0x01a0f000;
>> +                       write_section_u32(dstsec, loc, loc_u32);
>>                  break;
>>
>>           case R_ARM_PREL31:
>> -            offset = *(u32 *)loc + sym->st_value - loc;
>> -            *(u32 *)loc = offset&  0x7fffffff;
>>
>> +                        loc_u32 = read_section_u32(dstsec, loc);
>> +            offset = loc_u32 + sym->st_value - loc;
>> +            loc_u32 = offset&  0x7fffffff;
>>
>> +                        write_section_u32(dstsec, loc, loc_u32);
>>               break;
>>
>>           case R_ARM_MOVW_ABS_NC:
>>           case R_ARM_MOVT_ABS:
>> -            offset = *(u32 *)loc;
>> +                        loc_u32 = read_section_u32(dstsec, loc);
>> +            offset = loc_u32;
>>               offset = ((offset&  0xf0000)>>  4) | (offset&  0xfff);
>>
>>               offset = (offset ^ 0x8000) - 0x8000;
>>
>> @@ -140,16 +207,17 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>               if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS)
>>                   offset>>= 16;
>>
>> -            *(u32 *)loc&= 0xfff0f000;
>>
>> -            *(u32 *)loc |= ((offset&  0xf000)<<  4) |
>> -                    (offset&  0x0fff);
>> +            loc_u32&= 0xfff0f000;
>> +            loc_u32 |= ((offset&  0xf000)<<  4) |
>> +                                   (offset&  0x0fff);
>>
>> +                        write_section_u32(dstsec, loc, loc_u32);
>>               break;
>>
>>   #ifdef CONFIG_THUMB2_KERNEL
>>           case R_ARM_THM_CALL:
>>           case R_ARM_THM_JUMP24:
>> -            upper = *(u16 *)loc;
>> -            lower = *(u16 *)(loc + 2);
>> +                        upper = read_section_u16(dstsec, loc);
>> +            lower = read_section_u16(dstsec, loc + 2);
>>
>>               /*
>>                * 25 bit signed address range (Thumb-2 BL and B.W
>> @@ -198,17 +266,19 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>               sign = (offset>>  24)&  1;
>>               j1 = sign ^ (~(offset>>  23)&  1);
>>               j2 = sign ^ (~(offset>>  22)&  1);
>> -            *(u16 *)loc = (u16)((upper&  0xf800) | (sign<<  10) |
>> -                        ((offset>>  12)&  0x03ff));
>> -            *(u16 *)(loc + 2) = (u16)((lower&  0xd000) |
>>
>> -                          (j1<<  13) | (j2<<  11) |
>> -                          ((offset>>  1)&  0x07ff));
>> +                        write_section_u16(dstsec, loc,
>> +                                          (u16)((upper&  0xf800) |
>> (sign<<  10) |
>> +                                                ((offset>>  12)&
>> 0x03ff)));
>>
>> +                        write_section_u16(dstsec, loc + 2,
>> +                                          (u16)((lower&  0xd000) |
>>
>> +                                                (j1<<  13) | (j2<<  11) |
>> +                                                ((offset>>  1)&
>> 0x07ff)));
>>
>>               break;
>>
>>           case R_ARM_THM_MOVW_ABS_NC:
>>           case R_ARM_THM_MOVT_ABS:
>> -            upper = *(u16 *)loc;
>> -            lower = *(u16 *)(loc + 2);
>> +                        upper = read_section_u16(dstsec, loc);
>> +            lower = read_section_u16(dstsec, loc + 2);
>>
>>               /*
>>                * MOVT/MOVW instructions encoding in Thumb-2:
>> @@ -229,12 +299,14 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>               if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS)
>>                   offset>>= 16;
>>
>> -            *(u16 *)loc = (u16)((upper&  0xfbf0) |
>> -                        ((offset&  0xf000)>>  12) |
>> -                        ((offset&  0x0800)>>  1));
>>
>> -            *(u16 *)(loc + 2) = (u16)((lower&  0x8f00) |
>> -                          ((offset&  0x0700)<<  4) |
>> -                          (offset&  0x00ff));
>>
>> +                        write_section_u16(dstsec, loc,
>> +                                          (u16)((upper&  0xfbf0) |
>> +                                                ((offset&  0xf000)>>  12)
>> |
>> +                                                ((offset&  0x0800)>>
>> 1)));
>>
>> +                        write_section_u16(dstsec, loc + 2,
>> +                                          (u16)((lower&  0x8f00) |
>> +                                                ((offset&  0x0700)<<  4)
>> |
>> +                                                (offset&  0x00ff)));
>>               break;
>>   #endif
>>
>
> I'm still not sure about this; I think the use of <asm/opcodes.h>
> is better here. I'm not sure there's much point in checking the flags
> on the sections, as I think the relocations pretty much define which
> endianness they will be in.

Wrt asm/opcodes.h: yes, I agree it should be used now.

I'll need to check why I used section attributes to guide the byteswaps;
I think it was done for a reason. We tested it with a bigger system on
top, with quite a few different loadable modules, and I think I ran into
problems with some of them. I will need to dig back. As my comment says,
ideally it would be better to use the $a, $d, $t mapping symbols; that
is what ld does with the --be8 option, but it may be too hard as a
starting point.
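
As an aside for anyone reading along, the convention these helpers
encode can be modelled in plain C. This is a userspace sketch, not
kernel code: swab32() stands in for the kernel's __swab32(), and the
is_be8_exec flag models "CONFIG_CPU_ENDIAN_BE8 is set and the section
has SHF_EXECINSTR":

```c
#include <stdint.h>

/* hand-rolled 32-bit byte swap, standing in for the kernel's __swab32() */
static uint32_t swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/* In a BE8 kernel, data is big-endian but the instruction stream stays
 * little-endian in memory, so a word fetched from an executable section
 * must be byte-swapped before it can be inspected as an opcode. */
static uint32_t read_insn(const uint32_t *loc, int is_be8_exec)
{
	uint32_t v = *loc;

	return is_be8_exec ? swab32(v) : v;
}
```

For example, the word 0x030091e8 seen by a naive big-endian load is
really the opcode 0xe8910003, which is exactly the value the alignment
handler failed to decode in patch 1.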

>> Index: linux-linaro-tracking/arch/arm/mm/Kconfig
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/mm/Kconfig
>> +++ linux-linaro-tracking/arch/arm/mm/Kconfig
>> @@ -695,6 +695,15 @@ config CPU_ENDIAN_BE32
>>       help
>>         Support for the BE-32 (big-endian) mode on pre-ARMv6 processors.
>>
>> +config LITTLE_ENDIAN_LOADER
>> +        bool
>> +        depends on CPU_BIG_ENDIAN
>> +        default y
>> +        help
>> +          If Linux should operate in big-endian mode but the boot
>> +          loader (e.g. U-Boot) is still in little-endian mode and
>> +          passes boot information in little-endian form, set this
>> +          flag.
>> +
>>   config CPU_HIGH_VECTOR
>>       depends on !MMU&&  CPU_CP15&&  !CPU_ARM740T
>>
>>       bool "Select the High exception vector"
>
>
>
>> Index: linux-linaro-tracking/arch/arm/mm/alignment.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/mm/alignment.c
>> +++ linux-linaro-tracking/arch/arm/mm/alignment.c
>> @@ -792,6 +792,10 @@ do_alignment(unsigned long addr, unsigne
>>
>>       regs->ARM_pc += isize;
>>
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        instr = __swab32(instr); /* little endian instruction */
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +
>>       switch (CODING_BITS(instr)) {
>>       case 0x00000000:    /* 3.13.4 load/store instruction extensions */
>>           if (LDSTHD_I_BIT(instr))
>
>
> See comments on <asm/opcodes.h>
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/head.S
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/head.S
>> +++ linux-linaro-tracking/arch/arm/kernel/head.S
>> @@ -89,6 +89,12 @@ ENTRY(stext)
>>       @ ensure svc mode and all interrupts masked
>>       safe_svcmode_maskall r9
>>
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +    /*
>> +    * Switch to bigendian mode
>> +    */
>> +    setend    be
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>>       mrc    p15, 0, r9, c0, c0        @ get processor id
>>       bl    __lookup_processor_type        @ r5=procinfo r9=cpuid
>>       movs    r10, r5                @ invalid processor (r5=0)?
>> @@ -356,6 +362,12 @@ ENTRY(secondary_startup)
>>   #endif
>>       safe_svcmode_maskall r9
>>
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +    /*
>> +    * Switch to bigendian mode
>> +    */
>> +    setend    be
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>>       mrc    p15, 0, r9, c0, c0        @ get processor id
>>       bl    __lookup_processor_type
>>       movs    r10, r5                @ invalid processor?
>> @@ -427,6 +439,9 @@ __enable_mmu:
>>   #ifdef CONFIG_CPU_ICACHE_DISABLE
>>       bic    r0, r0, #CR_I
>>   #endif
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +    orr    r0, r0, #CR_EE    @ big-endian page tables
>> +#endif
>>   #ifdef CONFIG_ARM_LPAE
>>       mov    r5, #0
>>       mcrr    p15, 0, r4, r5, c2        @ load TTBR0
>> @@ -592,8 +607,14 @@ __fixup_a_pv_table:
>>       b    2f
>>   1:    add     r7, r3
>>       ldrh    ip, [r7, #2]
>> +#ifdef __ARMEB__
>> +        rev     ip, ip   @ byteswap instruction
>> +#endif /* __ARMEB__ */
>>       and    ip, 0x8f00
>>       orr    ip, r6    @ mask in offset bits 31-24
>> +#ifdef __ARMEB__
>> +        rev     ip, ip   @ byteswap instruction
>> +#endif /* __ARMEB__ */
>>       strh    ip, [r7, #2]
>>   2:    cmp    r4, r5
>>       ldrcc    r7, [r4], #4    @ use branch for delay slot
>> @@ -602,8 +623,14 @@ __fixup_a_pv_table:
>>   #else
>>       b    2f
>>   1:    ldr    ip, [r7, r3]
>> +#ifdef __ARMEB__
>> +        rev     ip, ip   @ byteswap instruction
>> +#endif /* __ARMEB__ */
>>       bic    ip, ip, #0x000000ff
>>       orr    ip, ip, r6    @ mask in offset bits 31-24
>> +#ifdef __ARMEB__
>> +        rev     ip, ip   @ byteswap instruction
>> +#endif /* __ARMEB__ */
>>       str    ip, [r7, r3]
>>   2:    cmp    r4, r5
>>       ldrcc    r7, [r4], #4    @ use branch for delay slot
>
>
> Yeah, I prefer my macros for doing this.
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/kprobes.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/kprobes.c
>> +++ linux-linaro-tracking/arch/arm/kernel/kprobes.c
>> @@ -66,14 +66,24 @@ int __kprobes arch_prepare_kprobe(struct
>>       if (is_wide_instruction(insn)) {
>>           insn<<= 16;
>>           insn |= ((u16 *)addr)[1];
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +                insn = __swab32(insn); /* little endian instruction */
>> +#endif
>>           decode_insn = thumb32_kprobe_decode_insn;
>> -    } else
>> -        decode_insn = thumb16_kprobe_decode_insn;
>> +    } else {
>> +                decode_insn = thumb16_kprobe_decode_insn;
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +                insn = __swab16(insn); /* little endian instruction */
>> +#endif
>> +        }
>>   #else /* !CONFIG_THUMB2_KERNEL */
>>       thumb = false;
>>       if (addr&  0x3)
>>
>>           return -EINVAL;
>>       insn = *p->addr;
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        insn = __swab32(insn); /* little endian instruction */
>> +#endif
>>       decode_insn = arm_kprobe_decode_insn;
>>   #endif
>>
>> @@ -89,7 +99,11 @@ int __kprobes arch_prepare_kprobe(struct
>>           if (!p->ainsn.insn)
>>               return -ENOMEM;
>>           for (is = 0; is<  MAX_INSN_SIZE; ++is)
>> +#ifndef CONFIG_CPU_ENDIAN_BE8
>>               p->ainsn.insn[is] = tmp_insn[is];
>> +#else
>> +                        p->ainsn.insn[is] = __swab32(tmp_insn[is]);
>> /* little endian instruction */
>> +#endif
>>           flush_insns(p->ainsn.insn,
>>                   sizeof(p->ainsn.insn[0]) * MAX_INSN_SIZE);
>>           p->ainsn.insn_fn = (kprobe_insn_fn_t *)
>> @@ -113,10 +127,17 @@ void __kprobes arch_arm_kprobe(struct kp
>>           /* Remove any Thumb flag */
>>           addr = (void *)((uintptr_t)p->addr&  ~1);
>>
>>
>> -        if (is_wide_instruction(p->opcode))
>> +        if (is_wide_instruction(p->opcode)) {
>>               brkp = KPROBE_THUMB32_BREAKPOINT_INSTRUCTION;
>> -        else
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +                brkp = __swab32(brkp);
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +        } else {
>>               brkp = KPROBE_THUMB16_BREAKPOINT_INSTRUCTION;
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +                brkp = __swab16(brkp);
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +            }
>>       } else {
>>           kprobe_opcode_t insn = p->opcode;
>>
>> @@ -127,6 +148,10 @@ void __kprobes arch_arm_kprobe(struct kp
>>               brkp |= 0xe0000000;  /* Unconditional instruction */
>>           else
>>               brkp |= insn&  0xf0000000;  /* Copy condition from insn */
>>
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        brkp = __swab32(brkp);
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +
>>       }
>>
>>       patch_text(addr, brkp);
>> @@ -144,8 +169,12 @@ int __kprobes __arch_disarm_kprobe(void
>>   {
>>       struct kprobe *kp = p;
>>       void *addr = (void *)((uintptr_t)kp->addr&  ~1);
>>
>> +    kprobe_opcode_t insn = kp->opcode;
>>
>> -    __patch_text(addr, kp->opcode);
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        insn = __swab32(insn); /* little endian instruction */
>> +#endif
>> +    __patch_text(addr, insn);
>>
>>       return 0;
>>
>
> __patch_text already does swapping, so I'm a little concerned that it
> is being done twice in some places.
>
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/traps.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/traps.c
>> +++ linux-linaro-tracking/arch/arm/kernel/traps.c
>> @@ -426,6 +426,10 @@ asmlinkage void __exception do_undefinst
>>           goto die_sig;
>>       }
>>
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        instr = __swab32(instr); /* little endian instruction */
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +
>>       if (call_undef_hook(regs, instr) == 0)
>>           return;
>
>
> already fixed.
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/ftrace.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/ftrace.c
>> +++ linux-linaro-tracking/arch/arm/kernel/ftrace.c
>> @@ -100,10 +100,18 @@ static int ftrace_modify_code(unsigned l
>>           if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE))
>>               return -EFAULT;
>>
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        replaced = __swab32(replaced); /* little endian instruction */
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +
>>           if (replaced != old)
>>               return -EINVAL;
>>       }
>>
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        new = __swab32(new); /* little endian instruction */
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +
>>       if (probe_kernel_write((void *)pc,&new, MCOUNT_INSN_SIZE))
>>           return -EPERM;
>
>
> Please see <asm/opcodes.h> for dealing with the op-codes like this.

Yes, agreed. It is not needed any more: in kernels 3.1/3.2 there were no
asm/opcodes.h, __opcode_to_mem_arm, etc. macros, so that was added. It
is incorrect now.
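
For reference, a rough userspace model of what those <asm/opcodes.h>
macros do (the real definitions live in arch/arm/include/asm/opcodes.h;
MODEL_BE8 here stands in for CONFIG_CPU_ENDIAN_BE8):

```c
#include <stdint.h>

/* On a BE8 kernel (big-endian data accesses, little-endian instruction
 * stream) __mem_to_opcode_arm()/__opcode_to_mem_arm() both reduce to a
 * 32-bit byte swap; on a little-endian kernel they are the identity. */
#define MODEL_BE8 1

static uint32_t mem_to_opcode_arm(uint32_t x)
{
#if MODEL_BE8
	return __builtin_bswap32(x);	/* gcc/clang builtin byte swap */
#else
	return x;
#endif
}

/* the conversion is its own inverse, so writing back uses the same swap */
static uint32_t opcode_to_mem_arm(uint32_t x)
{
	return mem_to_opcode_arm(x);
}
```

With helpers of this shape, ftrace_modify_code() can convert on read and
convert back on write instead of sprinkling #ifdef CONFIG_CPU_ENDIAN_BE8
around each access.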

>
>> Index: linux-linaro-tracking/arch/arm/kernel/atags_parse.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/atags_parse.c
>> +++ linux-linaro-tracking/arch/arm/kernel/atags_parse.c
>> @@ -216,6 +216,10 @@ struct machine_desc * __init setup_machi
>>       if (tags->hdr.tag != ATAG_CORE)
>>           convert_to_tag_list(tags);
>>   #endif
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +    byteswap_tags(tags);
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>> +
>>       if (tags->hdr.tag != ATAG_CORE) {
>>           early_print("Warning: Neither atags nor dtb found\n");
>>           tags = (struct tag *)&default_tags;
>> Index: linux-linaro-tracking/arch/arm/kernel/sleep.S
>
>
> I really don't like this way, it makes it difficult to deal with
> if you export this data to user-space to then do things like md5sum
> it. Also, if you're going to kexec() a new kernel, would you have
> to re-swap it back before this?

Agree that swap in place could be problematic.
I will look at this, starting with patches you've done. Will get back to you.

>> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
>> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
>> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a
>>
>>       __asm__ __volatile__("@ atomic64_add\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %4\n"
>>   "    adc    %H0, %H0, %H4\n"
>> +#else
>> +"    adds    %H0, %H0, %H4\n"
>> +"    adc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6
>>
>>       __asm__ __volatile__("@ atomic64_add_return\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %4\n"
>>   "    adc    %H0, %H0, %H4\n"
>> +#else
>> +"    adds    %H0, %H0, %H4\n"
>> +"    adc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a
>>
>>       __asm__ __volatile__("@ atomic64_sub\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, %4\n"
>>   "    sbc    %H0, %H0, %H4\n"
>> +#else
>> +"    subs    %H0, %H0, %H4\n"
>> +"    sbc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6
>>
>>       __asm__ __volatile__("@ atomic64_sub_return\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, %4\n"
>>   "    sbc    %H0, %H0, %H4\n"
>> +#else
>> +"    subs    %H0, %H0, %H4\n"
>> +"    sbc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi
>>
>>       __asm__ __volatile__("@ atomic64_dec_if_positive\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, #1\n"
>>   "    sbc    %H0, %H0, #0\n"
>>   "    teq    %H0, #0\n"
>> +#else
>> +"    subs    %H0, %H0, #1\n"
>> +"    sbc    %0, %0, #0\n"
>> +"    teq    %0, #0\n"
>> +#endif
>>   "    bmi    2f\n"
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>> @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at
>>   "    teqeq    %H0, %H5\n"
>>   "    moveq    %1, #0\n"
>>   "    beq    2f\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %6\n"
>>   "    adc    %H0, %H0, %H6\n"
>> +#else
>> +"    adds    %H0, %H0, %H6\n"
>> +"    adc    %0, %0, %6\n"
>> +#endif
>>   "    strexd    %2, %0, %H0, [%4]\n"
>>   "    teq    %2, #0\n"
>>   "    bne    1b\n"
>
>
> I still believe that you're doing the wrong thing here
> and that these can be left well enough alone. Please give
> a concrete piece of code that can show they're actually
> doing the wrong thing.

That one is addressed in a separate email.

Thanks,
Victor

>
>
>
> --
> Ben Dooks                               http://www.codethink.co.uk/
> Senior Engineer                         Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 29+ messages in thread

* updates for be8 patch series
  2013-07-23 18:30       ` Ben Dooks
@ 2013-07-24  1:06         ` Victor Kamensky
  2013-07-24  9:36           ` Will Deacon
  2013-07-24 10:46           ` Ben Dooks
  2013-07-24  7:31         ` Victor Kamensky
  1 sibling, 2 replies; 29+ messages in thread
From: Victor Kamensky @ 2013-07-24  1:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ben,

Let me split off atomic64 from the other issues, so we have a focused topic here.

Please see the atomic64 issue test case below. Please try it in your setup. I've
run it on a Pandaboard ES with my set of patches, but I believe it should be the
same in your case.

<snip>

>> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
>> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
>> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a
>>
>>       __asm__ __volatile__("@ atomic64_add\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %4\n"
>>   "    adc    %H0, %H0, %H4\n"
>> +#else
>> +"    adds    %H0, %H0, %H4\n"
>> +"    adc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6
>>
>>       __asm__ __volatile__("@ atomic64_add_return\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %4\n"
>>   "    adc    %H0, %H0, %H4\n"
>> +#else
>> +"    adds    %H0, %H0, %H4\n"
>> +"    adc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a
>>
>>       __asm__ __volatile__("@ atomic64_sub\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, %4\n"
>>   "    sbc    %H0, %H0, %H4\n"
>> +#else
>> +"    subs    %H0, %H0, %H4\n"
>> +"    sbc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6
>>
>>       __asm__ __volatile__("@ atomic64_sub_return\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, %4\n"
>>   "    sbc    %H0, %H0, %H4\n"
>> +#else
>> +"    subs    %H0, %H0, %H4\n"
>> +"    sbc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi
>>
>>       __asm__ __volatile__("@ atomic64_dec_if_positive\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, #1\n"
>>   "    sbc    %H0, %H0, #0\n"
>>   "    teq    %H0, #0\n"
>> +#else
>> +"    subs    %H0, %H0, #1\n"
>> +"    sbc    %0, %0, #0\n"
>> +"    teq    %0, #0\n"
>> +#endif
>>   "    bmi    2f\n"
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>> @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at
>>   "    teqeq    %H0, %H5\n"
>>   "    moveq    %1, #0\n"
>>   "    beq    2f\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %6\n"
>>   "    adc    %H0, %H0, %H6\n"
>> +#else
>> +"    adds    %H0, %H0, %H6\n"
>> +"    adc    %0, %0, %6\n"
>> +#endif
>>   "    strexd    %2, %0, %H0, [%4]\n"
>>   "    teq    %2, #0\n"
>>   "    bne    1b\n"
>
>
> I still believe that you're doing the wrong thing here
> and that these can be left well enough alone. Please give
> a concrete piece of code that can show they're actually
> doing the wrong thing.

Disable CONFIG_GENERIC_ATOMIC64
-------------------------------

Make sure that CONFIG_GENERIC_ATOMIC64 is not set. Otherwise atomic64_add from
lib/atomic64.c will be used; that one works correctly, but in that case the
atomic64_XXX inline functions from arch/arm/include/asm/atomic.h are not used.

Personally, with my current kernel (close to 3.10) I failed to do that and still
get a live kernel, so to illustrate the issue I will use my own copy of the
inline atomic64_add function. But effectively it will show the same problem
that I tried to report.

Compiler used
-------------

gcc-linaro-arm-linux-gnueabihf-4.7-2013.01-20130125_linux

Test module source
------------------

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/atomic.h>

static inline void my_atomic64_add_original(u64 i, atomic64_t *v)
{
    u64 result;
    unsigned long tmp;

    __asm__ __volatile__("@ atomic64_add\n"
"1:    ldrexd    %0, %H0, [%3]\n"
"    adds    %0, %0, %4\n"
"    adc    %H0, %H0, %H4\n"
"    strexd    %1, %0, %H0, [%3]\n"
"    teq    %1, #0\n"
"    bne    1b"
    : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
    : "r" (&v->counter), "r" (i)
    : "cc");
}

static inline void my_atomic64_add_fixed(u64 i, atomic64_t *v)
{
    u64 result;
    unsigned long tmp;

    __asm__ __volatile__("@ atomic64_add\n"
"1:    ldrexd    %0, %H0, [%3]\n"
#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
"    adds    %0, %0, %4\n"
"    adc    %H0, %H0, %H4\n"
#else
"    adds    %H0, %H0, %H4\n"
"    adc    %0, %0, %4\n"
#endif
"    strexd    %1, %0, %H0, [%3]\n"
"    teq    %1, #0\n"
"    bne    1b"
    : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
    : "r" (&v->counter), "r" (i)
    : "cc");
}


atomic64_t a = {0};

#define INCREMENT 0x40000000

void atomic_update_original (void)
{
    my_atomic64_add_original(INCREMENT, &a);
}

void atomic_update_fixed (void)
{
    my_atomic64_add_fixed(INCREMENT, &a);
}

void test_atomic64_original(void)
{
    int i;

    printk("original atomic64_add\n");

    atomic64_set(&a, 0);

    printk("a = %llx\n", a.counter);

    for (i = 0; i < 10; i++) {
        atomic_update_original();
        printk("i = %2d: a =  %llx\n", i, a.counter);
    }
}

void test_atomic64_fixed(void)
{
    int i;

    printk("fixed atomic64_add\n");

    atomic64_set(&a, 0);

    printk("a = %llx\n", a.counter);

    for (i = 0; i < 10; i++) {
        atomic_update_fixed();
        printk("i = %2d: a =  %llx\n", i, a.counter);
    }
}

int
init_module (void)
{
    test_atomic64_original();
    test_atomic64_fixed();
    return 0;
}

void
cleanup_module(void)
{
}

MODULE_LICENSE("GPL");


Test run
--------

Compile and load the module, and observe that with the original code the counter
is not incremented correctly (it wraps at 32 bits).

root@genericarmv7ab:~# insmod atomic64.ko
[   73.041107] original atomic64_add
[   73.044647] a = 0
[   73.046691] i =  0: a =  40000000
[   73.050659] i =  1: a =  80000000
[   73.054199] i =  2: a =  c0000000
[   73.057983] i =  3: a =  0
[   73.060852] i =  4: a =  40000000
[   73.064361] i =  5: a =  80000000
[   73.067932] i =  6: a =  c0000000
[   73.071441] i =  7: a =  0
[   73.074310] i =  8: a =  40000000
[   73.077850] i =  9: a =  80000000
[   73.081390] fixed atomic64_add
[   73.084625] a = 0
[   73.086669] i =  0: a =  40000000
[   73.090209] i =  1: a =  80000000
[   73.093749] i =  2: a =  c0000000
[   73.097290] i =  3: a =  100000000
[   73.100891] i =  4: a =  140000000
[   73.104522] i =  5: a =  180000000
[   73.108154] i =  6: a =  1c0000000
[   73.111755] i =  7: a =  200000000
[   73.115386] i =  8: a =  240000000
[   73.119018] i =  9: a =  280000000


Disassemble of original code
----------------------------

(gdb) disassemble atomic_update_original
Dump of assembler code for function atomic_update_original:
   0x00000000 <+0>:    push    {r4}        ; (str r4, [sp, #-4]!)
   0x00000004 <+4>:    mov    r3, #1073741824    ; 0x40000000
   0x00000008 <+8>:    ldr    r12, [pc, #32]    ; 0x30
<atomic_update_original+48>
   0x0000000c <+12>:    mov    r2, #0
   0x00000010 <+16>:    ldrexd    r0, [r12]
   0x00000014 <+20>:    adds    r0, r0, r2
   0x00000018 <+24>:    adc    r1, r1, r3
   0x0000001c <+28>:    strexd    r4, r0, [r12]
   0x00000020 <+32>:    teq    r4, #0
   0x00000024 <+36>:    bne    0x10 <atomic_update_original+16>
   0x00000028 <+40>:    ldmfd    sp!, {r4}
   0x0000002c <+44>:    bx    lr
   0x00000030 <+48>:    andeq    r0, r0, r0
End of assembler dump.

Disassemble of fixed code
-------------------------

(gdb) disassemble atomic_update_fixed
Dump of assembler code for function atomic_update_fixed:
   0x00000034 <+0>:    push    {r4}        ; (str r4, [sp, #-4]!)
   0x00000038 <+4>:    mov    r3, #1073741824    ; 0x40000000
   0x0000003c <+8>:    ldr    r12, [pc, #32]    ; 0x64 <atomic_update_fixed+48>
   0x00000040 <+12>:    mov    r2, #0
   0x00000044 <+16>:    ldrexd    r0, [r12]
   0x00000048 <+20>:    adds    r1, r1, r3
   0x0000004c <+24>:    adc    r0, r0, r2
   0x00000050 <+28>:    strexd    r4, r0, [r12]
   0x00000054 <+32>:    teq    r4, #0
   0x00000058 <+36>:    bne    0x44 <atomic_update_fixed+16>
   0x0000005c <+40>:    ldmfd    sp!, {r4}
   0x00000060 <+44>:    bx    lr
   0x00000064 <+48>:    andeq    r0, r0, r0
End of assembler dump.

Fixed code explanation
----------------------

- $r2 has the 4 most significant bytes of the increment (0)
- $r3 has the 4 least significant bytes of the increment (0x40000000)

- Load the 'long long' at address $r12 into $r0 and $r1:
  $r0 gets the 4 most significant bytes (i.e. the 4 bytes at $r12)
  $r1 gets the 4 least significant bytes (i.e. the 4 bytes at $r12+4)
  loads are big endian, so $r0 and $r1 are read with the proper BE swap

   0x00000044 <+16>:    ldrexd    r0, [r12]

- add $r3 to $r1 and store the result into $r1; the carry flag may be set
- this adds the 4 least significant bytes of the 'long long'

   0x00000048 <+20>:    adds    r1, r1, r3

- add with carry the most significant parts of the increment and counter

   0x0000004c <+24>:    adc    r0, r0, r2

- Store the result back:
  $r0 goes to address $r12
  $r1 goes to address $r12+4

   0x00000050 <+28>:    strexd    r4, r0, [r12]

Ldrd/strd
---------

The issue boils down to the fact that the ldrexd/ldrd and strexd/strd
instructions in big endian mode byte-swap only within each 4-byte word; they do
not swap the register numbers themselves. I.e. ldrexd rN, <addr> puts the
swapped 4 bytes at address <addr> into rN, and the swapped 4 bytes at <addr+4>
into rN+1.

Compiler?
---------

One may wonder whether we have a compiler bug here. And after all, what is the
meaning of %0 vs %H0? I asked my local gcc expert for help. Here is what
I got. Please look at

https://github.com/mirrors/gcc/blob/master/gcc/config/arm/arm.c

and search for >'Q', 'R' and 'H'<

according to this file, '%HN' is always the reg+1 operand, regardless of whether
it is a big endian system or not. It is the opposite of 'Q'.

Better fix
----------

In fact, I now believe the correct fix should use 'Q' for the least significant
4 bytes, 'R' for the most significant 4 bytes, and 'H' only in the load/store
instructions.

It should look something like this:

static inline void my_atomic64_add_fixed1(u64 i, atomic64_t *v)
{
    u64 result;
    unsigned long tmp;

    __asm__ __volatile__("@ atomic64_add\n"
"1:    ldrexd    %0, %H0, [%3]\n"
"    adds    %Q0, %Q0, %Q4\n"
"    adc    %R0, %R0, %R4\n"
"    strexd    %1, %0, %H0, [%3]\n"
"    teq    %1, #0\n"
"    bne    1b"
    : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
    : "r" (&v->counter), "r" (i)
    : "cc");
}

It is better than my original fix because it does not need conditional
compilation. I've tested it in the module; it works on both the LE and BE
kernel variants.

Thanks,
Victor

^ permalink raw reply	[flat|nested] 29+ messages in thread

* updates for be8 patch series
  2013-07-23 18:03     ` Victor Kamensky
@ 2013-07-23 18:30       ` Ben Dooks
  2013-07-24  1:06         ` Victor Kamensky
  2013-07-24  7:31         ` Victor Kamensky
  0 siblings, 2 replies; 29+ messages in thread
From: Ben Dooks @ 2013-07-23 18:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 23/07/13 19:03, Victor Kamensky wrote:
> Hi Ben,
>
> Sorry for the multiple posts. I am new to this, and I could not get the
> mailing list to accept my original email.
> For the sake of others, so people can see the discussed patch, here it
> goes again with the patch file inlined now. Hope it will go through this
> time.
>
> Wrt BE8 little endian instructions you will need to fix a couple more places:
>      ftrace arch/arm/kernel/ftrace.c
>      kprobes arch/arm/kernel/kprobes.c
> Also in big endian mode the atomic64_xxx functions will need fixes, otherwise
> perf counters will be truncated at a maximum of 32-bit values:
>      atomic64 arch/arm/include/asm/atomic.h
>
> I've attached the board independent (core) patch from my work that made
> the Pandaboard ES kernel run in BE mode. You can check my version of the
> fixes for the above issues in the patch; look for the corresponding file
> changes. Please rework them according to the style and rules of your
> patch series. Note the patch itself contains fixes for a few other
> issues that are already fixed in your series. I'll cross check and
> compare the individual areas later. I think you may find the attached
> patch interesting as well; it was developed independently from yours but
> matches it very well.
>
> Wrt testing: ftrace was tested on Pandaboard ES; the kprobes changes were
> tested with SystemTap at some point of the patch's history on Pandaboard
> ES (not thumb mode though); they may also have deteriorated while being
> ported across kernel versions, so it is a good idea to retest the latest
> patch. atomic64 was tested on Pandaboard ES and Marvell Armada XP.
>
> Thanks,
> Victor

First note: please always format your emails (where possible) to
under 80 characters per line so that they do not get mangled when
viewed on limited screens (for example, I often read email via a
24x80 terminal).

Second note: when producing patches, it is better to try to split
them into a series of patches that reach the goal rather than one large
patch. It keeps a lot of useful information (see the patch comments),
and it means that if there is a problem, individual bits can be lifted
out to check.


> Index: linux-linaro-tracking/arch/arm/Makefile
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/Makefile
> +++ linux-linaro-tracking/arch/arm/Makefile
> @@ -16,6 +16,7 @@ LDFLAGS        :=
>   LDFLAGS_vmlinux    :=-p --no-undefined -X
>   ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>   LDFLAGS_vmlinux    += --be8
> +KBUILD_LDFLAGS_MODULE += --be8
>   endif
>
>   OBJCOPYFLAGS    :=-O binary -R .comment -S
> @@ -43,12 +44,19 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y)
>   KBUILD_CFLAGS    +=-fstack-protector
>   endif
>
> +# Passing the endian compilation flag in both CPPFLAGS and CFLAGS. There are
> +# places where CFLAGS alone is used (cmd_record_mcount). And it has to
> +# be in CPPFLAGS because it affects which macros (__ARMEB__, __BYTE_ORDER__,
> +# __FLOAT_WORD_ORDER__) are defined. So replicating the flag seems to be
> +# a good compromise
>   ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
>   KBUILD_CPPFLAGS    += -mbig-endian
> +KBUILD_CFLAGS    += -mbig-endian
>   AS        += -EB
>   LD        += -EB
>   else
>   KBUILD_CPPFLAGS    += -mlittle-endian
> +KBUILD_CFLAGS    += -mlittle-endian
>   AS        += -EL
>   LD        += -EL
>   endif

See the comment below.

> Index: linux-linaro-tracking/arch/arm/boot/compressed/Makefile
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/boot/compressed/Makefile
> +++ linux-linaro-tracking/arch/arm/boot/compressed/Makefile
> @@ -138,6 +138,24 @@ endif
>   ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>   LDFLAGS_vmlinux += --be8
>   endif
> +
> +# Passing the endian compilation flag in both CPPFLAGS and CFLAGS. There are
> +# places where CFLAGS alone is used (cmd_record_mcount). And it has to
> +# be in CPPFLAGS because it affects which macros (__ARMEB__, __BYTE_ORDER__,
> +# __FLOAT_WORD_ORDER__) are defined. So replicating the flag seems to be
> +# a good compromise
> +ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
> +KBUILD_CPPFLAGS    += -mbig-endian
> +KBUILD_CFLAGS    += -mbig-endian
> +AS        += -EB
> +LD        += -EB
> +else
> +KBUILD_CPPFLAGS    += -mlittle-endian
> +KBUILD_CFLAGS    += -mlittle-endian
> +AS        += -EL
> +LD        += -EL
> +endif
> +
>   # ?
>   LDFLAGS_vmlinux += -p
>   # Report unresolved symbol references

I've not been bitten by this one. Is there anyone else who can comment
on KBUILD_CFLAGS vs KBUILD_CPPFLAGS?

I had assumed that arch/arm/boot/compressed/Makefile would have
inherited the flags from the arch/arm/Makefile. Someone else would
be better qualified to comment on this.

> Index: linux-linaro-tracking/arch/arm/boot/compressed/head.S
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/boot/compressed/head.S
> +++ linux-linaro-tracking/arch/arm/boot/compressed/head.S
> @@ -141,6 +141,12 @@ start:
>   #endif
>           mov    r7, r1            @ save architecture ID
>           mov    r8, r2            @ save atags pointer
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +                /*
> +                 * Switch to bigendian mode
> +                 */
> +                setend  be
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>
>   #ifndef __ARM_ARCH_2__
>           /*
> Index: linux-linaro-tracking/arch/arm/include/asm/setup.h
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/include/asm/setup.h
> +++ linux-linaro-tracking/arch/arm/include/asm/setup.h
> @@ -53,4 +53,8 @@ extern int arm_add_memory(phys_addr_t st
>   extern void early_print(const char *str, ...);
>   extern void dump_machine_table(void);
>
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +void byteswap_tags(struct tag *t);
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
> +
>   #endif

You could have done

#ifdef CONFIG_LITTLE_ENDIAN_LOADER
void byteswap_tags(struct tag *t);
#else
static inline void byteswap_tags(struct tag *t) { }
#endif /* CONFIG_LITTLE_ENDIAN_LOADER */


> Index: linux-linaro-tracking/arch/arm/kernel/Makefile
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/Makefile
> +++ linux-linaro-tracking/arch/arm/kernel/Makefile
> @@ -22,6 +22,7 @@ obj-y        := elf.o entry-armv.o entry-commo
>   obj-$(CONFIG_ATAGS)        += atags_parse.o
>   obj-$(CONFIG_ATAGS_PROC)    += atags_proc.o
>   obj-$(CONFIG_DEPRECATED_PARAM_STRUCT) += atags_compat.o
> +obj-$(CONFIG_LITTLE_ENDIAN_LOADER) += atags_byteswap.o
>
>   obj-$(CONFIG_OC_ETM)        += etm.o
>   obj-$(CONFIG_CPU_IDLE)        += cpuidle.o
> Index: linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
> ===================================================================
> --- /dev/null
> +++ linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
> @@ -0,0 +1,119 @@
> +#include<linux/slab.h>
> +#include<linux/proc_fs.h>
> +#include<linux/swab.h>
> +#include<asm/setup.h>
> +#include<asm/types.h>
> +#include<asm/page.h>
> +
> +#define byteswap32(x) (x) = __swab32((x))
> +#define byteswap16(x) (x) = __swab16((x))
> +
> +static void __init byteswap_tag_core(struct tag *tag)
> +{
> +  byteswap32(tag->u.core.flags);
> +  byteswap32(tag->u.core.pagesize);
> +  byteswap32(tag->u.core.rootdev);
> +}
> +
> +static void __init byteswap_tag_mem32(struct tag *tag)
> +{
> +  byteswap32(tag->u.mem.size);
> +  byteswap32(tag->u.mem.start);
> +}
> +
> +static void __init byteswap_tag_videotext(struct tag *tag)
> +{
> +  byteswap16(tag->u.videotext.video_page);
> +  byteswap16(tag->u.videotext.video_ega_bx);
> +  byteswap16(tag->u.videotext.video_points);
> +}
> +
> +static void __init byteswap_tag_ramdisk(struct tag *tag)
> +{
> +  byteswap32(tag->u.ramdisk.flags);
> +  byteswap32(tag->u.ramdisk.size);
> +  byteswap32(tag->u.ramdisk.start);
> +}
> +
> +static void __init byteswap_tag_initrd(struct tag *tag)
> +{
> +  byteswap32(tag->u.initrd.start);
> +  byteswap32(tag->u.initrd.size);
> +}
> +
> +static void __init byteswap_tag_serialnr(struct tag *tag)
> +{
> +  byteswap32(tag->u.serialnr.low);
> +  byteswap32(tag->u.serialnr.high);
> +}
> +
> +static void __init byteswap_tag_revision(struct tag *tag)
> +{
> +    byteswap32(tag->u.revision.rev);
> +}
> +
> +static void __init byteswap_tag_videolfb(struct tag *tag)
> +{
> +    byteswap16(tag->u.videolfb.lfb_width);
> +    byteswap16(tag->u.videolfb.lfb_height);
> +    byteswap16(tag->u.videolfb.lfb_depth);
> +    byteswap16(tag->u.videolfb.lfb_linelength);
> +    byteswap32(tag->u.videolfb.lfb_base);
> +    byteswap32(tag->u.videolfb.lfb_size);
> +}
> +
> +static void __init byteswap_tag_acorn(struct tag *tag)
> +{
> +    byteswap32(tag->u.acorn.memc_control_reg);
> +    byteswap32(tag->u.acorn.vram_pages);
> +}
> +
> +static void __init byteswap_tag_memclk(struct tag *tag)
> +{
> +    byteswap32(tag->u.memclk.fmemclk);
> +}
> +
> +void __init byteswap_tags(struct tag *t)
> +{
> +  for (; t->hdr.size; t = tag_next(t)) {
> +    byteswap32(t->hdr.size);
> +    byteswap32(t->hdr.tag);
> +    switch(t->hdr.tag) {
> +    case ATAG_CORE:
> +      byteswap_tag_core(t);
> +      break;
> +    case ATAG_MEM:
> +      byteswap_tag_mem32(t);
> +      break;
> +    case ATAG_VIDEOTEXT:
> +      byteswap_tag_videotext(t);
> +      break;
> +    case ATAG_RAMDISK:
> +      byteswap_tag_ramdisk(t);
> +      break;
> +    case ATAG_INITRD2:
> +      byteswap_tag_initrd(t);
> +      break;
> +    case ATAG_SERIAL:
> +      byteswap_tag_serialnr(t);
> +      break;
> +    case ATAG_REVISION:
> +      byteswap_tag_revision(t);
> +      break;
> +    case ATAG_VIDEOLFB:
> +      byteswap_tag_videolfb(t);
> +      break;
> +    case ATAG_CMDLINE:
> +      break;
> +    case ATAG_ACORN:
> +      byteswap_tag_acorn(t);
> +      break;
> +    case ATAG_MEMCLK:
> +      byteswap_tag_memclk(t);
> +      break;
> +    default:
> +      /* Need to complain? */
> +      break;
> +    }
> +  }
> +}

I think swapping on read is a better idea, as it leaves
these in their original format.

PS, it looks like either your email client or your code has a
formatting issue. Did this pass checkpatch?

> Index: linux-linaro-tracking/arch/arm/kernel/head-common.S
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/head-common.S
> +++ linux-linaro-tracking/arch/arm/kernel/head-common.S
> @@ -48,6 +48,9 @@ __vet_atags:
>       bne    1f
>
>       ldr    r5, [r2, #0]
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +        rev     r5, r5
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>   #ifdef CONFIG_OF_FLATTREE
>       ldr    r6, =OF_DT_MAGIC        @ is it a DTB?
>       cmp    r5, r6
> @@ -57,6 +60,9 @@ __vet_atags:
>       cmpne    r5, #ATAG_CORE_SIZE_EMPTY
>       bne    1f
>       ldr    r5, [r2, #4]
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +        rev     r5, r5
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>       ldr    r6, =ATAG_CORE
>       cmp    r5, r6
>       bne    1f

A bit untidy. Also, does OF_DT_MAGIC get changed by the
endianness, or is that handled somewhere else in the head code?

> Index: linux-linaro-tracking/arch/arm/kernel/module.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/module.c
> +++ linux-linaro-tracking/arch/arm/kernel/module.c
> @@ -45,6 +45,65 @@ void *module_alloc(unsigned long size)
>   }
>   #endif
>
> +/*
> + * For relocations in exectuable sections if we configured with
> + * CONFIG_CPU_ENDIAN_BE8 we assume that code is in little endian byteswap
> + * form already and we need to byteswap value when we read and when we
> + * write. If relocaton R_ARM_ABS32 type we assume that it is relocation for
> + * data piece and we do not perform byteswaps when such relocation applied.
> + * This herustic based approach may break in future.
> + *
> + * It seems that more correct way to handle be8 in kernel module is
> + * to build module without --be8 option, but perform instruction byteswap
> + * when module sections are loaded, after all relocation are applied.
> + * In order to do such byteswap we need to read all mappings symbols ($a,
> + * $d, $t), sort them, and apply byteswaps for $a and $t entries.
> + */
> +static inline u32 read_section_u32(Elf32_Shdr *dstsec, unsigned long loc)
> +{
> +       u32 retval = *(u32 *)loc;
> +
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
> +                retval = __swab32(retval);
> +        }
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +        return retval;
> +}
> +
> +static inline void write_section_u32(Elf32_Shdr *dstsec, unsigned
> long loc, u32 value)
> +{
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
> +                value = __swab32(value);
> +        }
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +        *(u32 *) loc = value;
> +}
> +
> +#ifdef CONFIG_THUMB2_KERNEL
> +static inline u16 read_section_u16(Elf32_Shdr *dstsec, unsigned long loc)
> +{
> +        u16 retval = *(u16 *) loc;
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
> +                retval = __swab16(retval);
> +        }
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +        return retval;
> +}
> +
> +static inline void write_section_u16(Elf32_Shdr *dstsec, unsigned
> long loc, u16 value)
> +{
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
> +                value = __swab16(value);
> +        }
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +        *(u16 *) loc = value;
> +}
> +#endif /* CONFIG_THUMB2_KERNEL */
> +
>   int
>   apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
>              unsigned int relindex, struct module *module)
> @@ -60,6 +119,7 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>           Elf32_Sym *sym;
>           const char *symname;
>           s32 offset;
> +                u32 loc_u32;
>   #ifdef CONFIG_THUMB2_KERNEL
>           u32 upper, lower, sign, j1, j2;
>   #endif
> @@ -95,7 +155,8 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>           case R_ARM_PC24:
>           case R_ARM_CALL:
>           case R_ARM_JUMP24:
> -            offset = (*(u32 *)loc&  0x00ffffff)<<  2;
> +                        loc_u32 = read_section_u32(dstsec, loc);
> +            offset = (loc_u32&  0x00ffffff)<<  2;
>               if (offset&  0x02000000)
>                   offset -= 0x04000000;
>
> @@ -112,27 +173,33 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>
>               offset>>= 2;
>
> -            *(u32 *)loc&= 0xff000000;
> -            *(u32 *)loc |= offset&  0x00ffffff;
> +            loc_u32&= 0xff000000;
> +            loc_u32 |= offset&  0x00ffffff;
> +                        write_section_u32(dstsec, loc, loc_u32);
>               break;
>
> -           case R_ARM_V4BX:
> +                case R_ARM_V4BX:
> +                        loc_u32 = read_section_u32(dstsec, loc);
>                  /* Preserve Rm and the condition code. Alter
>               * other bits to re-code instruction as
>               * MOV PC,Rm.
>               */
> -               *(u32 *)loc&= 0xf000000f;
> -               *(u32 *)loc |= 0x01a0f000;
> +               loc_u32&= 0xf000000f;
> +               loc_u32 |= 0x01a0f000;
> +                       write_section_u32(dstsec, loc, loc_u32);
>                  break;
>
>           case R_ARM_PREL31:
> -            offset = *(u32 *)loc + sym->st_value - loc;
> -            *(u32 *)loc = offset&  0x7fffffff;
> +                        loc_u32 = read_section_u32(dstsec, loc);
> +            offset = loc_u32 + sym->st_value - loc;
> +            loc_u32 = offset&  0x7fffffff;
> +                        write_section_u32(dstsec, loc, loc_u32);
>               break;
>
>           case R_ARM_MOVW_ABS_NC:
>           case R_ARM_MOVT_ABS:
> -            offset = *(u32 *)loc;
> +                        loc_u32 = read_section_u32(dstsec, loc);
> +            offset = loc_u32;
>               offset = ((offset&  0xf0000)>>  4) | (offset&  0xfff);
>               offset = (offset ^ 0x8000) - 0x8000;
>
> @@ -140,16 +207,17 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>               if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS)
>                   offset>>= 16;
>
> -            *(u32 *)loc&= 0xfff0f000;
> -            *(u32 *)loc |= ((offset&  0xf000)<<  4) |
> -                    (offset&  0x0fff);
> +            loc_u32&= 0xfff0f000;
> +            loc_u32 |= ((offset&  0xf000)<<  4) |
> +                                   (offset&  0x0fff);
> +                        write_section_u32(dstsec, loc, loc_u32);
>               break;
>
>   #ifdef CONFIG_THUMB2_KERNEL
>           case R_ARM_THM_CALL:
>           case R_ARM_THM_JUMP24:
> -            upper = *(u16 *)loc;
> -            lower = *(u16 *)(loc + 2);
> +                        upper = read_section_u16(dstsec, loc);
> +            lower = read_section_u16(dstsec, loc + 2);
>
>               /*
>                * 25 bit signed address range (Thumb-2 BL and B.W
> @@ -198,17 +266,19 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>               sign = (offset>>  24)&  1;
>               j1 = sign ^ (~(offset>>  23)&  1);
>               j2 = sign ^ (~(offset>>  22)&  1);
> -            *(u16 *)loc = (u16)((upper&  0xf800) | (sign<<  10) |
> -                        ((offset>>  12)&  0x03ff));
> -            *(u16 *)(loc + 2) = (u16)((lower&  0xd000) |
> -                          (j1<<  13) | (j2<<  11) |
> -                          ((offset>>  1)&  0x07ff));
> +                        write_section_u16(dstsec, loc,
> +                                          (u16)((upper&  0xf800) |
> (sign<<  10) |
> +                                                ((offset>>  12)&  0x03ff)));
> +                        write_section_u16(dstsec, loc + 2,
> +                                          (u16)((lower&  0xd000) |
> +                                                (j1<<  13) | (j2<<  11) |
> +                                                ((offset>>  1)&  0x07ff)));
>               break;
>
>           case R_ARM_THM_MOVW_ABS_NC:
>           case R_ARM_THM_MOVT_ABS:
> -            upper = *(u16 *)loc;
> -            lower = *(u16 *)(loc + 2);
> +                        upper = read_section_u16(dstsec, loc);
> +            lower = read_section_u16(dstsec, loc + 2);
>
>               /*
>                * MOVT/MOVW instructions encoding in Thumb-2:
> @@ -229,12 +299,14 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>               if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS)
>                   offset>>= 16;
>
> -            *(u16 *)loc = (u16)((upper&  0xfbf0) |
> -                        ((offset&  0xf000)>>  12) |
> -                        ((offset&  0x0800)>>  1));
> -            *(u16 *)(loc + 2) = (u16)((lower&  0x8f00) |
> -                          ((offset&  0x0700)<<  4) |
> -                          (offset&  0x00ff));
> +                        write_section_u16(dstsec, loc,
> +                                          (u16)((upper&  0xfbf0) |
> +                                                ((offset&  0xf000)>>  12) |
> +                                                ((offset&  0x0800)>>  1)));
> +                        write_section_u16(dstsec, loc + 2,
> +                                          (u16)((lower&  0x8f00) |
> +                                                ((offset&  0x0700)<<  4) |
> +                                                (offset&  0x00ff)));
>               break;
>   #endif
>

I'm still not sure about this; I think using <asm/opcodes.h> is better
here. I'm also not sure there's much point in checking the flags on the
sections, as the relocation types pretty much define which endianness
they will be in.

> Index: linux-linaro-tracking/arch/arm/mm/Kconfig
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/mm/Kconfig
> +++ linux-linaro-tracking/arch/arm/mm/Kconfig
> @@ -695,6 +695,15 @@ config CPU_ENDIAN_BE32
>       help
>         Support for the BE-32 (big-endian) mode on pre-ARMv6 processors.
>
> +config LITTLE_ENDIAN_LOADER
> +        bool
> +        depends on CPU_BIG_ENDIAN
> +        default y
> +        help
> +          If Linux should operate in big-endian mode, but bootloader
> (i.e u-boot)
> +          is still in little endian mode and passes boot information in little
> +          endian form set this flag
> +
>   config CPU_HIGH_VECTOR
>       depends on !MMU&&  CPU_CP15&&  !CPU_ARM740T
>       bool "Select the High exception vector"


> Index: linux-linaro-tracking/arch/arm/mm/alignment.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/mm/alignment.c
> +++ linux-linaro-tracking/arch/arm/mm/alignment.c
> @@ -792,6 +792,10 @@ do_alignment(unsigned long addr, unsigne
>
>       regs->ARM_pc += isize;
>
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        instr = __swab32(instr); /* little endian instruction */
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +
>       switch (CODING_BITS(instr)) {
>       case 0x00000000:    /* 3.13.4 load/store instruction extensions */
>           if (LDSTHD_I_BIT(instr))

See comments on <asm/opcodes.h>

> Index: linux-linaro-tracking/arch/arm/kernel/head.S
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/head.S
> +++ linux-linaro-tracking/arch/arm/kernel/head.S
> @@ -89,6 +89,12 @@ ENTRY(stext)
>       @ ensure svc mode and all interrupts masked
>       safe_svcmode_maskall r9
>
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +    /*
> +    * Switch to bigendian mode
> +    */
> +    setend    be
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>       mrc    p15, 0, r9, c0, c0        @ get processor id
>       bl    __lookup_processor_type        @ r5=procinfo r9=cpuid
>       movs    r10, r5                @ invalid processor (r5=0)?
> @@ -356,6 +362,12 @@ ENTRY(secondary_startup)
>   #endif
>       safe_svcmode_maskall r9
>
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +    /*
> +    * Switch to bigendian mode
> +    */
> +    setend    be
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>       mrc    p15, 0, r9, c0, c0        @ get processor id
>       bl    __lookup_processor_type
>       movs    r10, r5                @ invalid processor?
> @@ -427,6 +439,9 @@ __enable_mmu:
>   #ifdef CONFIG_CPU_ICACHE_DISABLE
>       bic    r0, r0, #CR_I
>   #endif
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +    orr    r0, r0, #CR_EE    @ big-endian page tables
> +#endif
>   #ifdef CONFIG_ARM_LPAE
>       mov    r5, #0
>       mcrr    p15, 0, r4, r5, c2        @ load TTBR0
> @@ -592,8 +607,14 @@ __fixup_a_pv_table:
>       b    2f
>   1:    add     r7, r3
>       ldrh    ip, [r7, #2]
> +#ifdef __ARMEB__
> +        rev     ip, ip   @ byteswap instruction
> +#endif /* __ARMEB__ */
>       and    ip, 0x8f00
>       orr    ip, r6    @ mask in offset bits 31-24
> +#ifdef __ARMEB__
> +        rev     ip, ip   @ byteswap instruction
> +#endif /* __ARMEB__ */
>       strh    ip, [r7, #2]
>   2:    cmp    r4, r5
>       ldrcc    r7, [r4], #4    @ use branch for delay slot
> @@ -602,8 +623,14 @@ __fixup_a_pv_table:
>   #else
>       b    2f
>   1:    ldr    ip, [r7, r3]
> +#ifdef __ARMEB__
> +        rev     ip, ip   @ byteswap instruction
> +#endif /* __ARMEB__ */
>       bic    ip, ip, #0x000000ff
>       orr    ip, ip, r6    @ mask in offset bits 31-24
> +#ifdef __ARMEB__
> +        rev     ip, ip   @ byteswap instruction
> +#endif /* __ARMEB__ */
>       str    ip, [r7, r3]
>   2:    cmp    r4, r5
>       ldrcc    r7, [r4], #4    @ use branch for delay slot

Yeah, I'd prefer my macros for doing this.


> Index: linux-linaro-tracking/arch/arm/kernel/kprobes.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/kprobes.c
> +++ linux-linaro-tracking/arch/arm/kernel/kprobes.c
> @@ -66,14 +66,24 @@ int __kprobes arch_prepare_kprobe(struct
>       if (is_wide_instruction(insn)) {
>           insn<<= 16;
>           insn |= ((u16 *)addr)[1];
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +                insn = __swab32(insn); /* little endian instruction */
> +#endif
>           decode_insn = thumb32_kprobe_decode_insn;
> -    } else
> -        decode_insn = thumb16_kprobe_decode_insn;
> +    } else {
> +                decode_insn = thumb16_kprobe_decode_insn;
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +                insn = __swab16(insn); /* little endian instruction */
> +#endif
> +        }
>   #else /* !CONFIG_THUMB2_KERNEL */
>       thumb = false;
>       if (addr&  0x3)
>           return -EINVAL;
>       insn = *p->addr;
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        insn = __swab32(insn); /* little endian instruction */
> +#endif
>       decode_insn = arm_kprobe_decode_insn;
>   #endif
>
> @@ -89,7 +99,11 @@ int __kprobes arch_prepare_kprobe(struct
>           if (!p->ainsn.insn)
>               return -ENOMEM;
>           for (is = 0; is<  MAX_INSN_SIZE; ++is)
> +#ifndef CONFIG_CPU_ENDIAN_BE8
>               p->ainsn.insn[is] = tmp_insn[is];
> +#else
> +                        p->ainsn.insn[is] = __swab32(tmp_insn[is]);
> /* little endian instruction */
> +#endif
>           flush_insns(p->ainsn.insn,
>                   sizeof(p->ainsn.insn[0]) * MAX_INSN_SIZE);
>           p->ainsn.insn_fn = (kprobe_insn_fn_t *)
> @@ -113,10 +127,17 @@ void __kprobes arch_arm_kprobe(struct kp
>           /* Remove any Thumb flag */
>           addr = (void *)((uintptr_t)p->addr&  ~1);
>
> -        if (is_wide_instruction(p->opcode))
> +        if (is_wide_instruction(p->opcode)) {
>               brkp = KPROBE_THUMB32_BREAKPOINT_INSTRUCTION;
> -        else
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +                brkp = __swab32(brkp);
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +        } else {
>               brkp = KPROBE_THUMB16_BREAKPOINT_INSTRUCTION;
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +                brkp = __swab16(brkp);
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +            }
>       } else {
>           kprobe_opcode_t insn = p->opcode;
>
> @@ -127,6 +148,10 @@ void __kprobes arch_arm_kprobe(struct kp
>               brkp |= 0xe0000000;  /* Unconditional instruction */
>           else
>               brkp |= insn&  0xf0000000;  /* Copy condition from insn */
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        brkp = __swab32(brkp);
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +
>       }
>
>       patch_text(addr, brkp);
> @@ -144,8 +169,12 @@ int __kprobes __arch_disarm_kprobe(void
>   {
>       struct kprobe *kp = p;
>       void *addr = (void *)((uintptr_t)kp->addr&  ~1);
> +    kprobe_opcode_t insn = kp->opcode;
>
> -    __patch_text(addr, kp->opcode);
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        insn = __swab32(insn); /* little endian instruction */
> +#endif
> +    __patch_text(addr, insn);
>
>       return 0;
>

__patch_text() already does the swapping, so I'm a little concerned that
it is being done twice in some places.


> Index: linux-linaro-tracking/arch/arm/kernel/traps.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/traps.c
> +++ linux-linaro-tracking/arch/arm/kernel/traps.c
> @@ -426,6 +426,10 @@ asmlinkage void __exception do_undefinst
>           goto die_sig;
>       }
>
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        instr = __swab32(instr); /* little endian instruction */
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +
>       if (call_undef_hook(regs, instr) == 0)
>           return;

This is already fixed.

> Index: linux-linaro-tracking/arch/arm/kernel/ftrace.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/ftrace.c
> +++ linux-linaro-tracking/arch/arm/kernel/ftrace.c
> @@ -100,10 +100,18 @@ static int ftrace_modify_code(unsigned l
>           if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE))
>               return -EFAULT;
>
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        replaced = __swab32(replaced); /* little endian instruction */
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +
>           if (replaced != old)
>               return -EINVAL;
>       }
>
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        new = __swab32(new); /* little endian instruction */
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +
>       if (probe_kernel_write((void *)pc,&new, MCOUNT_INSN_SIZE))
>           return -EPERM;

Please see <asm/opcodes.h> for dealing with the op-codes like this.

> Index: linux-linaro-tracking/arch/arm/kernel/atags_parse.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/atags_parse.c
> +++ linux-linaro-tracking/arch/arm/kernel/atags_parse.c
> @@ -216,6 +216,10 @@ struct machine_desc * __init setup_machi
>       if (tags->hdr.tag != ATAG_CORE)
>           convert_to_tag_list(tags);
>   #endif
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +    byteswap_tags(tags);
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
> +
>       if (tags->hdr.tag != ATAG_CORE) {
>           early_print("Warning: Neither atags nor dtb found\n");
>           tags = (struct tag *)&default_tags;
> Index: linux-linaro-tracking/arch/arm/kernel/sleep.S

I really don't like this approach; it makes the data difficult to deal
with if you export it to user-space to do things like md5sum it. Also,
if you're going to kexec() a new kernel, wouldn't you have to swap it
back before that?

> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a
>
>       __asm__ __volatile__("@ atomic64_add\n"
>   "1:    ldrexd    %0, %H0, [%3]\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    adds    %0, %0, %4\n"
>   "    adc    %H0, %H0, %H4\n"
> +#else
> +"    adds    %H0, %H0, %H4\n"
> +"    adc    %0, %0, %4\n"
> +#endif
>   "    strexd    %1, %0, %H0, [%3]\n"
>   "    teq    %1, #0\n"
>   "    bne    1b"
> @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6
>
>       __asm__ __volatile__("@ atomic64_add_return\n"
>   "1:    ldrexd    %0, %H0, [%3]\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    adds    %0, %0, %4\n"
>   "    adc    %H0, %H0, %H4\n"
> +#else
> +"    adds    %H0, %H0, %H4\n"
> +"    adc    %0, %0, %4\n"
> +#endif
>   "    strexd    %1, %0, %H0, [%3]\n"
>   "    teq    %1, #0\n"
>   "    bne    1b"
> @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a
>
>       __asm__ __volatile__("@ atomic64_sub\n"
>   "1:    ldrexd    %0, %H0, [%3]\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    subs    %0, %0, %4\n"
>   "    sbc    %H0, %H0, %H4\n"
> +#else
> +"    subs    %H0, %H0, %H4\n"
> +"    sbc    %0, %0, %4\n"
> +#endif
>   "    strexd    %1, %0, %H0, [%3]\n"
>   "    teq    %1, #0\n"
>   "    bne    1b"
> @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6
>
>       __asm__ __volatile__("@ atomic64_sub_return\n"
>   "1:    ldrexd    %0, %H0, [%3]\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    subs    %0, %0, %4\n"
>   "    sbc    %H0, %H0, %H4\n"
> +#else
> +"    subs    %H0, %H0, %H4\n"
> +"    sbc    %0, %0, %4\n"
> +#endif
>   "    strexd    %1, %0, %H0, [%3]\n"
>   "    teq    %1, #0\n"
>   "    bne    1b"
> @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi
>
>       __asm__ __volatile__("@ atomic64_dec_if_positive\n"
>   "1:    ldrexd    %0, %H0, [%3]\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    subs    %0, %0, #1\n"
>   "    sbc    %H0, %H0, #0\n"
>   "    teq    %H0, #0\n"
> +#else
> +"    subs    %H0, %H0, #1\n"
> +"    sbc    %0, %0, #0\n"
> +"    teq    %0, #0\n"
> +#endif
>   "    bmi    2f\n"
>   "    strexd    %1, %0, %H0, [%3]\n"
>   "    teq    %1, #0\n"
> @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at
>   "    teqeq    %H0, %H5\n"
>   "    moveq    %1, #0\n"
>   "    beq    2f\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    adds    %0, %0, %6\n"
>   "    adc    %H0, %H0, %H6\n"
> +#else
> +"    adds    %H0, %H0, %H6\n"
> +"    adc    %0, %0, %6\n"
> +#endif
>   "    strexd    %2, %0, %H0, [%4]\n"
>   "    teq    %2, #0\n"
>   "    bne    1b\n"

I still believe that you're doing the wrong thing here
and that these can be left well enough alone. Please give
a concrete piece of code that shows they're actually
doing the wrong thing.



-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 29+ messages in thread

* updates for be8 patch series
  2013-07-23 17:53   ` Victor Kamensky
  2013-07-23 18:02     ` Ben Dooks
@ 2013-07-23 18:03     ` Victor Kamensky
  2013-07-23 18:30       ` Ben Dooks
  1 sibling, 1 reply; 29+ messages in thread
From: Victor Kamensky @ 2013-07-23 18:03 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ben,

Sorry for the multiple posts. I am new to this, and I could not get the
mailing list to accept my original email. For the sake of others, so
people can see the patch under discussion, here it goes again with the
patch file inlined. Hopefully it will get through this time.

Wrt BE8 little-endian instructions, you will need to fix a couple more places:
    ftrace: arch/arm/kernel/ftrace.c
    kprobes: arch/arm/kernel/kprobes.c
Also, in big-endian mode the atomic64_xxx functions will need fixes, otherwise
perf counters will be truncated at the maximum 32-bit value:
    atomic64: arch/arm/include/asm/atomic.h

I've attached the board-independent (core) patch from my work that made the
Pandaboard ES kernel run in BE mode. You can check my version of the fixes
for the above issues in the patch; look for the corresponding file changes.
Please rework them according to the style and rules of your patch series.
Note the patch itself contains fixes for a few other issues that are already
fixed in your series; I'll cross-check and compare the individual areas
later. I think you may find the attached patch interesting as well: it was
developed independently of yours but matches it very well.

Wrt testing: ftrace was tested on Pandaboard ES; the kprobes changes were
tested with SystemTap at some point in the patch's history on Pandaboard ES
(not in Thumb mode, though), and may also have deteriorated while being
ported across kernel versions, so it would be a good idea to retest the
latest patch. atomic64 was tested on Pandaboard ES and Marvell Armada XP.

Thanks,
Victor

Index: linux-linaro-tracking/arch/arm/Makefile
===================================================================
--- linux-linaro-tracking.orig/arch/arm/Makefile
+++ linux-linaro-tracking/arch/arm/Makefile
@@ -16,6 +16,7 @@ LDFLAGS        :=
 LDFLAGS_vmlinux    :=-p --no-undefined -X
 ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
 LDFLAGS_vmlinux    += --be8
+KBUILD_LDFLAGS_MODULE += --be8
 endif

 OBJCOPYFLAGS    :=-O binary -R .comment -S
@@ -43,12 +44,19 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y)
 KBUILD_CFLAGS    +=-fstack-protector
 endif

+# Pass the endian compilation flag in both CPPFLAGS and CFLAGS. There are
+# places where CFLAGS alone is used (cmd_record_mcount), and it has to
+# be in CPPFLAGS because it affects which macros (__ARMEB__, __BYTE_ORDER__,
+# __FLOAT_WORD_ORDER__) are defined. So replicating the flag seems to be a
+# good compromise
 ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
 KBUILD_CPPFLAGS    += -mbig-endian
+KBUILD_CFLAGS    += -mbig-endian
 AS        += -EB
 LD        += -EB
 else
 KBUILD_CPPFLAGS    += -mlittle-endian
+KBUILD_CFLAGS    += -mlittle-endian
 AS        += -EL
 LD        += -EL
 endif
Index: linux-linaro-tracking/arch/arm/boot/compressed/Makefile
===================================================================
--- linux-linaro-tracking.orig/arch/arm/boot/compressed/Makefile
+++ linux-linaro-tracking/arch/arm/boot/compressed/Makefile
@@ -138,6 +138,24 @@ endif
 ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
 LDFLAGS_vmlinux += --be8
 endif
+
+# Pass the endian compilation flag in both CPPFLAGS and CFLAGS. There are
+# places where CFLAGS alone is used (cmd_record_mcount), and it has to
+# be in CPPFLAGS because it affects which macros (__ARMEB__, __BYTE_ORDER__,
+# __FLOAT_WORD_ORDER__) are defined. So replicating the flag seems to be a
+# good compromise
+ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
+KBUILD_CPPFLAGS    += -mbig-endian
+KBUILD_CFLAGS    += -mbig-endian
+AS        += -EB
+LD        += -EB
+else
+KBUILD_CPPFLAGS    += -mlittle-endian
+KBUILD_CFLAGS    += -mlittle-endian
+AS        += -EL
+LD        += -EL
+endif
+
 # ?
 LDFLAGS_vmlinux += -p
 # Report unresolved symbol references
Index: linux-linaro-tracking/arch/arm/boot/compressed/head.S
===================================================================
--- linux-linaro-tracking.orig/arch/arm/boot/compressed/head.S
+++ linux-linaro-tracking/arch/arm/boot/compressed/head.S
@@ -141,6 +141,12 @@ start:
 #endif
         mov    r7, r1            @ save architecture ID
         mov    r8, r2            @ save atags pointer
+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+                /*
+                 * Switch to bigendian mode
+                 */
+                setend  be
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */

 #ifndef __ARM_ARCH_2__
         /*
Index: linux-linaro-tracking/arch/arm/include/asm/setup.h
===================================================================
--- linux-linaro-tracking.orig/arch/arm/include/asm/setup.h
+++ linux-linaro-tracking/arch/arm/include/asm/setup.h
@@ -53,4 +53,8 @@ extern int arm_add_memory(phys_addr_t st
 extern void early_print(const char *str, ...);
 extern void dump_machine_table(void);

+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+void byteswap_tags(struct tag *t);
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
+
 #endif
Index: linux-linaro-tracking/arch/arm/kernel/Makefile
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/Makefile
+++ linux-linaro-tracking/arch/arm/kernel/Makefile
@@ -22,6 +22,7 @@ obj-y        := elf.o entry-armv.o entry-commo
 obj-$(CONFIG_ATAGS)        += atags_parse.o
 obj-$(CONFIG_ATAGS_PROC)    += atags_proc.o
 obj-$(CONFIG_DEPRECATED_PARAM_STRUCT) += atags_compat.o
+obj-$(CONFIG_LITTLE_ENDIAN_LOADER) += atags_byteswap.o

 obj-$(CONFIG_OC_ETM)        += etm.o
 obj-$(CONFIG_CPU_IDLE)        += cpuidle.o
Index: linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
===================================================================
--- /dev/null
+++ linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
@@ -0,0 +1,119 @@
+#include <linux/slab.h>
+#include <linux/proc_fs.h>
+#include <linux/swab.h>
+#include <asm/setup.h>
+#include <asm/types.h>
+#include <asm/page.h>
+
+#define byteswap32(x) (x) = __swab32((x))
+#define byteswap16(x) (x) = __swab16((x))
+
+static void __init byteswap_tag_core(struct tag *tag)
+{
+  byteswap32(tag->u.core.flags);
+  byteswap32(tag->u.core.pagesize);
+  byteswap32(tag->u.core.rootdev);
+}
+
+static void __init byteswap_tag_mem32(struct tag *tag)
+{
+  byteswap32(tag->u.mem.size);
+  byteswap32(tag->u.mem.start);
+}
+
+static void __init byteswap_tag_videotext(struct tag *tag)
+{
+  byteswap16(tag->u.videotext.video_page);
+  byteswap16(tag->u.videotext.video_ega_bx);
+  byteswap16(tag->u.videotext.video_points);
+}
+
+static void __init byteswap_tag_ramdisk(struct tag *tag)
+{
+  byteswap32(tag->u.ramdisk.flags);
+  byteswap32(tag->u.ramdisk.size);
+  byteswap32(tag->u.ramdisk.start);
+}
+
+static void __init byteswap_tag_initrd(struct tag *tag)
+{
+  byteswap32(tag->u.initrd.start);
+  byteswap32(tag->u.initrd.size);
+}
+
+static void __init byteswap_tag_serialnr(struct tag *tag)
+{
+  byteswap32(tag->u.serialnr.low);
+  byteswap32(tag->u.serialnr.high);
+}
+
+static void __init byteswap_tag_revision(struct tag *tag)
+{
+    byteswap32(tag->u.revision.rev);
+}
+
+static void __init byteswap_tag_videolfb(struct tag *tag)
+{
+    byteswap16(tag->u.videolfb.lfb_width);
+    byteswap16(tag->u.videolfb.lfb_height);
+    byteswap16(tag->u.videolfb.lfb_depth);
+    byteswap16(tag->u.videolfb.lfb_linelength);
+    byteswap32(tag->u.videolfb.lfb_base);
+    byteswap32(tag->u.videolfb.lfb_size);
+}
+
+static void __init byteswap_tag_acorn(struct tag *tag)
+{
+    byteswap32(tag->u.acorn.memc_control_reg);
+    byteswap32(tag->u.acorn.vram_pages);
+}
+
+static void __init byteswap_tag_memclk(struct tag *tag)
+{
+    byteswap32(tag->u.memclk.fmemclk);
+}
+
+void __init byteswap_tags(struct tag *t)
+{
+  for (; t->hdr.size; t = tag_next(t)) {
+    byteswap32(t->hdr.size);
+    byteswap32(t->hdr.tag);
+    switch(t->hdr.tag) {
+    case ATAG_CORE:
+      byteswap_tag_core(t);
+      break;
+    case ATAG_MEM:
+      byteswap_tag_mem32(t);
+      break;
+    case ATAG_VIDEOTEXT:
+      byteswap_tag_videotext(t);
+      break;
+    case ATAG_RAMDISK:
+      byteswap_tag_ramdisk(t);
+      break;
+    case ATAG_INITRD2:
+      byteswap_tag_initrd(t);
+      break;
+    case ATAG_SERIAL:
+      byteswap_tag_serialnr(t);
+      break;
+    case ATAG_REVISION:
+      byteswap_tag_revision(t);
+      break;
+    case ATAG_VIDEOLFB:
+      byteswap_tag_videolfb(t);
+      break;
+    case ATAG_CMDLINE:
+      break;
+    case ATAG_ACORN:
+      byteswap_tag_acorn(t);
+      break;
+    case ATAG_MEMCLK:
+      byteswap_tag_memclk(t);
+      break;
+    default:
+      /* Need to complain? */
+      break;
+    }
+  }
+}
Index: linux-linaro-tracking/arch/arm/kernel/head-common.S
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/head-common.S
+++ linux-linaro-tracking/arch/arm/kernel/head-common.S
@@ -48,6 +48,9 @@ __vet_atags:
     bne    1f

     ldr    r5, [r2, #0]
+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+        rev     r5, r5
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
 #ifdef CONFIG_OF_FLATTREE
     ldr    r6, =OF_DT_MAGIC        @ is it a DTB?
     cmp    r5, r6
@@ -57,6 +60,9 @@ __vet_atags:
     cmpne    r5, #ATAG_CORE_SIZE_EMPTY
     bne    1f
     ldr    r5, [r2, #4]
+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+        rev     r5, r5
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
     ldr    r6, =ATAG_CORE
     cmp    r5, r6
     bne    1f
Index: linux-linaro-tracking/arch/arm/kernel/module.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/module.c
+++ linux-linaro-tracking/arch/arm/kernel/module.c
@@ -45,6 +45,65 @@ void *module_alloc(unsigned long size)
 }
 #endif

+/*
+ * For relocations in executable sections, if we are configured with
+ * CONFIG_CPU_ENDIAN_BE8 we assume the code is already in byteswapped
+ * (little-endian) form, so we need to byteswap the value when we read
+ * it and when we write it. If the relocation is of type R_ARM_ABS32 we
+ * assume it applies to a piece of data and perform no byteswaps when
+ * applying it. This heuristic-based approach may break in the future.
+ *
+ * It seems a more correct way to handle be8 in kernel modules is to
+ * build the module without the --be8 option, but perform the instruction
+ * byteswap when the module sections are loaded, after all relocations
+ * have been applied. In order to do such a byteswap we need to read all
+ * the mapping symbols ($a, $d, $t), sort them, and byteswap the $a and
+ * $t ranges.
+ */
+static inline u32 read_section_u32(Elf32_Shdr *dstsec, unsigned long loc)
+{
+       u32 retval = *(u32 *)loc;
+
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        if (dstsec->sh_flags & SHF_EXECINSTR) {
+                retval = __swab32(retval);
+        }
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+        return retval;
+}
+
+static inline void write_section_u32(Elf32_Shdr *dstsec, unsigned
long loc, u32 value)
+{
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        if (dstsec->sh_flags & SHF_EXECINSTR) {
+                value = __swab32(value);
+        }
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+        *(u32 *) loc = value;
+}
+
+#ifdef CONFIG_THUMB2_KERNEL
+static inline u16 read_section_u16(Elf32_Shdr *dstsec, unsigned long loc)
+{
+        u16 retval = *(u16 *) loc;
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        if (dstsec->sh_flags & SHF_EXECINSTR) {
+                retval = __swab16(retval);
+        }
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+        return retval;
+}
+
+static inline void write_section_u16(Elf32_Shdr *dstsec, unsigned
long loc, u16 value)
+{
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        if (dstsec->sh_flags & SHF_EXECINSTR) {
+                value = __swab16(value);
+        }
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+        *(u16 *) loc = value;
+}
+#endif /* CONFIG_THUMB2_KERNEL */
+
 int
 apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
            unsigned int relindex, struct module *module)
@@ -60,6 +119,7 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
         Elf32_Sym *sym;
         const char *symname;
         s32 offset;
+                u32 loc_u32;
 #ifdef CONFIG_THUMB2_KERNEL
         u32 upper, lower, sign, j1, j2;
 #endif
@@ -95,7 +155,8 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
         case R_ARM_PC24:
         case R_ARM_CALL:
         case R_ARM_JUMP24:
-            offset = (*(u32 *)loc & 0x00ffffff) << 2;
+                        loc_u32 = read_section_u32(dstsec, loc);
+            offset = (loc_u32 & 0x00ffffff) << 2;
             if (offset & 0x02000000)
                 offset -= 0x04000000;

@@ -112,27 +173,33 @@ apply_relocate(Elf32_Shdr *sechdrs, cons

             offset >>= 2;

-            *(u32 *)loc &= 0xff000000;
-            *(u32 *)loc |= offset & 0x00ffffff;
+            loc_u32 &= 0xff000000;
+            loc_u32 |= offset & 0x00ffffff;
+                        write_section_u32(dstsec, loc, loc_u32);
             break;

-           case R_ARM_V4BX:
+                case R_ARM_V4BX:
+                        loc_u32 = read_section_u32(dstsec, loc);
                /* Preserve Rm and the condition code. Alter
             * other bits to re-code instruction as
             * MOV PC,Rm.
             */
-               *(u32 *)loc &= 0xf000000f;
-               *(u32 *)loc |= 0x01a0f000;
+               loc_u32 &= 0xf000000f;
+               loc_u32 |= 0x01a0f000;
+                       write_section_u32(dstsec, loc, loc_u32);
                break;

         case R_ARM_PREL31:
-            offset = *(u32 *)loc + sym->st_value - loc;
-            *(u32 *)loc = offset & 0x7fffffff;
+                        loc_u32 = read_section_u32(dstsec, loc);
+            offset = loc_u32 + sym->st_value - loc;
+            loc_u32 = offset & 0x7fffffff;
+                        write_section_u32(dstsec, loc, loc_u32);
             break;

         case R_ARM_MOVW_ABS_NC:
         case R_ARM_MOVT_ABS:
-            offset = *(u32 *)loc;
+                        loc_u32 = read_section_u32(dstsec, loc);
+            offset = loc_u32;
             offset = ((offset & 0xf0000) >> 4) | (offset & 0xfff);
             offset = (offset ^ 0x8000) - 0x8000;

@@ -140,16 +207,17 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
             if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS)
                 offset >>= 16;

-            *(u32 *)loc &= 0xfff0f000;
-            *(u32 *)loc |= ((offset & 0xf000) << 4) |
-                    (offset & 0x0fff);
+            loc_u32 &= 0xfff0f000;
+            loc_u32 |= ((offset & 0xf000) << 4) |
+                                   (offset & 0x0fff);
+                        write_section_u32(dstsec, loc, loc_u32);
             break;

 #ifdef CONFIG_THUMB2_KERNEL
         case R_ARM_THM_CALL:
         case R_ARM_THM_JUMP24:
-            upper = *(u16 *)loc;
-            lower = *(u16 *)(loc + 2);
+                        upper = read_section_u16(dstsec, loc);
+            lower = read_section_u16(dstsec, loc + 2);

             /*
              * 25 bit signed address range (Thumb-2 BL and B.W
@@ -198,17 +266,19 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
             sign = (offset >> 24) & 1;
             j1 = sign ^ (~(offset >> 23) & 1);
             j2 = sign ^ (~(offset >> 22) & 1);
-            *(u16 *)loc = (u16)((upper & 0xf800) | (sign << 10) |
-                        ((offset >> 12) & 0x03ff));
-            *(u16 *)(loc + 2) = (u16)((lower & 0xd000) |
-                          (j1 << 13) | (j2 << 11) |
-                          ((offset >> 1) & 0x07ff));
+                        write_section_u16(dstsec, loc,
+                                          (u16)((upper & 0xf800) |
+                                                (sign << 10) |
+                                                ((offset >> 12) & 0x03ff)));
+                        write_section_u16(dstsec, loc + 2,
+                                          (u16)((lower & 0xd000) |
+                                                (j1 << 13) | (j2 << 11) |
+                                                ((offset >> 1) & 0x07ff)));
             break;

         case R_ARM_THM_MOVW_ABS_NC:
         case R_ARM_THM_MOVT_ABS:
-            upper = *(u16 *)loc;
-            lower = *(u16 *)(loc + 2);
+                        upper = read_section_u16(dstsec, loc);
+            lower = read_section_u16(dstsec, loc + 2);

             /*
              * MOVT/MOVW instructions encoding in Thumb-2:
@@ -229,12 +299,14 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
             if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS)
                 offset >>= 16;

-            *(u16 *)loc = (u16)((upper & 0xfbf0) |
-                        ((offset & 0xf000) >> 12) |
-                        ((offset & 0x0800) >> 1));
-            *(u16 *)(loc + 2) = (u16)((lower & 0x8f00) |
-                          ((offset & 0x0700) << 4) |
-                          (offset & 0x00ff));
+                        write_section_u16(dstsec, loc,
+                                          (u16)((upper & 0xfbf0) |
+                                                ((offset & 0xf000) >> 12) |
+                                                ((offset & 0x0800) >> 1)));
+                        write_section_u16(dstsec, loc + 2,
+                                          (u16)((lower & 0x8f00) |
+                                                ((offset & 0x0700) << 4) |
+                                                (offset & 0x00ff)));
             break;
 #endif

Index: linux-linaro-tracking/arch/arm/mm/Kconfig
===================================================================
--- linux-linaro-tracking.orig/arch/arm/mm/Kconfig
+++ linux-linaro-tracking/arch/arm/mm/Kconfig
@@ -695,6 +695,15 @@ config CPU_ENDIAN_BE32
     help
       Support for the BE-32 (big-endian) mode on pre-ARMv6 processors.

+config LITTLE_ENDIAN_LOADER
+        bool
+        depends on CPU_BIG_ENDIAN
+        default y
+        help
+          Set this if Linux should run in big-endian mode but the
+          bootloader (e.g. U-Boot) is still little-endian and passes
+          boot information in little-endian form.
+
 config CPU_HIGH_VECTOR
     depends on !MMU && CPU_CP15 && !CPU_ARM740T
     bool "Select the High exception vector"
Index: linux-linaro-tracking/arch/arm/mm/alignment.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/mm/alignment.c
+++ linux-linaro-tracking/arch/arm/mm/alignment.c
@@ -792,6 +792,10 @@ do_alignment(unsigned long addr, unsigne

     regs->ARM_pc += isize;

+#ifdef CONFIG_CPU_ENDIAN_BE8
+        instr = __swab32(instr); /* little endian instruction */
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+
     switch (CODING_BITS(instr)) {
     case 0x00000000:    /* 3.13.4 load/store instruction extensions */
         if (LDSTHD_I_BIT(instr))
Index: linux-linaro-tracking/arch/arm/kernel/head.S
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/head.S
+++ linux-linaro-tracking/arch/arm/kernel/head.S
@@ -89,6 +89,12 @@ ENTRY(stext)
     @ ensure svc mode and all interrupts masked
     safe_svcmode_maskall r9

+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+    /*
+    * Switch to bigendian mode
+    */
+    setend    be
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
     mrc    p15, 0, r9, c0, c0        @ get processor id
     bl    __lookup_processor_type        @ r5=procinfo r9=cpuid
     movs    r10, r5                @ invalid processor (r5=0)?
@@ -356,6 +362,12 @@ ENTRY(secondary_startup)
 #endif
     safe_svcmode_maskall r9

+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+    /*
+    * Switch to bigendian mode
+    */
+    setend    be
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
     mrc    p15, 0, r9, c0, c0        @ get processor id
     bl    __lookup_processor_type
     movs    r10, r5                @ invalid processor?
@@ -427,6 +439,9 @@ __enable_mmu:
 #ifdef CONFIG_CPU_ICACHE_DISABLE
     bic    r0, r0, #CR_I
 #endif
+#ifdef CONFIG_CPU_ENDIAN_BE8
+    orr    r0, r0, #CR_EE    @ big-endian page tables
+#endif
 #ifdef CONFIG_ARM_LPAE
     mov    r5, #0
     mcrr    p15, 0, r4, r5, c2        @ load TTBR0
@@ -592,8 +607,14 @@ __fixup_a_pv_table:
     b    2f
 1:    add     r7, r3
     ldrh    ip, [r7, #2]
+#ifdef __ARMEB__
+        rev     ip, ip   @ byteswap instruction
+#endif /* __ARMEB__ */
     and    ip, 0x8f00
     orr    ip, r6    @ mask in offset bits 31-24
+#ifdef __ARMEB__
+        rev     ip, ip   @ byteswap instruction
+#endif /* __ARMEB__ */
     strh    ip, [r7, #2]
 2:    cmp    r4, r5
     ldrcc    r7, [r4], #4    @ use branch for delay slot
@@ -602,8 +623,14 @@ __fixup_a_pv_table:
 #else
     b    2f
 1:    ldr    ip, [r7, r3]
+#ifdef __ARMEB__
+        rev     ip, ip   @ byteswap instruction
+#endif /* __ARMEB__ */
     bic    ip, ip, #0x000000ff
     orr    ip, ip, r6    @ mask in offset bits 31-24
+#ifdef __ARMEB__
+        rev     ip, ip   @ byteswap instruction
+#endif /* __ARMEB__ */
     str    ip, [r7, r3]
 2:    cmp    r4, r5
     ldrcc    r7, [r4], #4    @ use branch for delay slot
Index: linux-linaro-tracking/arch/arm/kernel/kprobes.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/kprobes.c
+++ linux-linaro-tracking/arch/arm/kernel/kprobes.c
@@ -66,14 +66,24 @@ int __kprobes arch_prepare_kprobe(struct
     if (is_wide_instruction(insn)) {
         insn <<= 16;
         insn |= ((u16 *)addr)[1];
+#ifdef CONFIG_CPU_ENDIAN_BE8
+                insn = __swab32(insn); /* little endian instruction */
+#endif
         decode_insn = thumb32_kprobe_decode_insn;
-    } else
-        decode_insn = thumb16_kprobe_decode_insn;
+    } else {
+                decode_insn = thumb16_kprobe_decode_insn;
+#ifdef CONFIG_CPU_ENDIAN_BE8
+                insn = __swab16(insn); /* little endian instruction */
+#endif
+        }
 #else /* !CONFIG_THUMB2_KERNEL */
     thumb = false;
     if (addr & 0x3)
         return -EINVAL;
     insn = *p->addr;
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        insn = __swab32(insn); /* little endian instruction */
+#endif
     decode_insn = arm_kprobe_decode_insn;
 #endif

@@ -89,7 +99,11 @@ int __kprobes arch_prepare_kprobe(struct
         if (!p->ainsn.insn)
             return -ENOMEM;
         for (is = 0; is < MAX_INSN_SIZE; ++is)
+#ifndef CONFIG_CPU_ENDIAN_BE8
             p->ainsn.insn[is] = tmp_insn[is];
+#else
+                        p->ainsn.insn[is] = __swab32(tmp_insn[is]); /* little endian instruction */
+#endif
         flush_insns(p->ainsn.insn,
                 sizeof(p->ainsn.insn[0]) * MAX_INSN_SIZE);
         p->ainsn.insn_fn = (kprobe_insn_fn_t *)
@@ -113,10 +127,17 @@ void __kprobes arch_arm_kprobe(struct kp
         /* Remove any Thumb flag */
         addr = (void *)((uintptr_t)p->addr & ~1);

-        if (is_wide_instruction(p->opcode))
+        if (is_wide_instruction(p->opcode)) {
             brkp = KPROBE_THUMB32_BREAKPOINT_INSTRUCTION;
-        else
+#ifdef CONFIG_CPU_ENDIAN_BE8
+                brkp = __swab32(brkp);
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+        } else {
             brkp = KPROBE_THUMB16_BREAKPOINT_INSTRUCTION;
+#ifdef CONFIG_CPU_ENDIAN_BE8
+                brkp = __swab16(brkp);
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+            }
     } else {
         kprobe_opcode_t insn = p->opcode;

@@ -127,6 +148,10 @@ void __kprobes arch_arm_kprobe(struct kp
             brkp |= 0xe0000000;  /* Unconditional instruction */
         else
             brkp |= insn & 0xf0000000;  /* Copy condition from insn */
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        brkp = __swab32(brkp);
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+
     }

     patch_text(addr, brkp);
@@ -144,8 +169,12 @@ int __kprobes __arch_disarm_kprobe(void
 {
     struct kprobe *kp = p;
     void *addr = (void *)((uintptr_t)kp->addr & ~1);
+    kprobe_opcode_t insn = kp->opcode;

-    __patch_text(addr, kp->opcode);
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        insn = __swab32(insn); /* little endian instruction */
+#endif
+    __patch_text(addr, insn);

     return 0;
 }
Index: linux-linaro-tracking/arch/arm/kernel/traps.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/traps.c
+++ linux-linaro-tracking/arch/arm/kernel/traps.c
@@ -426,6 +426,10 @@ asmlinkage void __exception do_undefinst
         goto die_sig;
     }

+#ifdef CONFIG_CPU_ENDIAN_BE8
+        instr = __swab32(instr); /* little endian instruction */
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+
     if (call_undef_hook(regs, instr) == 0)
         return;

Index: linux-linaro-tracking/arch/arm/kernel/ftrace.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/ftrace.c
+++ linux-linaro-tracking/arch/arm/kernel/ftrace.c
@@ -100,10 +100,18 @@ static int ftrace_modify_code(unsigned l
         if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE))
             return -EFAULT;

+#ifdef CONFIG_CPU_ENDIAN_BE8
+        replaced = __swab32(replaced); /* little endian instruction */
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+
         if (replaced != old)
             return -EINVAL;
     }

+#ifdef CONFIG_CPU_ENDIAN_BE8
+        new = __swab32(new); /* little endian instruction */
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+
     if (probe_kernel_write((void *)pc, &new, MCOUNT_INSN_SIZE))
         return -EPERM;

Index: linux-linaro-tracking/arch/arm/kernel/atags_parse.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/atags_parse.c
+++ linux-linaro-tracking/arch/arm/kernel/atags_parse.c
@@ -216,6 +216,10 @@ struct machine_desc * __init setup_machi
     if (tags->hdr.tag != ATAG_CORE)
         convert_to_tag_list(tags);
 #endif
+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+    byteswap_tags(tags);
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
+
     if (tags->hdr.tag != ATAG_CORE) {
         early_print("Warning: Neither atags nor dtb found\n");
         tags = (struct tag *)&default_tags;
Index: linux-linaro-tracking/arch/arm/kernel/sleep.S
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/sleep.S
+++ linux-linaro-tracking/arch/arm/kernel/sleep.S
@@ -81,6 +81,13 @@ ENDPROC(cpu_resume_after_mmu)
     .data
     .align
 ENTRY(cpu_resume)
+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+        /*
+         * Switch to bigendian mode.
+         * Platform resume code most likely already did that, but just in case
+         */
+        setend  be
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
 #ifdef CONFIG_SMP
     mov    r1, #0            @ fall-back logical index for UP
     ALT_SMP(mrc p15, 0, r0, c0, c0, 5)
Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
===================================================================
--- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
+++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
@@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a

     __asm__ __volatile__("@ atomic64_add\n"
 "1:    ldrexd    %0, %H0, [%3]\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    adds    %0, %0, %4\n"
 "    adc    %H0, %H0, %H4\n"
+#else
+"    adds    %H0, %H0, %H4\n"
+"    adc    %0, %0, %4\n"
+#endif
 "    strexd    %1, %0, %H0, [%3]\n"
 "    teq    %1, #0\n"
 "    bne    1b"
@@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6

     __asm__ __volatile__("@ atomic64_add_return\n"
 "1:    ldrexd    %0, %H0, [%3]\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    adds    %0, %0, %4\n"
 "    adc    %H0, %H0, %H4\n"
+#else
+"    adds    %H0, %H0, %H4\n"
+"    adc    %0, %0, %4\n"
+#endif
 "    strexd    %1, %0, %H0, [%3]\n"
 "    teq    %1, #0\n"
 "    bne    1b"
@@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a

     __asm__ __volatile__("@ atomic64_sub\n"
 "1:    ldrexd    %0, %H0, [%3]\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    subs    %0, %0, %4\n"
 "    sbc    %H0, %H0, %H4\n"
+#else
+"    subs    %H0, %H0, %H4\n"
+"    sbc    %0, %0, %4\n"
+#endif
 "    strexd    %1, %0, %H0, [%3]\n"
 "    teq    %1, #0\n"
 "    bne    1b"
@@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6

     __asm__ __volatile__("@ atomic64_sub_return\n"
 "1:    ldrexd    %0, %H0, [%3]\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    subs    %0, %0, %4\n"
 "    sbc    %H0, %H0, %H4\n"
+#else
+"    subs    %H0, %H0, %H4\n"
+"    sbc    %0, %0, %4\n"
+#endif
 "    strexd    %1, %0, %H0, [%3]\n"
 "    teq    %1, #0\n"
 "    bne    1b"
@@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi

     __asm__ __volatile__("@ atomic64_dec_if_positive\n"
 "1:    ldrexd    %0, %H0, [%3]\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    subs    %0, %0, #1\n"
 "    sbc    %H0, %H0, #0\n"
 "    teq    %H0, #0\n"
+#else
+"    subs    %H0, %H0, #1\n"
+"    sbc    %0, %0, #0\n"
+"    teq    %0, #0\n"
+#endif
 "    bmi    2f\n"
 "    strexd    %1, %0, %H0, [%3]\n"
 "    teq    %1, #0\n"
@@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at
 "    teqeq    %H0, %H5\n"
 "    moveq    %1, #0\n"
 "    beq    2f\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    adds    %0, %0, %6\n"
 "    adc    %H0, %H0, %H6\n"
+#else
+"    adds    %H0, %H0, %H6\n"
+"    adc    %0, %0, %6\n"
+#endif
 "    strexd    %2, %0, %H0, [%4]\n"
 "    teq    %2, #0\n"
 "    bne    1b\n"


On 23 July 2013 10:53, Victor Kamensky <victor.kamensky@linaro.org> wrote:
> On 23 July 2013 10:40, Ben Dooks <ben.dooks@codethink.co.uk> wrote:
>> On 23/07/13 18:24, Victor Kamensky wrote:
>>>
>>> Hi Ben,
>>>
>>> Wrt BE8 little-endian instructions you will need to fix a couple
>>> more places:
>>>      ftrace arch/arm/kernel/ftrace.c
>>>      kprobes arch/arm/kernel/kprobes.c
>>> Also, in big-endian mode the atomic64_xxx functions will need fixes;
>>> otherwise perf counters will be truncated at the max 32-bit value:
>>>      atomic64 arch/arm/include/asm/atomic.h
>>
>>
>> Hmm, thought ftrace already did the right endian-ness conversions,
>> although it is possible it could have missed one.
>>
>> The atomic instructions should work fine for BE8, they're using
>> STRD which should be correct.
>
> atomic64_read and atomic64_set are fine, but atomic64_add,
> atomic64_add_return, atomic64_sub, atomic64_sub_return, etc. are not
> OK. The issue is in the arithmetic operations, not the loads/stores.
>
> Thanks,
> Victor
>
>>
>>> I've attached the board-independent (core) patch from my work that
>>> made the Pandaboard ES kernel run in BE mode. You can check my
>>> version of the fixes for the above issues in the patch; look for the
>>> corresponding file changes. Please rework them according to the style
>>> and rules of your patch series. Note the patch itself contains fixes
>>> for a few other issues that are already fixed in your series. I'll
>>> cross-check and compare individual areas later. I think you may find
>>> the attached patch interesting as well; it was developed
>>> independently from yours but matches it very well.
>>
>>
>>>
>>
>> --
>> Ben Dooks                               http://www.codethink.co.uk/
>> Senior Engineer                         Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 29+ messages in thread

* updates for be8 patch series
  2013-07-23 17:53   ` Victor Kamensky
@ 2013-07-23 18:02     ` Ben Dooks
  2013-07-23 18:03     ` Victor Kamensky
  1 sibling, 0 replies; 29+ messages in thread
From: Ben Dooks @ 2013-07-23 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 23/07/13 18:53, Victor Kamensky wrote:
> On 23 July 2013 10:40, Ben Dooks<ben.dooks@codethink.co.uk>  wrote:
>> On 23/07/13 18:24, Victor Kamensky wrote:
>>>
>>> Hi Ben,
>>>
>>> Wrt BE8 little-endian instructions you will need to fix a couple
>>> more places:
>>>       ftrace arch/arm/kernel/ftrace.c
>>>       kprobes arch/arm/kernel/kprobes.c
>>> Also, in big-endian mode the atomic64_xxx functions will need fixes;
>>> otherwise perf counters will be truncated at the max 32-bit value:
>>>       atomic64 arch/arm/include/asm/atomic.h
>>
>>
>> Hmm, thought ftrace already did the right endian-ness conversions,
>> although it is possible it could have missed one.
>>
>> The atomic instructions should work fine for BE8, they're using
>> STRD which should be correct.
>
> atomic64_read and atomic64_set are fine, but atomic64_add,
> atomic64_add_return, atomic64_sub, atomic64_sub_return, etc. are not
> OK. The issue is in the arithmetic operations, not the loads/stores.

I'm still surprised these don't work. What exact situation are you
seeing the failures in? Code would be very helpful here.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 29+ messages in thread

* updates for be8 patch series
       [not found] <CAA3XUr1o0cOCybSR=pCUT-WYP4xus5VrES6Hyzo-Wy4tcutTvQ@mail.gmail.com>
@ 2013-07-23 18:02 ` Ben Dooks
  0 siblings, 0 replies; 29+ messages in thread
From: Ben Dooks @ 2013-07-23 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 23/07/13 18:40, Victor Kamensky wrote:
> [resend in plain-text mode]
>
> Hi Ben,
>
> Wrt BE8 little-endian instructions you will need to fix a couple more
> places:
>      ftrace arch/arm/kernel/ftrace.c
>      kprobes arch/arm/kernel/kprobes.c
> Also, in big-endian mode the atomic64_xxx functions will need fixes;
> otherwise perf counters will be truncated at the max 32-bit value:
>      atomic64 arch/arm/include/asm/atomic.h
>
> I've attached the board-independent (core) patch from my work that made
> the Pandaboard ES kernel run in BE mode. You can check my version of
> the fixes for the above issues in the patch; look for the corresponding
> file changes. Please rework them according to the style and rules of
> your patch series. Note the patch itself contains fixes for a few other
> issues that are already fixed in your series. I'll cross-check and
> compare individual areas later. I think you may find the attached patch
> interesting as well; it was developed independently from yours but
> matches it very well.
>
> Wrt testing: ftrace was tested on Pandaboard ES; the kprobes changes
> were tested with SystemTap at some point in the patch's history on
> Pandaboard ES (not Thumb mode though), and they may have deteriorated
> while being ported across kernel versions, so it is a good idea to
> retest the last patch. atomic64 was tested on Pandaboard ES and
> Marvell Armada XP.

Please wrap your emails to <80 characters; this was really difficult
to read.

The atomic64 ops are an interesting one; I still think the in-kernel
code is correct. Why is atomic64 being used on 32-bit? If you are
trying to decompose a 64-bit value into two 32-bit values, then that's
probably the problem.

In the relocation code, the R_ARM instruction relocations should only
apply to instructions and can therefore be safely swapped using the
correct opcode accessors. R_ARM_ABS32, IIRC, is always in the same
ordering as the CPU would access data.

I am surprised if the kprobes stuff works, given that it calls
patch_text(), which already uses <asm/opcodes.h> to do the relevant
byte-swapping. I think you only need to apply swaps to anything
that is loaded from memory.

Which version of Linux did you patch?

I think this is the necessary change to kprobes:

diff --git a/arch/arm/kernel/kprobes.c b/arch/arm/kernel/kprobes.c
index 170e9f3..c548356 100644
--- a/arch/arm/kernel/kprobes.c
+++ b/arch/arm/kernel/kprobes.c
@@ -26,6 +26,7 @@
  #include <linux/stop_machine.h>
  #include <linux/stringify.h>
  #include <asm/traps.h>
+#include <asm/opcodes.h>
  #include <asm/cacheflush.h>

  #include "kprobes.h"
@@ -62,10 +63,10 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
  #ifdef CONFIG_THUMB2_KERNEL
         thumb = true;
        addr &= ~1; /* Bit 0 would normally be set to indicate Thumb code */
-       insn = ((u16 *)addr)[0];
+       insn = __mem_to_opcode_thumb16(((u16 *)addr)[0]);
         if (is_wide_instruction(insn)) {
                 insn <<= 16;
-               insn |= ((u16 *)addr)[1];
+               insn |= __mem_to_opcode_thumb16(((u16 *)addr)[1]);
                 decode_insn = thumb32_kprobe_decode_insn;
         } else
                 decode_insn = thumb16_kprobe_decode_insn;
@@ -73,7 +74,7 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
         thumb = false;
         if (addr & 0x3)
                 return -EINVAL;
-       insn = *p->addr;
+       insn = __mem_to_opcode_arm(*p->addr);
         decode_insn = arm_kprobe_decode_insn;
  #endif



-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* updates for be8 patch series
  2013-07-23 17:40 ` Ben Dooks
@ 2013-07-23 17:53   ` Victor Kamensky
  2013-07-23 18:02     ` Ben Dooks
  2013-07-23 18:03     ` Victor Kamensky
  0 siblings, 2 replies; 29+ messages in thread
From: Victor Kamensky @ 2013-07-23 17:53 UTC (permalink / raw)
  To: linux-arm-kernel

On 23 July 2013 10:40, Ben Dooks <ben.dooks@codethink.co.uk> wrote:
> On 23/07/13 18:24, Victor Kamensky wrote:
>>
>> Hi Ben,
>>
>> Wrt BE8 little-endian instructions you will need to fix a couple more
>> places:
>>      ftrace arch/arm/kernel/ftrace.c
>>      kprobes arch/arm/kernel/kprobes.c
>> Also, in big-endian mode the atomic64_xxx functions will need fixes;
>> otherwise perf counters will be truncated at the max 32-bit value:
>>      atomic64 arch/arm/include/asm/atomic.h
>
>
> Hmm, thought ftrace already did the right endian-ness conversions,
> although it is possible it could have missed one.
>
> The atomic instructions should work fine for BE8, they're using
> STRD which should be correct.

atomic64_read and atomic64_set are fine, but atomic64_add,
atomic64_add_return, atomic64_sub, atomic64_sub_return, etc. are not
OK. The issue is in the arithmetic operations, not the loads/stores.

Thanks,
Victor

>
>> I've attached the board-independent (core) patch from my work that
>> made the Pandaboard ES kernel run in BE mode. You can check my version
>> of the fixes for the above issues in the patch; look for the
>> corresponding file changes. Please rework them according to the style
>> and rules of your patch series. Note the patch itself contains fixes
>> for a few other issues that are already fixed in your series. I'll
>> cross-check and compare individual areas later. I think you may find
>> the attached patch interesting as well; it was developed independently
>> from yours but matches it very well.
>
>
>>
>
> --
> Ben Dooks                               http://www.codethink.co.uk/
> Senior Engineer                         Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 29+ messages in thread

* updates for be8 patch series
       [not found] <CAA3XUr3h5C0oJVrN5iXsM=NFkLMz_Tssp28CysZfwMNS5KvO-Q@mail.gmail.com>
@ 2013-07-23 17:40 ` Ben Dooks
  2013-07-23 17:53   ` Victor Kamensky
  0 siblings, 1 reply; 29+ messages in thread
From: Ben Dooks @ 2013-07-23 17:40 UTC (permalink / raw)
  To: linux-arm-kernel

On 23/07/13 18:24, Victor Kamensky wrote:
> Hi Ben,
>
> Wrt BE8 little-endian instructions you will need to fix a couple more
> places:
>      ftrace arch/arm/kernel/ftrace.c
>      kprobes arch/arm/kernel/kprobes.c
> Also, in big-endian mode the atomic64_xxx functions will need fixes;
> otherwise perf counters will be truncated at the max 32-bit value:
>      atomic64 arch/arm/include/asm/atomic.h

Hmm, thought ftrace already did the right endian-ness conversions,
although it is possible it could have missed one.

The atomic instructions should work fine for BE8, they're using
STRD which should be correct.

> I've attached the board-independent (core) patch from my work that made
> the Pandaboard ES kernel run in BE mode. You can check my version of
> the fixes for the above issues in the patch; look for the corresponding
> file changes. Please rework them according to the style and rules of
> your patch series. Note the patch itself contains fixes for a few other
> issues that are already fixed in your series. I'll cross-check and
> compare individual areas later. I think you may find the attached patch
> interesting as well; it was developed independently from yours but
> matches it very well.

>

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2013-07-26 16:04 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-22 16:33 updates for be8 patch series Ben Dooks
2013-07-22 16:33 ` [PATCH 1/4] ARM: alignment: correctly decode instructions in BE8 mode Ben Dooks
2013-07-24 17:16   ` Dave Martin
2013-07-25 11:48     ` Ben Dooks
2013-07-22 16:33 ` [PATCH 2/4] ARM: traps: use <asm/opcodes.h> to get correct instruction order Ben Dooks
2013-07-24 17:34   ` Dave Martin
2013-07-25 11:17     ` Ben Dooks
2013-07-26 16:04       ` Dave Martin
2013-07-22 16:33 ` [PATCH 3/4] ARM: module: correctly relocate instructions in BE8 Ben Dooks
2013-07-24 17:52   ` Dave Martin
2013-07-22 16:33 ` [PATCH 4/4] ARM: set --be8 when linking modules Ben Dooks
2013-07-22 17:05   ` Stephen Boyd
2013-07-22 17:20     ` Ben Dooks
2013-07-22 18:53       ` Nicolas Pitre
2013-07-22 18:56         ` Ben Dooks
2013-07-23  8:16           ` Ben Dooks
2013-07-22 20:26         ` Ben Dooks
2013-07-22 16:49 ` updates for be8 patch series Ben Dooks
     [not found] <CAA3XUr3h5C0oJVrN5iXsM=NFkLMz_Tssp28CysZfwMNS5KvO-Q@mail.gmail.com>
2013-07-23 17:40 ` Ben Dooks
2013-07-23 17:53   ` Victor Kamensky
2013-07-23 18:02     ` Ben Dooks
2013-07-23 18:03     ` Victor Kamensky
2013-07-23 18:30       ` Ben Dooks
2013-07-24  1:06         ` Victor Kamensky
2013-07-24  9:36           ` Will Deacon
2013-07-24 10:46           ` Ben Dooks
2013-07-24  7:31         ` Victor Kamensky
2013-07-24 21:24           ` Victor Kamensky
     [not found] <CAA3XUr1o0cOCybSR=pCUT-WYP4xus5VrES6Hyzo-Wy4tcutTvQ@mail.gmail.com>
2013-07-23 18:02 ` Ben Dooks
