* User-space code aborts on some (but not all) misaligned accesses
@ 2017-05-24 15:26 Mason
  2017-05-24 15:45 ` Robin Murphy
  0 siblings, 1 reply; 9+ messages in thread
From: Mason @ 2017-05-24 15:26 UTC (permalink / raw)
  To: linux-arm-kernel

[ Message sent to both gcc-help and LAKML ]

Hello,

Consider the following user-space code, split over two files
to defeat the optimizer.

This test program maps a page of memory not managed by Linux,
and writes 4 words to misaligned addresses within that page.

$ cat store.c 
void store_at_addr_plus_0(void *addr, int val)
{
	__builtin_memcpy(addr + 0, &val, sizeof val);
}
void store_at_addr_plus_1(void *addr, int val)
{
	__builtin_memcpy(addr + 1, &val, sizeof val);
}

$ cat testcase.c 
#include <fcntl.h>
#include <sys/mman.h>
#include <stdio.h>
void store_at_addr_plus_0(void *addr, int val);
void store_at_addr_plus_1(void *addr, int val);
int main(void)
{
	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	void *ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0xc0000000);
	store_at_addr_plus_0(ptr + 0, fd); puts("X");	// store at ptr + 0 => OK
	store_at_addr_plus_0(ptr + 1, fd); puts("X");	// store at ptr + 1 => OK
	store_at_addr_plus_1(ptr + 3, fd); puts("X");	// store at ptr + 4 => OK
	store_at_addr_plus_1(ptr + 0, fd); puts("X");	// store at ptr + 1 => ABORT
	return 0;
}

With optimizations turned off, the program works as expected.

$ arm-linux-gnueabihf-gcc-6.3.1 -Wall -O0 testcase.c store.c -o misaligned_stores
$ ./misaligned_stores 
X
X
X
X

But if optimizations are enabled, the program aborts on the last store.

$ arm-linux-gnueabihf-gcc-6.3.1 -Wall -O1 testcase.c store.c -o misaligned_stores
# ./misaligned_stores 
X
X
X
Bus error
[ 8736.457254] Alignment trap: not handling instruction f8c01001 at [<000104aa>]
[ 8736.464496] Unhandled fault: alignment exception (0x811) at 0xb6f4b001
[ 8736.471106] pgd = de2d4000
[ 8736.473839] [b6f4b001] *pgd=9f56b831, *pte=c0000743, *ppte=c0000c33

(gdb) disassemble store_at_addr_plus_0
   0x000104a6 <+0>:     str     r1, [r0, #0]
   0x000104a8 <+2>:     bx      lr

(gdb) disassemble store_at_addr_plus_1
   0x000104aa <+0>:     str.w   r1, [r0, #1]
   0x000104ae <+4>:     bx      lr


So the 4th store (a misaligned store) aborts.
But why doesn't the 2nd store abort as well?
It targets the *same* address.
They're using different versions of the str instruction.

The compiler generates
str	r1, [r0]	@ unaligned
str	r1, [r0, #1]	@ unaligned

According to objdump

00000000 <store_at_addr_plus_0>:
   0:	6001      	str	r1, [r0, #0]
   2:	4770      	bx	lr

00000004 <store_at_addr_plus_1>:
   4:	f8c0 1001 	str.w	r1, [r0, #1]
   8:	4770      	bx	lr

Side issue, the T2 encoding for the STR instruction states
1 1 1 1 1 0 0 0 0 1 0 0 Rn
which comes out as f840, not f8c0; I don't understand.

My question is:

Why does instruction "6001" work on misaligned addresses,
while "f8c0 1001" aborts?

Below the disas of main FWIW.

Regards.



(gdb) disassemble main
   0x00010430 <+0>:     push    {r4, r5, r6, lr}
   0x00010432 <+2>:     sub     sp, #8
   0x00010434 <+4>:     movw    r1, #4098       ; 0x1002
   0x00010438 <+8>:     movt    r1, #16
   0x0001043c <+12>:    movw    r0, #4620       ; 0x120c
   0x00010440 <+16>:    movt    r0, #1
   0x00010444 <+20>:    blx     0x1032c <open@plt>
   0x00010448 <+24>:    mov     r5, r0
   0x0001044a <+26>:    mov.w   r3, #3221225472 ; 0xc0000000
   0x0001044e <+30>:    str     r3, [sp, #4]
   0x00010450 <+32>:    str     r0, [sp, #0]
   0x00010452 <+34>:    movs    r3, #1
   0x00010454 <+36>:    movs    r2, #3
   0x00010456 <+38>:    mov.w   r1, #4096       ; 0x1000
   0x0001045a <+42>:    movs    r0, #0
   0x0001045c <+44>:    blx     0x10338 <mmap@plt>
   0x00010460 <+48>:    mov     r6, r0
   0x00010462 <+50>:    mov     r1, r5
   0x00010464 <+52>:    bl      0x104a6 <store_at_addr_plus_0>
   0x00010468 <+56>:    movw    r4, #4632       ; 0x1218
   0x0001046c <+60>:    movt    r4, #1
   0x00010470 <+64>:    mov     r0, r4
   0x00010472 <+66>:    blx     0x10308 <puts@plt>
   0x00010476 <+70>:    mov     r1, r5
   0x00010478 <+72>:    adds    r0, r6, #1
   0x0001047a <+74>:    bl      0x104a6 <store_at_addr_plus_0>
   0x0001047e <+78>:    mov     r0, r4
   0x00010480 <+80>:    blx     0x10308 <puts@plt>
   0x00010484 <+84>:    mov     r1, r5
   0x00010486 <+86>:    adds    r0, r6, #3
   0x00010488 <+88>:    bl      0x104aa <store_at_addr_plus_1>
   0x0001048c <+92>:    mov     r0, r4
   0x0001048e <+94>:    blx     0x10308 <puts@plt>
   0x00010492 <+98>:    mov     r1, r5
   0x00010494 <+100>:   mov     r0, r6
   0x00010496 <+102>:   bl      0x104aa <store_at_addr_plus_1>
   0x0001049a <+106>:   mov     r0, r4
   0x0001049c <+108>:   blx     0x10308 <puts@plt>
   0x000104a0 <+112>:   movs    r0, #0
   0x000104a2 <+114>:   add     sp, #8
   0x000104a4 <+116>:   pop     {r4, r5, r6, pc}


* User-space code aborts on some (but not all) misaligned accesses
  2017-05-24 15:26 User-space code aborts on some (but not all) misaligned accesses Mason
@ 2017-05-24 15:45 ` Robin Murphy
  2017-05-24 16:56   ` Mason
  0 siblings, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2017-05-24 15:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 24/05/17 16:26, Mason wrote:
> [ 8736.457254] Alignment trap: not handling instruction f8c01001 at [<000104aa>]

^^^

Note where that message comes from: The alignment fault fixup code
doesn't recognise this instruction encoding, so it doesn't get fixed up.
It's that simple.

Try "echo 5 > /proc/cpu/alignment" then run it again, and it should
become clearer what the kernel's doing (or not) behind your back - see
Documentation/arm/mem_alignment
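
For reference, a small sketch of how the value 5 breaks down, assuming the
bit assignments described in Documentation/arm/mem_alignment (bit 0 = warn,
bit 1 = fixup, bit 2 = signal). The enum and helper names are made up for
illustration and are not the kernel's own identifiers:

```c
#include <assert.h>
#include <stdio.h>

/* Assumed bit meanings, per Documentation/arm/mem_alignment. */
enum {
	MODE_WARN   = 1 << 0,	/* log each user-space alignment trap */
	MODE_FIXUP  = 1 << 1,	/* emulate the access in software */
	MODE_SIGNAL = 1 << 2,	/* deliver SIGBUS to the process */
};

/* Write a readable description of `mode` into buf and return buf. */
const char *describe_alignment_mode(unsigned mode, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s%s%s",
		 (mode & MODE_WARN)   ? "warn "   : "",
		 (mode & MODE_FIXUP)  ? "fixup "  : "",
		 (mode & MODE_SIGNAL) ? "signal " : "");
	return buf;
}
```

With 5, the warn and signal bits are set and the fixup bit is clear, so the
kernel should log each trap and send SIGBUS rather than quietly emulating
the access.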

The other thing to say, of course, is "don't make unaligned accesses to
Strongly-Ordered memory in the first place".

Robin.



* User-space code aborts on some (but not all) misaligned accesses
  2017-05-24 15:45 ` Robin Murphy
@ 2017-05-24 16:56   ` Mason
  2017-05-24 17:25     ` Robin Murphy
  2017-05-24 17:27     ` Ard Biesheuvel
  0 siblings, 2 replies; 9+ messages in thread
From: Mason @ 2017-05-24 16:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 24/05/2017 17:45, Robin Murphy wrote:

> On 24/05/17 16:26, Mason wrote:
>
>> [ 8736.457254] Alignment trap: not handling instruction f8c01001 at [<000104aa>]
> ^^^
> 
> Note where that message comes from: The alignment fault fixup code
> doesn't recognise this instruction encoding, so it doesn't get fixed up.
> It's that simple.

ARMv7 can handle misaligned accesses in hardware, right?
But Linux sets up the MMU mapping to fault for misaligned
accesses in "non-standard" areas, is that correct?

I will study arch/arm/mm/alignment.c

> Try "echo 5 > /proc/cpu/alignment" then run it again, and it should
> become clearer what the kernel's doing (or not) behind your back - see
> Documentation/arm/mem_alignment

# echo 5 > /proc/cpu/alignment
# ./misaligned_stores 
X
Bus error
[  241.813350] Alignment trap: misaligned_stor (1015) PC=0x000104b8 Instr=0x6001 Address=0xb6f16001 FSR 0x811

> The other thing to say, of course, is "don't make unaligned accesses to
> Strongly-Ordered memory in the first place".

How would you fix my test case?

Ard mentioned something similar on IRC:
> doesn't the issue go away when you stop using device attributes for the userland mapping?
> iiuc you are mapping memory from userland that is not mapped by the kernel, right?
> which is why it gets pgprot_noncached() attributes
> so if you do add this memory to memblock but with the MEMBLOCK_NOMAP attribute
> and use O_SYNC to open /dev/mem from userland
> you will get writecombine attributes instead
> it is perfectly legal for gcc to generate unaligned accesses to something that is presented
> to it as being memory so you should focus on getting the attributes correct on this region


I will study the different properties (cached vs noncached, write-combined).



>> Side issue, the T2 encoding for the STR instruction states
>> 1 1 1 1 1 0 0 0 0 1 0 0 Rn
>> which comes out as f840, not f8c0; I don't understand.

Ard said:
> btw the str.w encodings are listed as T3/T4 in my copy of the v8 ARM ARM

I'm on a Cortex A9, so ARMv7-A
But my copy of the ARM ARM is revB.
I found rev C.b but that doesn't explain f8c0 vs f840

Regards.


* User-space code aborts on some (but not all) misaligned accesses
  2017-05-24 16:56   ` Mason
@ 2017-05-24 17:25     ` Robin Murphy
  2017-05-24 21:19       ` Mason
  2017-05-24 17:27     ` Ard Biesheuvel
  1 sibling, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2017-05-24 17:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 24/05/17 17:56, Mason wrote:
> On 24/05/2017 17:45, Robin Murphy wrote:
> 
>> On 24/05/17 16:26, Mason wrote:
>>
>>> [ 8736.457254] Alignment trap: not handling instruction f8c01001 at [<000104aa>]
>> ^^^
>>
>> Note where that message comes from: The alignment fault fixup code
>> doesn't recognise this instruction encoding, so it doesn't get fixed up.
>> It's that simple.
> 
> ARMv7 can handle misaligned accesses in hardware, right?
> But Linux sets up the MMU mapping to fault for misaligned
> accesses in "non-standard" areas, is that correct?

Unaligned accesses are only supported to Normal memory - anything mapped
as Device or Strongly Ordered will always make an unaligned access fault
at the MMU before it even gets a chance to go out onto the interconnect
and wreak havoc.

> I will study arch/arm/mm/alignment.c
> 
>> Try "echo 5 > /proc/cpu/alignment" then run it again, and it should
>> become clearer what the kernel's doing (or not) behind your back - see
>> Documentation/arm/mem_alignment
> 
> # echo 5 > /proc/cpu/alignment
> # ./misaligned_stores 
> X
> Bus error
> [  241.813350] Alignment trap: misaligned_stor (1015) PC=0x000104b8 Instr=0x6001 Address=0xb6f16001 FSR 0x811
> 
>> The other thing to say, of course, is "don't make unaligned accesses to
>> Strongly-Ordered memory in the first place".
> 
> How would you fix my test case?

"rm store.c testcase.c"?

The point being that what you are doing looks fairly nonsensical to
begin with, since it's not like many peripherals support unaligned reads
or writes anyway. /dev/mem gives you pgprot_noncached, which translates
to Strongly Ordered, because as far as the kernel's concerned you're
mapping random bits of physical address space which could be home to
anything at all, and using a weaker memory type could be a Very Bad
Thing. You don't want to waste (significant) time debugging the
side-effects of the CPU speculatively filling cachelines from some
read-sensitive register, that's for sure.
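
If a value really does have to land at an odd offset inside such a mapping,
one option is to issue only byte-sized stores, which are always naturally
aligned. A minimal sketch, assuming little-endian byte order; the helper
name store32_bytewise is hypothetical and not from this thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store a 32-bit value at an arbitrary byte offset using four separate
 * byte stores. The volatile qualifier is there to discourage the compiler
 * from merging them back into one (possibly unaligned) word store.
 * Little-endian layout is an assumption of this sketch.
 */
void store32_bytewise(volatile uint8_t *dst, uint32_t val)
{
	assert(dst != NULL);
	for (size_t i = 0; i < sizeof val; i++)
		dst[i] = (uint8_t)(val >> (8 * i));
}
```

Note that this turns one write into four, so it is only reasonable when the
target really is plain memory and nothing with side effects sits behind it.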

> Ard mentioned something similar on IRC:
>> doesn't the issue go away when you stop using device attributes for the userland mapping?
>> iiuc you are mapping memory from userland that is not mapped by the kernel, right?
>> which is why it gets pgprot_noncached() attributes
>> so if you do add this memory to memblock but with the MEMBLOCK_NOMAP attribute
>> and use O_SYNC to open /dev/mem from userland
>> you will get writecombine attributes instead
>> it is perfectly legal for gcc to generate unaligned accesses to something that is presented
>> to it as being memory so you should focus on getting the attributes correct on this region
> 
> 
> I will study the different properties (cached vs noncached, write-combined).
> 
> 
> 
>>> Side issue, the T2 encoding for the STR instruction states
>>> 1 1 1 1 1 0 0 0 0 1 0 0 Rn
>>> which comes out as f840, not f8c0; I don't understand.
> 
> Ard said:
>> btw the str.w encodings are listed as T3/T4 in my copy of the v8 ARM ARM
> 
> I'm on a Cortex A9, so ARMv7-A
> But my copy of the ARM ARM is revB.
> I found rev C.b but that doesn't explain f8c0 vs f840

It's an immediate-offset STR, not a register-offset one.

Robin.


* User-space code aborts on some (but not all) misaligned accesses
  2017-05-24 16:56   ` Mason
  2017-05-24 17:25     ` Robin Murphy
@ 2017-05-24 17:27     ` Ard Biesheuvel
  2017-05-24 17:36       ` Robin Murphy
  2017-05-24 22:15       ` Mason
  1 sibling, 2 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2017-05-24 17:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 24 May 2017 at 09:56, Mason <slash.tmp@free.fr> wrote:
> On 24/05/2017 17:45, Robin Murphy wrote:
>
>> On 24/05/17 16:26, Mason wrote:
>>
>>> [ 8736.457254] Alignment trap: not handling instruction f8c01001 at [<000104aa>]
>> ^^^
>>
>> Note where that message comes from: The alignment fault fixup code
>> doesn't recognise this instruction encoding, so it doesn't get fixed up.
>> It's that simple.

Well spotted. I missed that bit, but it makes perfect sense. Mason,
care to propose a patch to the alignment fixup code that adds the
missing encoding?

>
> ARMv7 can handle misaligned accesses in hardware, right?
> But Linux sets up the MMU mapping to fault for misaligned
> accesses in "non-standard" areas, is that correct?
>

Please understand that device attributes simply imply that unaligned
accesses are not supportable. There is no policy here that you can
debate. If the underlying bus does not implement unaligned accesses,
the CPU needs to split them into several smaller ones, which is
impossible to do when side effects are taken into account (unless you
know the exact nature of the side effects of the particular location)

> I will study arch/arm/mm/alignment.c
>
>> Try "echo 5 > /proc/cpu/alignment" then run it again, and it should
>> become clearer what the kernel's doing (or not) behind your back - see
>> Documentation/arm/mem_alignment
>
> # echo 5 > /proc/cpu/alignment
> # ./misaligned_stores
> X
> Bus error
> [  241.813350] Alignment trap: misaligned_stor (1015) PC=0x000104b8 Instr=0x6001 Address=0xb6f16001 FSR 0x811
>
>> The other thing to say, of course, is "don't make unaligned accesses to
>> Strongly-Ordered memory in the first place".
>
> How would you fix my test case?
>
> Ard mentioned something similar on IRC:
>> doesn't the issue go away when you stop using device attributes for the userland mapping?
>> iiuc you are mapping memory from userland that is not mapped by the kernel, right?
>> which is why it gets pgprot_noncached() attributes
>> so if you do add this memory to memblock but with the MEMBLOCK_NOMAP attribute
>> and use O_SYNC to open /dev/mem from userland
>> you will get writecombine attributes instead
>> it is perfectly legal for gcc to generate unaligned accesses to something that is presented
>> to it as being memory so you should focus on getting the attributes correct on this region
>
>
> I will study the different properties (cached vs noncached, write-combined).
>

It is really quite simple:
1. add the memory to the /memory DT node
2. add it as a no-map region to the /reserved-memory DT node

This should result in pgprot_writecombine() attributes on your O_SYNC
/dev/mem mapping, which should make the problem go away.


* User-space code aborts on some (but not all) misaligned accesses
  2017-05-24 17:27     ` Ard Biesheuvel
@ 2017-05-24 17:36       ` Robin Murphy
  2017-05-24 17:40         ` Ard Biesheuvel
  2017-05-24 22:15       ` Mason
  1 sibling, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2017-05-24 17:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 24/05/17 18:27, Ard Biesheuvel wrote:
> On 24 May 2017 at 09:56, Mason <slash.tmp@free.fr> wrote:
>> On 24/05/2017 17:45, Robin Murphy wrote:
>>
>>> On 24/05/17 16:26, Mason wrote:
>>>
>>>> [ 8736.457254] Alignment trap: not handling instruction f8c01001 at [<000104aa>]
>>> ^^^
>>>
>>> Note where that message comes from: The alignment fault fixup code
>>> doesn't recognise this instruction encoding, so it doesn't get fixed up.
>>> It's that simple.
> 
> Well spotted. I missed that bit, but it makes perfect sense. Mason,
> care to propose a patch to the alignment fixup code that adds the
> missing encoding?

No need for that - anything that could be executing 32-bit Thumb
encodings also supports (and will be using) the v6 unaligned access
model by definition. I would assume that the "regular" loads/stores are
deliberately unhandled for that reason (i.e. it would never be correct
to fix up).

Robin.


* User-space code aborts on some (but not all) misaligned accesses
  2017-05-24 17:36       ` Robin Murphy
@ 2017-05-24 17:40         ` Ard Biesheuvel
  0 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2017-05-24 17:40 UTC (permalink / raw)
  To: linux-arm-kernel

On 24 May 2017 at 10:36, Robin Murphy <robin.murphy@arm.com> wrote:
> On 24/05/17 18:27, Ard Biesheuvel wrote:
>> On 24 May 2017 at 09:56, Mason <slash.tmp@free.fr> wrote:
>>> On 24/05/2017 17:45, Robin Murphy wrote:
>>>
>>>> On 24/05/17 16:26, Mason wrote:
>>>>
>>>>> [ 8736.457254] Alignment trap: not handling instruction f8c01001 at [<000104aa>]
>>>> ^^^
>>>>
>>>> Note where that message comes from: The alignment fault fixup code
>>>> doesn't recognise this instruction encoding, so it doesn't get fixed up.
>>>> It's that simple.
>>
>> Well spotted. I missed that bit, but it makes perfect sense. Mason,
>> care to propose a patch to the alignment fixup code that adds the
>> missing encoding?
>
> No need for that - anything that could be executing 32-bit Thumb
> encodings also supports (and will be using) the v6 unaligned access
> model by definition. I would assume that the "regular" loads/stores are
> deliberately unhandled for that reason (i.e. it would never be correct
> to fix up).
>

Fair enough. It causes some inconsistencies, as the example shows, but
the alignment fault handling is slightly inconsistent anyway, so I
suppose that doesn't really matter, given that the code is not
intended to deal with accesses to mappings with device attributes.


* User-space code aborts on some (but not all) misaligned accesses
  2017-05-24 17:25     ` Robin Murphy
@ 2017-05-24 21:19       ` Mason
  0 siblings, 0 replies; 9+ messages in thread
From: Mason @ 2017-05-24 21:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 24/05/2017 19:25, Robin Murphy wrote:

> On 24/05/17 17:56, Mason wrote:
>
>> On 24/05/2017 17:45, Robin Murphy wrote:
>>
>>> The other thing to say, of course, is "don't make unaligned accesses to
>>> Strongly-Ordered memory in the first place".
>>
>> How would you fix my test case?
> 
> "rm store.c testcase.c"?
> 
> The point being that what you are doing looks fairly nonsensical to
> begin with, since it's not like many peripherals support unaligned reads
> or writes anyway. /dev/mem gives you pgprot_noncached, which translates
> to Strongly Ordered, because as far as the kernel's concerned you're
> mapping random bits of physical address space which could be home to
> anything at all, and using a weaker memory type could be a Very Bad
> Thing. You don't want to waste (significant) time debugging the
> side-effects of the CPU speculatively filling cachelines from some
> read-sensitive register, that's for sure.

For the record, the code base in question is very old; some
of it predates the 2.6 kernel. I do get the urge to send it
to /dev/null on a regular basis. (I've rewritten a few drivers
from scratch, which are now upstream.)

Basically, the system provides 2 GB of RAM.

	memory@80000000 {
		device_type = "memory";
		reg = <0x80000000 0x80000000>; /* 2 GB */
	};

Currently, Linux is given only a fraction of the available RAM
via mem=256M on the boot command-line. The rest is managed by
a cross-processor memory manager. This "foreign" RAM is then
mapped in Linux, and written from user-space to pass information
to other processors in the system (MIPS, DSP).

IIUC, lying to Linux about the nature of this address range has
always been a bad idea. Linux must be aware that it is plain RAM,
without side-effects and similar nastiness.
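
As long as the mapping stays Strongly Ordered, one way to keep the
test case from faulting would be to avoid the word-sized store
altogether. The following is only a sketch: store_bytes() and
store_at_addr_plus_1_bytewise() are hypothetical names, not part of
the code base discussed here. It relies on byte accesses always being
naturally aligned, and on the volatile qualifier to stop the compiler
from merging the four byte stores back into one str.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy len bytes to dst one byte at a time.
 * The volatile-qualified destination keeps the compiler from merging
 * the byte stores into a single (possibly misaligned) word store.
 */
static void store_bytes(volatile void *dst, const void *src, size_t len)
{
	volatile uint8_t *d = dst;
	const uint8_t *s = src;

	assert(dst != NULL && src != NULL);
	for (size_t i = 0; i < len; i++)
		d[i] = s[i];
}

/* Same contract as store_at_addr_plus_1(), minus the word-sized store. */
void store_at_addr_plus_1_bytewise(void *addr, int val)
{
	store_bytes((uint8_t *)addr + 1, &val, sizeof val);
}
```

This trades one store for four, so it only makes sense for the odd
misaligned field, not for bulk copies.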


>> I'm on a Cortex A9, so ARMv7-A
>> But my copy of the ARM ARM is revB.
>> I found rev C.b but that doesn't explain f8c0 vs f840
> 
> It's an immediate-offset STR, not a register-offset one.

I see it now.

A8.8.203 STR (immediate, Thumb)

Encoding T3 ARMv6T2, ARMv7
STR<c>.W <Rt>, [<Rn>, #<imm12>]

1 1 1 1 1 0 0 0 1 1 0 0 Rn Rt imm12

str.w	r1, [r0, #1]
n=0 t=1 imm=1

f8c0 1001

Cool, thanks.


On a tangential topic, between

A8.8.203 STR (immediate, Thumb) Encoding T4 ARMv6T2, ARMv7
1 1 1 1 1 0 0 0 0 1 0 0 Rn Rt 1 P U W imm8  (f84n txxx)

and

A8.8.205 STR (register) Encoding T2 ARMv6T2, ARMv7
1 1 1 1 1 0 0 0 0 1 0 0 Rn Rt 0 0 0 0 0 0 imm2 Rm (f84n t0xm)


The instruction f840 1001 cannot be A8.8.203 Encoding T4
because P and W cannot both be zero, therefore it must
be A8.8.205 Encoding T2?

So the decoder cannot determine the exact instruction just
by looking at a prefix? It has to scan the entire string?
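
Going only by the two bit patterns quoted above, the first halfword
(f84n) is indeed shared, and the second halfword settles it: T4 has
bit 11 fixed at 1, while T2 has bits 11..6 all zero. A rough sketch of
that split, with made-up enum and function names, ignoring the
UNDEFINED/UNPREDICTABLE corner cases:

```c
#include <stdint.h>

/* Illustrative names, not taken from any real decoder. */
enum str_f84_form { STR_F84_OTHER, STR_F84_IMM8_T4, STR_F84_REG_T2 };

/*
 * insn holds the first halfword in bits 31..16 and the second halfword
 * in bits 15..0 (e.g. 0xf8401001).
 */
enum str_f84_form classify_str_f84(uint32_t insn)
{
	if ((insn & 0xfff00000u) != 0xf8400000u)
		return STR_F84_OTHER;       /* first halfword is not f84n */
	if (insn & (1u << 11))
		return STR_F84_IMM8_T4;     /* 1 P U W imm8 */
	if ((insn & 0x00000fc0u) == 0)
		return STR_F84_REG_T2;      /* 0 0 0 0 0 0 imm2 Rm */
	return STR_F84_OTHER;
}
```

With this split, f840 1001 lands in the register form on the fixed
bits alone (Rm=1, imm2=0, i.e. str.w r1, [r0, r1]), without needing
the P/W argument.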

Regards.


* User-space code aborts on some (but not all) misaligned accesses
  2017-05-24 17:27     ` Ard Biesheuvel
  2017-05-24 17:36       ` Robin Murphy
@ 2017-05-24 22:15       ` Mason
  1 sibling, 0 replies; 9+ messages in thread
From: Mason @ 2017-05-24 22:15 UTC (permalink / raw)
  To: linux-arm-kernel

[ Dropping gcc-help at this point ]

On 24/05/2017 19:27, Ard Biesheuvel wrote:

> It is really quite simple
> 1. add the memory to the /memory DT node
> 2. add it as a no-map region to the /reserved-memory DT node
> 
> This should result in pgprot_writecombine() attributes on your O_SYNC
> /dev/mem mapping, which should make the problem go away.

I think I see what you are referring to:

http://elixir.free-electrons.com/linux/latest/source/drivers/char/mem.c#L357

http://elixir.free-electrons.com/linux/latest/source/arch/arm/mm/mmu.c#L701

#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
			      unsigned long size, pgprot_t vma_prot)
{
	if (!pfn_valid(pfn))
		return pgprot_noncached(vma_prot);
	else if (file->f_flags & O_SYNC)
		return pgprot_writecombine(vma_prot);
	return vma_prot;
}
EXPORT_SYMBOL(phys_mem_access_prot);
#endif

Telling Linux about the RAM makes pfn_valid() return true, so an
O_SYNC mapping of /dev/mem gets pgprot_writecombine() instead of
pgprot_noncached().
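
To convince myself of the three outcomes, here is a user-space model
of the decision in phys_mem_access_prot(). It is not kernel code: the
enum and the function name are invented, and pfn_valid() is replaced
by a plain boolean argument.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Invented labels for the three possible results. */
enum mem_attr { ATTR_UNCHANGED, ATTR_NONCACHED, ATTR_WRITECOMBINE };

/*
 * Mirrors the branch order of the function quoted above:
 * memory unknown to Linux is always mapped non-cached, known RAM
 * opened with O_SYNC is write-combining, and anything else keeps
 * the protection bits the VMA already had.
 */
enum mem_attr model_phys_mem_access_prot(bool pfn_is_valid, int f_flags)
{
	if (!pfn_is_valid)
		return ATTR_NONCACHED;
	if (f_flags & O_SYNC)
		return ATTR_WRITECOMBINE;
	return ATTR_UNCHANGED;
}
```

My current setup is the first case (pfn not valid, so O_SYNC makes no
difference); after the DT change it should become the second.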

Thanks for the pointers.

For my own reference:
https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt
https://www.kernel.org/doc/Documentation/arm/mem_alignment

Regards.


Thread overview: 9+ messages
2017-05-24 15:26 User-space code aborts on some (but not all) misaligned accesses Mason
2017-05-24 15:45 ` Robin Murphy
2017-05-24 16:56   ` Mason
2017-05-24 17:25     ` Robin Murphy
2017-05-24 21:19       ` Mason
2017-05-24 17:27     ` Ard Biesheuvel
2017-05-24 17:36       ` Robin Murphy
2017-05-24 17:40         ` Ard Biesheuvel
2017-05-24 22:15       ` Mason
