* tst-arm-mte bug: PSTATE.TCO is cleared on exceptions
@ 2020-04-20 10:29 Szabolcs Nagy
  2020-04-22  4:39 ` Richard Henderson
From: Szabolcs Nagy @ 2020-04-20 10:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: nd, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 605 bytes --]

i'm using the branch at

https://github.com/rth7680/qemu/tree/tgt-arm-mte

to test armv8.5-a mte and hope this is ok to report bugs here.

i'm doing tests in qemu-system-aarch64 with linux userspace
code and it seems TCO bit gets cleared after syscalls or other
kernel entry, but PSTATE is expected to be restored, so i
suspect it is a qemu bug.

i think the architecture saves/restores PSTATE using SPSR_ELx
on exceptions.

i used the linux branch
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=devel/mte-v2

attached a reproducer that segfaults in qemu but should work.

thanks.

[-- Attachment #2: bug.c --]
[-- Type: text/x-csrc, Size: 1216 bytes --]

// CFLAGS = -march=armv8.5-a+memtag
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/prctl.h>

#define TAG_SHIFT 56

#ifndef PROT_MTE
#define PROT_MTE 0x20
#endif
#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#define PR_GET_TAGGED_ADDR_CTRL 56
#define PR_TAGGED_ADDR_ENABLE 1UL
#endif
#ifndef PR_MTE_TCF_SYNC
#define PR_MTE_TCF_SYNC 2UL
#define PR_MTE_TAG_SHIFT 3
#endif

int main()
{
	if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE|PR_MTE_TCF_SYNC|(0xffff << PR_MTE_TAG_SHIFT), 0, 0, 0))
		abort();

	unsigned long *a = mmap(0, 1<<12, PROT_READ|PROT_WRITE|PROT_MTE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	if (a == MAP_FAILED)
		abort();

	// tag ptr a
	a = (void*)((unsigned long)a|(1UL<<TAG_SHIFT));

	// tag memory a[0], a[1]
	asm volatile ("stg %1, %0" : "=Q"(*a) : "r"(a));

	// turn tag checks off
	asm volatile ("msr tco, 1");

	a[0]=1; // ok
	a[1]=2; // ok
	a[2]=3; // tag mismatch but tco==1 so ok

	write(1, "foo\n", 4);

	// PSTATE.TCO (bit 25) should be still set after the syscall
	unsigned long x;
	asm volatile ("mrs %0, tco" : "=r"(x));
	printf("tco = 0x%lx\n", x);

	a[3]=4; // tag mismatch, segfaults if tco==0
	return 0;
}


* Re: tst-arm-mte bug: PSTATE.TCO is cleared on exceptions
  2020-04-20 10:29 tst-arm-mte bug: PSTATE.TCO is cleared on exceptions Szabolcs Nagy
@ 2020-04-22  4:39 ` Richard Henderson
  2020-04-24 19:47   ` Richard Henderson
From: Richard Henderson @ 2020-04-22  4:39 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: nd, qemu-devel

On 4/20/20 3:29 AM, Szabolcs Nagy wrote:
> i'm using the branch at
> 
> https://github.com/rth7680/qemu/tree/tgt-arm-mte
> 
> to test armv8.5-a mte and hope this is ok to report bugs here.
> 
> i'm doing tests in qemu-system-aarch64 with linux userspace
> code and it seems TCO bit gets cleared after syscalls or other
> kernel entry, but PSTATE is expected to be restored, so i
> suspect it is a qemu bug.
> 
> i think the architecture saves/restores PSTATE using SPSR_ELx
> on exceptions.

Yep.  I failed to update aarch64_pstate_valid_mask for TCO.
Will fix.  Thanks,


r~

> 
> i used the linux branch
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=devel/mte-v2
> 
> attached a reproducer that segfaults in qemu but should work.
> 
> thanks.
> 




* Re: tst-arm-mte bug: PSTATE.TCO is cleared on exceptions
  2020-04-22  4:39 ` Richard Henderson
@ 2020-04-24 19:47   ` Richard Henderson
  2020-05-06 12:57     ` Szabolcs Nagy
From: Richard Henderson @ 2020-04-24 19:47 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: nd, qemu-devel

On 4/21/20 9:39 PM, Richard Henderson wrote:
> On 4/20/20 3:29 AM, Szabolcs Nagy wrote:
>> i'm using the branch at
>>
>> https://github.com/rth7680/qemu/tree/tgt-arm-mte
>>
>> to test armv8.5-a mte and hope this is ok to report bugs here.
>>
>> i'm doing tests in qemu-system-aarch64 with linux userspace
>> code and it seems TCO bit gets cleared after syscalls or other
>> kernel entry, but PSTATE is expected to be restored, so i
>> suspect it is a qemu bug.
>>
>> i think the architecture saves/restores PSTATE using SPSR_ELx
>> on exceptions.
> 
> Yep.  I failed to update aarch64_pstate_valid_mask for TCO.
> Will fix.  Thanks,

Fixed on the branch.

I still need to work out how best to plumb the arm,armv8.5-memtag property, so
the devel/mte-v3 kernel branch isn't usable as-is for the moment.  For myself,
I've just commented that test out for now.


r~



* Re: tst-arm-mte bug: PSTATE.TCO is cleared on exceptions
  2020-04-24 19:47   ` Richard Henderson
@ 2020-05-06 12:57     ` Szabolcs Nagy
  2020-05-07  9:59       ` Szabolcs Nagy
From: Szabolcs Nagy @ 2020-05-06 12:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: nd, qemu-devel

The 04/24/2020 12:47, Richard Henderson wrote:
> On 4/21/20 9:39 PM, Richard Henderson wrote:
> > Yep.  I failed to update aarch64_pstate_valid_mask for TCO.
> > Will fix.  Thanks,
> 
> Fixed on the branch.
> 
> I still need to work out how best to plumb the arm,armv8.5-memtag property so
> the devel/mte-v3 kernel branch isn't usable as-is for the moment.  For myself,
> I've just commented that test out for now.

The fix worked well, thanks (in linux devel/mte-v3 i
reverted the patch that introduced arm,armv8.5-memtag).

However later on during testing malloc with PROT_MTE
i got a qemu assert failure:

Bail out! ERROR:/S/target/arm/mte_helper.c:97:allocation_tag_mem: assertion failed: (tag_size <= in_page)

i can reproduce it, but i don't know how to debug it
further: i don't know what the application is doing
when this happens, nor what the kernel is doing.

i rebuilt qemu with --enable-debug but now it's very
slow (still booting into linux 3h later).

let me know if there are ways to narrow this down.



* Re: tst-arm-mte bug: PSTATE.TCO is cleared on exceptions
  2020-05-06 12:57     ` Szabolcs Nagy
@ 2020-05-07  9:59       ` Szabolcs Nagy
  2020-05-07 17:21         ` Richard Henderson
From: Szabolcs Nagy @ 2020-05-07  9:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: nd, qemu-devel

The 05/06/2020 13:57, Szabolcs Nagy wrote:
> However later on during testing malloc with PROT_MTE
> i got a qemu assert failure:
> 
> Bail out! ERROR:/S/target/arm/mte_helper.c:97:allocation_tag_mem: assertion failed: (tag_size <= in_page)
> 
> i can reproduce it, but i don't know how to debug it
> further, i don't know what the application is doing
> when this happens, nor what the kernel is doing.

actually i know what the application is doing,
it's in an mmap when qemu aborts:

...
23:15:17.379227 munmap(0x100ffff9675a000, 8192) = 0
23:15:17.428456 mmap(NULL, 8192, PROT_READ|PROT_WRITE|0x20, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff9675a000
23:15:17.502543 mmap(NULL, 36864, PROT_READ|PROT_WRITE|0x20, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff96707000
23:15:17.572469 munmap(0x100ffff96707000, 36864) = 0
23:15:17.645050 munmap(0x100ffff9675a000, 8192) = 0
23:15:17.721526 mmap(NULL, 8192, PROT_READ|PROT_WRITE|0x20, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff9675a000
23:15:17.779768 mmap(NULL, 36864, PROT_READ|PROT_WRITE|0x20, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff96707000
23:15:17.840278 newfstatat(3, "usr/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
23:15:18.164292 unlinkat(3, "usr/lib/.apk.1e1bebb420b720c23f29fc2cacd5581b598339805fd12c00", 0) = 0
23:15:18.357742 symlinkat("libXau.so.6.0.0", 3, "usr/lib/.apk.1e1bebb420b720c23f29fc2cacd5581b598339805fd12c00") = 0
23:15:18.469921 fchownat(3, "usr/lib/.apk.1e1bebb420b720c23f29fc2cacd5581b598339805fd12c00", 0, 0, AT_SYMLINK_NOFOLLOW) = 0
23:15:18.638698 unlinkat(3, "usr/lib/.apk.93d31976aebb056b6e2d9577dc8a2f112e28756d03f736a4", 0) = 0
23:15:18.760374 openat(3, "usr/lib/.apk.93d31976aebb056b6e2d9577dc8a2f112e28756d03f736a4", O_RDWR|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE|O_CLOEXEC, 0755) = 8
23:15:18.916049 write(8, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\300\r\0\0\0\0\0\0@\0\0\0\0\0\0\0\3700\0\0\0\0\0\0\0\0\0\0@\08\0\6\0@\0\26\0\25\0\1\0\0\0\5\0"..., 13944) = 13944
23:15:18.961239 close(8)                = 0
23:15:20.137627 fchownat(3, "usr/lib/.apk.93d31976aebb056b6e2d9577dc8a2f112e28756d03f736a4", 0, 0, 0) = 0
23:15:20.289924 utimensat(3, "usr/lib/.apk.93d31976aebb056b6e2d9577dc8a2f112e28756d03f736a4", [{tv_sec=1579395233, tv_nsec=0} /* 2020-01-19T00:53:53+0000 */, {tv_sec=1579395233, tv_nsec=0} /* 2020-01-19T00:53:53+0000 */], 0) = 0
23:15:20.467212 munmap(0x100ffff96707000, 36864) = 0
23:15:20.503631 munmap(0x100ffff9675a000, 8192) = 0
23:15:20.550130 mmap(NULL, 8192, PROT_READ|PROT_WRITE|0x20, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0
Connection to localhost closed by remote host.

(this allocator does a lot of small mmap and munmap)

but i can't tell what happens on the kernel side.

is there some recommended way to turn some form
of tracing on in qemu before i execute the
problematic application?

or is it better if i try to extract a reproducer?
(that does not use the network)

> 
> i rebuilt qemu with --enable-debug but now it's very
> slow (still booting into linux 3h later).

this is too slow, things time out.



* Re: tst-arm-mte bug: PSTATE.TCO is cleared on exceptions
  2020-05-07  9:59       ` Szabolcs Nagy
@ 2020-05-07 17:21         ` Richard Henderson
  2020-05-18 12:59           ` Szabolcs Nagy
From: Richard Henderson @ 2020-05-07 17:21 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: nd, qemu-devel

On 5/7/20 2:59 AM, Szabolcs Nagy wrote:
> is there some recommended way to turn some form
> of tracing on in qemu before i execute the
> problematic application?

I didn't add any tracing within mte.  I can do so if we can guess what we're
looking for.

> or is it better if i try to extract a reproducer?
> (that does not use the network)

A reproducer would be most helpful.

Something that can help is saving a VM snapshot with the kernel booted and the
user logged in, just ready to run the test program.  Then you can get back to
exactly the state you want before things go wrong, even with a different qemu
build.


r~



* Re: tst-arm-mte bug: PSTATE.TCO is cleared on exceptions
  2020-05-07 17:21         ` Richard Henderson
@ 2020-05-18 12:59           ` Szabolcs Nagy
  2020-05-19 18:46             ` Richard Henderson
From: Szabolcs Nagy @ 2020-05-18 12:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: nd, qemu-devel

The 05/07/2020 10:21, Richard Henderson wrote:
> A reproducer would be most helpful.
> 
> Something that can help is saving a VM snapshot with the kernel booted and the
> user logged in, just ready to run the test program.  Then you can get back to
> exactly the state you want before things go wrong, even with a different qemu
> build.

i got some time to create a reproducer (with public code),
temporarily hosting the binaries at

http://port70.net/~nsz/tmp/qemu-bug.tar.gz
~251M

here

echo ./bug.sh | ./qemu-bug.sh

crashes in about 1 minute (where qemu-bug.sh
loads a snapshot with root shell and ./bug.sh
triggers the bug)

the disk rootfs is based on
https://distfiles.adelielinux.org/adelie/1.0/iso/rc1/adelie-rootfs-aarch64-1.0-rc1-20200206.txz
the kernel Image is linux mte-v3 with the commit
"arm64: mte: Check the DT memory nodes for MTE support" reverted.
qemu is statically linked from the branch tgt-arm-mte.

the userspace workload that triggers the bug is using the
adelie linux package manager with a malloc with tagging.
(the malloc implementation is a modified version of
https://github.com/richfelker/mallocng-draft
and its code is on the disk image; it has known issues, but
it should not crash qemu)

i will remove the file after a few days. hope this helps.



* Re: tst-arm-mte bug: PSTATE.TCO is cleared on exceptions
  2020-05-18 12:59           ` Szabolcs Nagy
@ 2020-05-19 18:46             ` Richard Henderson
From: Richard Henderson @ 2020-05-19 18:46 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: nd, qemu-devel

On 5/18/20 5:59 AM, Szabolcs Nagy wrote:
> i got some time to create a reproducer (with public code),

Thanks.  I've grabbed it.  I'll try it out soon.


r~



Thread overview: 8 messages
2020-04-20 10:29 tst-arm-mte bug: PSTATE.TCO is cleared on exceptions Szabolcs Nagy
2020-04-22  4:39 ` Richard Henderson
2020-04-24 19:47   ` Richard Henderson
2020-05-06 12:57     ` Szabolcs Nagy
2020-05-07  9:59       ` Szabolcs Nagy
2020-05-07 17:21         ` Richard Henderson
2020-05-18 12:59           ` Szabolcs Nagy
2020-05-19 18:46             ` Richard Henderson
