* [Qemu-devel] ARM64 STR Instruction Crash Regression in TCG
From: Jason A. Donenfeld @ 2018-07-22 20:47 UTC (permalink / raw)
  To: qemu-arm, QEMU Developers

Hello,

GCC 7.3 compiles the dual assignment in bash's array_flush() as:

STP             X20, X20, [X20,#0x10]

But GCC 8.1 compiles it as:

STR             Q0, [X20,#0x10]

The GCC 8.1 build runs fine on real processors and under qemu 2.11, but
under qemu 2.12 the process segfaults. I'm pretty sure this is a TCG bug.

In the attached tarball you will find a kernel and run.sh. Running
./run.sh boots the kernel with the bad bash executable, which tries to
execute `config=({1..100000})` and crashes. The tarball also includes
the crashing bash binary itself, in case you'd like to disassemble it.

This is affecting builds on https://www.wireguard.com/build-status/ --
as you can see, at the moment aarch64 is failing.

Regards,
Jason

[ attachment: https://data.zx2c4.com/bash-qemu-arm64-crash.tar.xz ]

* Re: [Qemu-devel] ARM64 STR Instruction Crash Regression in TCG
From: Richard Henderson @ 2018-07-22 21:31 UTC (permalink / raw)
  To: Jason A. Donenfeld, qemu-arm, QEMU Developers

On 07/22/2018 01:47 PM, Jason A. Donenfeld wrote:
> Hello,
> 
> Gcc 7.3 compiles bash's array_flush's dual assignment using:
> 
> STP             X20, X20, [X20,#0x10]
> 
> But gcc 8.1 compiles it as:
> 
> STR             Q0, [X20,#0x10]
> 
> Real processors seem okay, and qemu 2.11 seems okay. But qemu 2.12
> results in a segfaulting process. I'm pretty sure this is a TCG bug.
> 
> In the attached tarball, please find kernel and run.sh. Calling
> ./run.sh will start the kernel with the bad bash executable that tries
> to execute `config=({1..100000})` and crashes. Also included in there
> is the actual crashing bash binary, in case you'd like to disassemble
> a little bit.

Interesting.  The test passes on master when qemu is built with
--enable-debug, but fails when it is built with optimization...

I'll dig a bit deeper.


r~

* Re: [Qemu-devel] ARM64 STR Instruction Crash Regression in TCG
From: Richard Henderson @ 2018-07-23  1:45 UTC (permalink / raw)
  To: Jason A. Donenfeld, qemu-arm, QEMU Developers

On 07/22/2018 02:31 PM, Richard Henderson wrote:
> Interesting.  The test passes on master with --enable-debug, but fails when
> qemu is compiled with optimization...
> 
> I'll dig a bit deeper.

The failing sequence is

0x0045ba44:  4e080e80  dup      v0.2d, x20
0x0045ba48:  90000340  adrp     x0, #0x4c3000
0x0045ba4c:  91098003  add      x3, x0, #0x260
0x0045ba50:  92800001  movn     x1, #0
0x0045ba54:  f9413002  ldr      x2, [x0, #0x260]
0x0045ba58:  3d800680  str      q0, [x20, #0x10]
...

OP after optimization and liveness analysis:
 ld_i32 tmp0,env,$0xffffffffffffffdc              dead: 1
 movi_i32 tmp1,$0x0
 brcond_i32 tmp0,tmp1,lt,$L0                      dead: 0 1

 ---- 000000000045ba44 0000000000000000 0000000000000000
 dup_vec v128,e64,tmp2,x20
 st_vec v128,e8,tmp2,env,$0x8c0                   dead: 0

...

 ---- 000000000045ba58 0000000000000000 0000000000000000
 movi_i64 tmp4,$0x10
 add_i64 tmp3,x20,tmp4                            dead: 1 2
 ld_i64 tmp4,env,$0x8c0
 movi_i64 tmp6,$0x8
 add_i64 tmp5,tmp3,tmp6                           dead: 2
 qemu_st_i64 tmp4,tmp3,leq,0                      dead: 0 1
 ld_i64 tmp4,env,$0x8c8                           dead: 1
 qemu_st_i64 tmp4,tmp5,leq,0                      dead: 0 1
...

0x7fffcd2e678c:  vmovq    0xe0(%r14), %xmm0
0x7fffcd2e6795:  vpbroadcastq %xmm0, %xmm1
0x7fffcd2e679a:  vmovdqu  %xmm1, 0x8c0(%r14)
...
0x7fffcd2c0e78:  vmovq    %xmm0, %r12
0x7fffcd2c0e7d:  addq     $0x10, %r12


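The guest x20 is loaded into xmm0 for the dup at 0x45ba44, and that copy
of x20 is reused for the store at 0x45ba58.  However, if the load at
0x45ba54 misses the TLB, we call out to the softmmu slow-path helper, and
that call can clobber xmm0.  The register allocator does not account for
this, so it still believes xmm0 holds x20 after the call.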

With -O0, the helper just happens not to clobber xmm0; with optimization
enabled, the compiler generates different code for it that does clobber
xmm0.  That is why the --enable-debug build passes.

Fix by treating the xmm registers as call-clobbered, since a called helper
is free to overwrite them.  The allocator then evicts the cached value from
xmm0 before the call, and reloads x20 from env when it is next needed.
Patch posted separately.


r~
