* [Qemu-devel] how do we determine correct guest PC for segfaults in atomic helpers for linux-user mode?
From: Peter Maydell @ 2017-11-13 19:59 UTC
  To: QEMU Developers; +Cc: Richard Henderson, Alex Bennée, Emilio G. Cota

I've been investigating a bug (a javac crash). I'm not sure if it's
the root cause, but I can't figure out how, if we get a guest SEGV in
an atomic helper, we report the right faulting PC to the guest.

Specifically, if you get a SEGV here:

#0  0x000000006003c22b in helper_atomic_cmpxchgl_le (env=0x63caf680,
    addr=275041819628, cmpv=0, newv=1)
    at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/atomic_template.h:65
#1  0x0000000061002f61 in static_code_gen_buffer ()
#2  0x0000000060035d6b in cpu_tb_exec (cpu=0x63ca73e0,
    itb=0x6119d000 <static_code_gen_buffer+9080960>)
    at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:167
#3  0x0000000060036945 in cpu_loop_exec_tb (cpu=0x63ca73e0,
    tb=0x6119d000 <static_code_gen_buffer+9080960>, last_tb=0x7f01b213dbd8,
    tb_exit=0x7f01b213dbd0)
    at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:611
#4  0x0000000060036bc2 in cpu_exec (cpu=0x63ca73e0)
    at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:723
#5  0x000000006003da13 in cpu_loop (env=0x63caf680)
    at /home/petmay01/linaro/qemu-from-laptop/qemu/linux-user/main.c:809
#6  0x000000006004c627 in clone_func (arg=0x7ffe028f0a10)
    at /home/petmay01/linaro/qemu-from-laptop/qemu/linux-user/syscall.c:6241
#7  0x00000000602fcc25 in start_thread (arg=0x7f01b213e700)
    at pthread_create.c:333
#8  0x00000000603949a9 in clone ()

then the code in handle_cpu_signal() is passed a pc of 0x6003c22b
(the location in the helper function that does the memory access).
This is outside generated code, so the call to cpu_restore_state()
in handle_cpu_signal() will do nothing. However, as far as I can tell,
there isn't any syncing of the PC etc state to the CPU before calling
this helper (at least, env->pc is completely wrong for the insn that
I think is causing this helper call).
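
As an illustration of why a PC inside the helper is useless to the restore
step, here is a minimal sketch of the range check being described. It is not
QEMU's code: CodeBufferSketch, pc_in_generated_code() and
restore_state_sketch() are hypothetical names, and the real lookup from host
PC to guest state is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the buffer that holds translated code. */
typedef struct {
    uintptr_t start;  /* first byte of the buffer */
    uintptr_t size;   /* length in bytes */
} CodeBufferSketch;

/*
 * True only when host_pc lies in [start, start + size). The unsigned
 * subtraction wraps for addresses below start, so one compare suffices.
 */
static bool pc_in_generated_code(const CodeBufferSketch *buf, uintptr_t host_pc)
{
    assert(buf->size > 0);
    return host_pc - buf->start < buf->size;
}

/*
 * Toy restore step: it can only act on a host PC inside generated code.
 * For any other PC it returns false and the guest state stays as it was.
 */
static bool restore_state_sketch(const CodeBufferSketch *buf, uintptr_t host_pc)
{
    if (!pc_in_generated_code(buf, host_pc)) {
        return false;   /* e.g. a PC inside a C helper: nothing to restore */
    }
    /* A real implementation would map host_pc back to guest state here. */
    return true;
}
```

In terms of the backtrace above, the PC of frame #0 (0x6003c22b, inside the
helper) would fail this check, while the return address of frame #1
(0x61002f61, reported as being in static_code_gen_buffer) would pass it.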

Am I misreading my debugger entrails (entirely possible)? How is this
code intended to get the right guest PC for segfaults in these helpers?

(I'll investigate further tomorrow, but since it's end-of-day for me
I figured I'd throw this question out before going home...)

thanks
-- PMM


* Re: [Qemu-devel] how do we determine correct guest PC for segfaults in atomic helpers for linux-user mode?
From: Philippe Mathieu-Daudé @ 2017-11-13 22:53 UTC
  To: Peter Maydell, QEMU Developers
  Cc: Emilio G. Cota, Alex Bennée, Richard Henderson

Hi Peter,

On 11/13/2017 04:59 PM, Peter Maydell wrote:
> I've been investigating a bug (a javac crash). I'm not sure if it's
> the root cause, but I can't figure out how, if we get a guest SEGV in
> an atomic helper we report the right faulting PC to the guest.
> 
> Specifically, if you get a SEGV here:
> 
> #0  0x000000006003c22b in helper_atomic_cmpxchgl_le (env=0x63caf680,
>     addr=275041819628, cmpv=0, newv=1)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/atomic_template.h:65
> #1  0x0000000061002f61 in static_code_gen_buffer ()
> #2  0x0000000060035d6b in cpu_tb_exec (cpu=0x63ca73e0,
>     itb=0x6119d000 <static_code_gen_buffer+9080960>)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:167
> #3  0x0000000060036945 in cpu_loop_exec_tb (cpu=0x63ca73e0,
>     tb=0x6119d000 <static_code_gen_buffer+9080960>, last_tb=0x7f01b213dbd8,
>     tb_exit=0x7f01b213dbd0)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:611
> #4  0x0000000060036bc2 in cpu_exec (cpu=0x63ca73e0)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:723
> #5  0x000000006003da13 in cpu_loop (env=0x63caf680)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/linux-user/main.c:809
> #6  0x000000006004c627 in clone_func (arg=0x7ffe028f0a10)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/linux-user/syscall.c:6241
> #7  0x00000000602fcc25 in start_thread (arg=0x7f01b213e700)
>     at pthread_create.c:333
> #8  0x00000000603949a9 in clone ()
> 
> then the code in handle_cpu_signal() is passed a pc of 0x6003c22b
> (the location in the helper function that does the memory access).
> This is outside generated code, so the call to cpu_restore_state()
> in handle_cpu_signal() will do nothing. However as far as I can tell,
> there isn't any syncing of the PC etc state to the CPU before calling
> this helper (at least, env->pc is completely wrong for the insn that
> I think is causing this helper call).

I'm not sure this is related, but last week I hit a similar problem when
my laptop ran out of memory while using the xlnx-zcu102 machine. I wasn't
getting a SEGV, though, but various SIGBUS in different places, and my
backtraces show softmmu_template.h rather than atomic_template.h.

I found such an OOM easier to understand with the -mem-prealloc
option; I now get a more clearly OOM-related error:

Thread 12 "qemu-system-aar" received signal SIGBUS, Bus error.
0x0000555555de1bd4 in do_touch_pages (arg=0x555556cc0210) at util/oslib-posix.c:331
331            *(volatile char *)addr = *addr;

Regards,

Phil.


* Re: [Qemu-devel] how do we determine correct guest PC for segfaults in atomic helpers for linux-user mode?
From: Richard Henderson @ 2017-11-13 23:31 UTC
  To: Peter Maydell, QEMU Developers; +Cc: Alex Bennée, Emilio G. Cota

On 11/13/2017 08:59 PM, Peter Maydell wrote:
> I've been investigating a bug (a javac crash). I'm not sure if it's
> the root cause, but I can't figure out how, if we get a guest SEGV in
> an atomic helper we report the right faulting PC to the guest.
> 
> Specifically, if you get a SEGV here:
> 
> #0  0x000000006003c22b in helper_atomic_cmpxchgl_le (env=0x63caf680,
>     addr=275041819628, cmpv=0, newv=1)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/atomic_template.h:65
> #1  0x0000000061002f61 in static_code_gen_buffer ()
> #2  0x0000000060035d6b in cpu_tb_exec (cpu=0x63ca73e0,
>     itb=0x6119d000 <static_code_gen_buffer+9080960>)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:167
> #3  0x0000000060036945 in cpu_loop_exec_tb (cpu=0x63ca73e0,
>     tb=0x6119d000 <static_code_gen_buffer+9080960>, last_tb=0x7f01b213dbd8,
>     tb_exit=0x7f01b213dbd0)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:611
> #4  0x0000000060036bc2 in cpu_exec (cpu=0x63ca73e0)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:723
> #5  0x000000006003da13 in cpu_loop (env=0x63caf680)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/linux-user/main.c:809
> #6  0x000000006004c627 in clone_func (arg=0x7ffe028f0a10)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/linux-user/syscall.c:6241
> #7  0x00000000602fcc25 in start_thread (arg=0x7f01b213e700)
>     at pthread_create.c:333
> #8  0x00000000603949a9 in clone ()
> 
> then the code in handle_cpu_signal() is passed a pc of 0x6003c22b
> (the location in the helper function that does the memory access).
> This is outside generated code, so the call to cpu_restore_state()
> in handle_cpu_signal() will do nothing. However as far as I can tell,
> there isn't any syncing of the PC etc state to the CPU before calling
> this helper (at least, env->pc is completely wrong for the insn that
> I think is causing this helper call).
> 
> Am I misreading my debugger entrails (entirely possible)? How is this
> code intended to get the right guest PC for segfaults in these helpers?

It looks like we can't.

We get it right for system mode, but not linux-user.

I suppose it would be fixable with a tls variable that is set by the helper,
which could then be used by the signal handler in preference to the host pc
indicated by the frame.  That seems kinda kludgy.
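
A minimal sketch of that TLS-variable idea, under stated assumptions: the
names helper_retaddr_sketch, helper_load_tls_sketch() and pick_restore_pc()
are hypothetical, zero is used as the "nothing in flight" value, and the
caller is assumed to supply a return address that points into generated code.
It only models which PC the signal handler would choose; it does not install
a handler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-thread slot; zero means no helper access is in flight. */
static _Thread_local uintptr_t helper_retaddr_sketch;

/*
 * Helper side: publish the return address into generated code for the
 * duration of the access that might fault. The signal fences keep the
 * compiler from moving the stores across the access.
 */
static uint32_t helper_load_tls_sketch(const uint32_t *host_addr,
                                       uintptr_t retaddr)
{
    uint32_t val;

    assert(retaddr != 0);                 /* zero is reserved as "unset" */
    helper_retaddr_sketch = retaddr;
    atomic_signal_fence(memory_order_seq_cst);
    val = *(const volatile uint32_t *)host_addr;   /* may fault */
    atomic_signal_fence(memory_order_seq_cst);
    helper_retaddr_sketch = 0;
    return val;
}

/*
 * Signal-handler side: prefer the published return address over the
 * host PC taken from the signal frame, and consume it so that a later
 * unrelated fault does not reuse a stale value.
 */
static uintptr_t pick_restore_pc(uintptr_t frame_pc)
{
    uintptr_t ra = helper_retaddr_sketch;

    if (ra != 0) {
        helper_retaddr_sketch = 0;
        return ra;
    }
    return frame_pc;
}
```

With something along these lines, a fault inside the helper would be resolved
against the address of the call site in the translated block rather than
against an address inside the helper itself.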

Looking forward, we're going to need a way to catch these faults for the SVE
FFR.  Perhaps a tls pointer to a jmp_buf can do both.  If the pointer is
non-null, host_signal_handler does nothing for SIGSEGV, SIGBUS.  For SVE, we
end the first-faulting load sequence.  For atomic linux-user, we call into a
version of handle_cpu_signal with the proper code_gen_buffer return address.

Thoughts?

r~


* Re: [Qemu-devel] how do we determine correct guest PC for segfaults in atomic helpers for linux-user mode?
From: Peter Maydell @ 2017-11-14  8:52 UTC
  To: Richard Henderson; +Cc: QEMU Developers, Alex Bennée, Emilio G. Cota

On 13 November 2017 at 23:31, Richard Henderson <rth@twiddle.net> wrote:
> On 11/13/2017 08:59 PM, Peter Maydell wrote:
>> Am I misreading my debugger entrails (entirely possible)? How is this
>> code intended to get the right guest PC for segfaults in these helpers?
>
> It looks like we can't.
>
> We get it right for system mode, but not linux-user.

How does it work for system mode?

thanks
-- PMM


* Re: [Qemu-devel] how do we determine correct guest PC for segfaults in atomic helpers for linux-user mode?
From: Richard Henderson @ 2017-11-14  9:06 UTC
  To: Peter Maydell; +Cc: QEMU Developers, Alex Bennée, Emilio G. Cota

On 11/14/2017 09:52 AM, Peter Maydell wrote:
> On 13 November 2017 at 23:31, Richard Henderson <rth@twiddle.net> wrote:
>> On 11/13/2017 08:59 PM, Peter Maydell wrote:
>>> Am I misreading my debugger entrails (entirely possible)? How is this
>>> code intended to get the right guest PC for segfaults in these helpers?
>>
>> It looks like we can't.
>>
>> We get it right for system mode, but not linux-user.
> 
> How does it work for system mode?

We have retaddr from GETPC which we pass down through tlb_fill and friends,
which means the correct pc is used for restore state.
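
A rough sketch of that flow, assuming GETPC() amounts to the compiler's
return-address builtin evaluated in the helper that generated code calls
directly. GETPC_SKETCH, tlb_fill_sketch() and helper_load_getpc_sketch() are
hypothetical stand-ins, not the real QEMU definitions, and the builtin and
the noinline attribute are GCC/Clang extensions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the macro: the address this function will return to. */
#define GETPC_SKETCH() ((uintptr_t)__builtin_return_address(0))

/* Records what the fill path was given, so the flow can be inspected. */
static uintptr_t last_fill_retaddr;

/*
 * Stand-in for the slow path that handles a miss or fault. It receives
 * the caller's return address explicitly, so it never has to guess the
 * PC from a signal frame.
 */
static void tlb_fill_sketch(uint64_t guest_addr, uintptr_t retaddr)
{
    (void)guest_addr;
    assert(retaddr != 0);
    last_fill_retaddr = retaddr;
}

/*
 * Stand-in for a memory helper called from generated code. The return
 * address must be captured here, in the outermost helper, because any
 * function called from here would see an address inside the helper.
 */
__attribute__((noinline))
static uint32_t helper_load_getpc_sketch(uint64_t guest_addr)
{
    uintptr_t ra = GETPC_SKETCH();

    tlb_fill_sketch(guest_addr, ra);
    return 0;
}
```

In system mode the value passed down this way points into the translated
block, which is what lets the restore step find the matching guest state.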

What's different about user-mode is that we don't have tlb_fill or equivalent,
and we rely on the pc from the signal handler.  Which leads to the bogusness
that you see.

I've just about got a patch together that uses a TLS variable for retaddr.  It
is a smaller change than the setjmp approach, which makes it the better fit
for soft freeze.


r~


* Re: [Qemu-devel] how do we determine correct guest PC for segfaults in atomic helpers for linux-user mode?
From: Alex Bennée @ 2017-11-14 12:23 UTC
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers, Emilio G. Cota


Richard Henderson <rth@twiddle.net> writes:

> On 11/13/2017 08:59 PM, Peter Maydell wrote:
>> I've been investigating a bug (a javac crash). I'm not sure if it's
>> the root cause, but I can't figure out how, if we get a guest SEGV in
>> an atomic helper we report the right faulting PC to the guest.
>>
>> Specifically, if you get a SEGV here:
>>
>> #0  0x000000006003c22b in helper_atomic_cmpxchgl_le (env=0x63caf680,
>>     addr=275041819628, cmpv=0, newv=1)
>>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/atomic_template.h:65
>> #1  0x0000000061002f61 in static_code_gen_buffer ()
>> #2  0x0000000060035d6b in cpu_tb_exec (cpu=0x63ca73e0,
>>     itb=0x6119d000 <static_code_gen_buffer+9080960>)
>>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:167
>> #3  0x0000000060036945 in cpu_loop_exec_tb (cpu=0x63ca73e0,
>>     tb=0x6119d000 <static_code_gen_buffer+9080960>, last_tb=0x7f01b213dbd8,
>>     tb_exit=0x7f01b213dbd0)
>>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:611
>> #4  0x0000000060036bc2 in cpu_exec (cpu=0x63ca73e0)
>>     at /home/petmay01/linaro/qemu-from-laptop/qemu/accel/tcg/cpu-exec.c:723
>> #5  0x000000006003da13 in cpu_loop (env=0x63caf680)
>>     at /home/petmay01/linaro/qemu-from-laptop/qemu/linux-user/main.c:809
>> #6  0x000000006004c627 in clone_func (arg=0x7ffe028f0a10)
>>     at /home/petmay01/linaro/qemu-from-laptop/qemu/linux-user/syscall.c:6241
>> #7  0x00000000602fcc25 in start_thread (arg=0x7f01b213e700)
>>     at pthread_create.c:333
>> #8  0x00000000603949a9 in clone ()
>>
>> then the code in handle_cpu_signal() is passed a pc of 0x6003c22b
>> (the location in the helper function that does the memory access).
>> This is outside generated code, so the call to cpu_restore_state()
>> in handle_cpu_signal() will do nothing. However as far as I can tell,
>> there isn't any syncing of the PC etc state to the CPU before calling
>> this helper (at least, env->pc is completely wrong for the insn that
>> I think is causing this helper call).
>>
>> Am I misreading my debugger entrails (entirely possible)? How is this
>> code intended to get the right guest PC for segfaults in these helpers?
>
> It looks like we can't.

I thought the GETPC() macro was a host-specific way to find the return
address, and hence the address in the TB that can be resolved?

>
> We get it right for system mode, but not linux-user.
>
> I suppose it would be fixable with a tls variable that is set by the helper,
> which could then be used by the signal handler in preference to the host pc
> indicated by the frame.  That seems kinda kludgy.
>
> Looking forward, we're going to need a way to catch these faults for the SVE
> FFR.  Perhaps a tls pointer to a jmp_buf can do both.  If the pointer is
> non-null, host_signal_handler does nothing for SIGSEGV, SIGBUS.  For SVE, we
> end the first-faulting load sequence.  For atomic linux-user, we call into a
> version of handle_cpu_signal with the proper code_gen_buffer return address.
>
> Thoughts?
>
> r~


--
Alex Bennée

