linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
@ 2019-11-12 23:22 Luis Machado
  2019-11-18 13:15 ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Luis Machado @ 2019-11-12 23:22 UTC (permalink / raw)
  To: linux-arm-kernel, will

Hi,

I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP 
request by GDB won't execute the underlying instruction. As a 
consequence, the PC doesn't move, but we return a SIGTRAP just like we 
would for a regular successful PTRACE_SINGLESTEP request.

Since there are no software breakpoints inserted at PC (we are actually 
stepping over a breakpoint, so GDB removes the breakpoint at PC before 
issuing a PTRACE_SINGLESTEP request), this is an odd behavior.

Though not too harmful, i see this manifesting in the GDB testsuite 
(gdb.reverse/insn-reverse.exp), which throws the test off by making GDB 
think it is further in the instruction stream than it really is. In 
fact, we get lucky here and no FAIL's show up, only many more spurious 
PASSes.

Since the reproduction steps involve GDB and the testcase, I'll report 
my findings here for convenience. But it can be reproduced with a 
top-of-tree kernel (what i used) or an Ubuntu one (4.12.13), it doesn't 
make a difference. I've also reproduced this in real hardware and under 
QEMU.

I did some rudimentary debugging to confirm GDB wasn't doing anything 
wrong, and placed some debugging output on the arm64 ptrace-related 
functions in the kernel. I also added some debugging output to the 
function that handles software breakpoint traps, to make sure no 
breakpoints were being inadvertently left behind.

At the point where GDB issues PTRACE_SINGLESTEP, we see this:

<case 1>
<before execution>
[  524.329276] >>>> Start 
user_enable_single_step,/repos/linux/arch/arm64/kernel/debug-monitors.c:450 
<<<<
[  524.329314] >>>> PC is 400574 <<<<
[  524.329329] >>>> End 
user_enable_single_step,/repos/linux/arch/arm64/kernel/debug-monitors.c:459 
<<<<
<after execution>
[  524.329679] >>>> Start 
single_step_handler,/repos/linux/arch/arm64/kernel/debug-monitors.c:249 <<<<
[  524.329707] >>>> PC is 400574 <<<<
[  524.329725] >>>> Start 
send_user_sigtrap,/repos/linux/arch/arm64/kernel/debug-monitors.c:228 <<<<
[  524.329733] >>>> PC is 400574 <<<<
[  524.329783] >>>> End 
send_user_sigtrap,/repos/linux/arch/arm64/kernel/debug-monitors.c:241 <<<<
[  524.329794] >>>> End 
single_step_handler,/repos/linux/arch/arm64/kernel/debug-monitors.c:280 <<<<

A regular successful PTRACE_SINGLESTEP should look like this instead:

<case 2>
<before execution>
[  981.042942] >>>> Start 
user_enable_single_step,/repos/linux/arch/arm64/kernel/debug-monitors.c:450 
<<<<
[  981.042982] >>>> PC is 400574 <<<<
[  981.042997] >>>> End 
user_enable_single_step,/repos/linux/arch/arm64/kernel/debug-monitors.c:459 
<<<<
<after execution>
[  981.043411] >>>> Start 
single_step_handler,/repos/linux/arch/arm64/kernel/debug-monitors.c:249 <<<<
[  981.043453] >>>> PC is 400578 <<<<
[  981.043472] >>>> Start 
send_user_sigtrap,/repos/linux/arch/arm64/kernel/debug-monitors.c:228 <<<<
[  981.043481] >>>> PC is 400578 <<<<
[  981.043540] >>>> End 
send_user_sigtrap,/repos/linux/arch/arm64/kernel/debug-monitors.c:241 <<<<
[  981.043553] >>>> End 
single_step_handler,/repos/linux/arch/arm64/kernel/debug-monitors.c:280 <<<<

As a guess, i decided to revert commit 
3a402a709500c5a3faca2111668c33d96555e35a (arm64: debug: avoid resetting 
stepping state machine when TIF_SINGLESTEP) to see its effect on this 
particular case. Then the output looks like <case 2> above, which is 
correct.

So this is at least partially caused by commit 
3a402a709500c5a3faca2111668c33d96555e35a, but i don't understand the 
full picture (involving the kernel) here. I know said commit is needed 
for other problematic cases in GDB (fork/vfork for example), but it 
might be having undesirable side effects here.

Here's how to reproduce. Make sure you have a reasonably new GDB (I 
reproduced it with Ubuntu's GDB 7.11.1-0ubuntu1~16.5). You can also 
build GDB from the git tree if you want. A standard aarch64-linux-gnu 
GDB will do.

Grab both of these source files for the testcase:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=gdb/testsuite/gdb.reverse/insn-reverse.c;hb=HEAD
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=gdb/testsuite/gdb.reverse/insn-reverse-aarch64.c;hb=HEAD

Build the testcase with: gcc -O0 -g3 -lm insn-reverse.c -o insn-reverse

Execute gdb like so:

gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex "record" 
-ex "si" -ex "rsi" -ex "record stop" insn-reverse

What the above does is put a breakpoint in "load", run to it, enable 
reversible debugging, step one instruction forward, step back one 
instruction (essentially coming back to the same PC) and then shutting 
down reversible debugging.

Now, giving gdb the "si" command will cause it to execute the 
PTRACE_SINGLESTEP i pointed out above, in my explanation of the bug.

display/x $pc
stepi

You'll see, if it reproduces, the PC has not changed and the instruction 
has not executed. GDB will indicate a breakpoint hit, but this is bogus. 
It is due to the fact the PC didn't move, and GDB still has a breakpoint 
listed in this PC.

Please let me know if i can help with any other information in case any 
of the steps is not clear.

Thanks,
Luis

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2019-11-12 23:22 [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction Luis Machado
@ 2019-11-18 13:15 ` Will Deacon
  2019-11-18 14:54   ` Luis Machado
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2019-11-18 13:15 UTC (permalink / raw)
  To: Luis Machado; +Cc: mark.rutland, linux-arm-kernel

Hi Luis,

[+Mark for the valid_user_regs() part]

On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote:
> I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP
> request by GDB won't execute the underlying instruction. As a consequence,
> the PC doesn't move, but we return a SIGTRAP just like we would for a
> regular successful PTRACE_SINGLESTEP request.
> 
> Since there are no software breakpoints inserted at PC (we are actually
> stepping over a breakpoint, so GDB removes the breakpoint at PC before
> issuing a PTRACE_SINGLESTEP request), this is an odd behavior.
> 
> Though not too harmful, i see this manifesting in the GDB testsuite
> (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB
> think it is further in the instruction stream than it really is. In fact, we
> get lucky here and no FAIL's show up, only many more spurious PASSes.

I managed to reproduce this locally and I think I've figured out what's
going on, although I'm not sure that the kernel is the best place to fix
it.

Looking at the specific reproducer:

> Execute gdb like so:
> 
> gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex "record" -ex
> "si" -ex "rsi" -ex "record stop" insn-reverse

So we've got a couple of instructions as follows (it doesn't actually matter
what they are, so I've changed the LD1 in your binary for a NOP in order to
avoid confusion with the "load" label not actually pointing at a load):

	0x7b8:		mov	// "load"
	0x7bc:		nop

"b load" places a breakpoint at 0x7b8:

	ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0

We run to a software breakpoint on "load" (the mov instruction). We take
the trap and try to execute the "si", which means we need to remove the
breakpoint while we step over it:

	ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
	[...]
	ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0)  = 0

This causes the kernel to arm the single-step state machine so that
MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). Running
an instruction in userspace will transition to MDSCR_EL1.SS ==1 and
SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the trap to
trigger, at which point gdb puts the breakpoint instruction back since the
step is complete:

	ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0

This is where things start to go wrong. The "rsi" command attempts to
perform a reverse step, which means restoring the old state when we were
previously executing at 0x7b8. It starts by removing the breakpoint again,
since we've already hit that:

	ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0

and then resets the CPU registers to their old values:

	(I don't know why it does this three times)
	ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
	ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
	ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0

The problem with this is that we have moved the PC back to 0x7b8 but we have
also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen stepping
get disabled (this usually happens by PTRACE_CONT calling
user_disable_single_step()) which means that MDSCR_EL1.SS remains set to 1
and we're in the active-pending state! Consequently, we immediately take a
step exception if a step operation is attempted.

Now, we *could* consider hacking the TIF_SINGLESTEP check in
valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is active
but this is a user-visible change and may break things like stepping out of
signal handlers. I would prefer that GDB manages the SS bit explicitly in
this scenario, by setting it to 1 when restoring the old state in the
reverse step, a bit like when it disables the old breakpoint. You can
emulate this by doing:

	(gdb) set $cpsr |= (1<<21)

Thoughts?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2019-11-18 13:15 ` Will Deacon
@ 2019-11-18 14:54   ` Luis Machado
  2019-11-26 16:35     ` Luis Machado
  0 siblings, 1 reply; 17+ messages in thread
From: Luis Machado @ 2019-11-18 14:54 UTC (permalink / raw)
  To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel

Hi Will,

Thanks for the thorough explanation.

On 11/18/19 10:15 AM, Will Deacon wrote:
> Hi Luis,
> 
> [+Mark for the valid_user_regs() part]
> 
> On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote:
>> I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP
>> request by GDB won't execute the underlying instruction. As a consequence,
>> the PC doesn't move, but we return a SIGTRAP just like we would for a
>> regular successful PTRACE_SINGLESTEP request.
>>
>> Since there are no software breakpoints inserted at PC (we are actually
>> stepping over a breakpoint, so GDB removes the breakpoint at PC before
>> issuing a PTRACE_SINGLESTEP request), this is an odd behavior.
>>
>> Though not too harmful, i see this manifesting in the GDB testsuite
>> (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB
>> think it is further in the instruction stream than it really is. In fact, we
>> get lucky here and no FAIL's show up, only many more spurious PASSes.
> 
> I managed to reproduce this locally and I think I've figured out what's
> going on, although I'm not sure that the kernel is the best place to fix
> it.
> 
> Looking at the specific reproducer:
> 
>> Execute gdb like so:
>>
>> gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex "record" -ex
>> "si" -ex "rsi" -ex "record stop" insn-reverse
> 
> So we've got a couple of instructions as follows (it doesn't actually matter
> what they are, so I've changed the LD1 in your binary for a NOP in order to
> avoid confusion with the "load" label not actually pointing at a load):
> 
> 	0x7b8:		mov	// "load"
> 	0x7bc:		nop
> 
> "b load" places a breakpoint at 0x7b8:
> 
> 	ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0
> 
> We run to a software breakpoint on "load" (the mov instruction). We take
> the trap and try to execute the "si", which means we need to remove the
> breakpoint while we step over it:
> 
> 	ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
> 	[...]
> 	ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0)  = 0
> 
> This causes the kernel to arm the single-step state machine so that
> MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). Running
> an instruction in userspace will transition to MDSCR_EL1.SS ==1 and
> SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the trap to
> trigger, at which point gdb puts the breakpoint instruction back since the
> step is complete:

So, just to confirm my understanding, we have a couple bits controlling 
single-stepping in the kernel, one in MDSCR_EL1 and another in SPSR_EL1. 
GDB doesn't have direct access to any of those, correct?

Instead, GDB has access to a SS bit in the reserved 21~22 range of CPSR.

The transition from active-not-pending to active-pending takes place via 
a single PTRACE_SINGLESTEP request? Is that correct?

> 
> 	ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0
> 
> This is where things start to go wrong. The "rsi" command attempts to
> perform a reverse step, which means restoring the old state when we were
> previously executing at 0x7b8. It starts by removing the breakpoint again,
> since we've already hit that:
> 
> 	ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
> 
> and then resets the CPU registers to their old values:
> 
> 	(I don't know why it does this three times)
> 	ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
> 	ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
> 	ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
> 
> The problem with this is that we have moved the PC back to 0x7b8 but we have
> also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen stepping
> get disabled (this usually happens by PTRACE_CONT calling
> user_disable_single_step()) which means that MDSCR_EL1.SS remains set to 1
> and we're in the active-pending state! Consequently, we immediately take a
> step exception if a step operation is attempted >

While trying to reproduce this, i was paying attention to the SS bit 
coming and going. But in the particular sequence of si/rsi, within the 
record boundaries, i see GDB just restored the original CPSR value to 
what it was before we processed the si command.

 From GDB's POV all state was restore to the way it was before and we're 
good to go.

Is this not enough to restore state kernel-wise?

> Now, we *could* consider hacking the TIF_SINGLESTEP check in
> valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is active
> but this is a user-visible change and may break things like stepping out of
> signal handlers. I would prefer that GDB manages the SS bit explicitly in
> this scenario, by setting it to 1 when restoring the old state in the
> reverse step, a bit like when it disables the old breakpoint. You can
> emulate this by doing:

I think we could let GDB control this when required, but I'm trying to 
understand the ramifications of letting GDB do so.

For example, what if the user decides to alter the PC here and there, 
for debugging purposes. That is a use case that happens often, in order 
to go back or skip some parts of the code.

Would we need to pay attention to the SS bit in those cases as well?

> 
> 	(gdb) set $cpsr |= (1<<21)

In particular, what does the switching of this bit accomplishes in the 
kernel? Would we be better off forcing the SS bit every time we do a 
single-step operation, for example?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2019-11-18 14:54   ` Luis Machado
@ 2019-11-26 16:35     ` Luis Machado
  2019-12-10 20:00       ` Luis Machado
  2020-01-13 18:13       ` Luis Machado
  0 siblings, 2 replies; 17+ messages in thread
From: Luis Machado @ 2019-11-26 16:35 UTC (permalink / raw)
  To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel

ping?

On 11/18/19 11:54 AM, Luis Machado wrote:
> Hi Will,
> 
> Thanks for the thorough explanation.
> 
> On 11/18/19 10:15 AM, Will Deacon wrote:
>> Hi Luis,
>>
>> [+Mark for the valid_user_regs() part]
>>
>> On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote:
>>> I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP
>>> request by GDB won't execute the underlying instruction. As a 
>>> consequence,
>>> the PC doesn't move, but we return a SIGTRAP just like we would for a
>>> regular successful PTRACE_SINGLESTEP request.
>>>
>>> Since there are no software breakpoints inserted at PC (we are actually
>>> stepping over a breakpoint, so GDB removes the breakpoint at PC before
>>> issuing a PTRACE_SINGLESTEP request), this is an odd behavior.
>>>
>>> Though not too harmful, i see this manifesting in the GDB testsuite
>>> (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB
>>> think it is further in the instruction stream than it really is. In 
>>> fact, we
>>> get lucky here and no FAIL's show up, only many more spurious PASSes.
>>
>> I managed to reproduce this locally and I think I've figured out what's
>> going on, although I'm not sure that the kernel is the best place to fix
>> it.
>>
>> Looking at the specific reproducer:
>>
>>> Execute gdb like so:
>>>
>>> gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex 
>>> "record" -ex
>>> "si" -ex "rsi" -ex "record stop" insn-reverse
>>
>> So we've got a couple of instructions as follows (it doesn't actually 
>> matter
>> what they are, so I've changed the LD1 in your binary for a NOP in 
>> order to
>> avoid confusion with the "load" label not actually pointing at a load):
>>
>>     0x7b8:        mov    // "load"
>>     0x7bc:        nop
>>
>> "b load" places a breakpoint at 0x7b8:
>>
>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0
>>
>> We run to a software breakpoint on "load" (the mov instruction). We take
>> the trap and try to execute the "si", which means we need to remove the
>> breakpoint while we step over it:
>>
>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
>>     [...]
>>     ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0)  = 0
>>
>> This causes the kernel to arm the single-step state machine so that
>> MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). Running
>> an instruction in userspace will transition to MDSCR_EL1.SS ==1 and
>> SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the 
>> trap to
>> trigger, at which point gdb puts the breakpoint instruction back since 
>> the
>> step is complete:
> 
> So, just to confirm my understanding, we have a couple bits controlling 
> single-stepping in the kernel, one in MDSCR_EL1 and another in SPSR_EL1. 
> GDB doesn't have direct access to any of those, correct?
> 
> Instead, GDB has access to a SS bit in the reserved 21~22 range of CPSR.
> 
> The transition from active-not-pending to active-pending takes place via 
> a single PTRACE_SINGLESTEP request? Is that correct?
> 
>>
>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0
>>
>> This is where things start to go wrong. The "rsi" command attempts to
>> perform a reverse step, which means restoring the old state when we were
>> previously executing at 0x7b8. It starts by removing the breakpoint 
>> again,
>> since we've already hit that:
>>
>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
>>
>> and then resets the CPU registers to their old values:
>>
>>     (I don't know why it does this three times)
>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>
>> The problem with this is that we have moved the PC back to 0x7b8 but 
>> we have
>> also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen 
>> stepping
>> get disabled (this usually happens by PTRACE_CONT calling
>> user_disable_single_step()) which means that MDSCR_EL1.SS remains set 
>> to 1
>> and we're in the active-pending state! Consequently, we immediately 
>> take a
>> step exception if a step operation is attempted >
> 
> While trying to reproduce this, i was paying attention to the SS bit 
> coming and going. But in the particular sequence of si/rsi, within the 
> record boundaries, i see GDB just restored the original CPSR value to 
> what it was before we processed the si command.
> 
>  From GDB's POV all state was restore to the way it was before and we're 
> good to go.
> 
> Is this not enough to restore state kernel-wise?
> 
>> Now, we *could* consider hacking the TIF_SINGLESTEP check in
>> valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is 
>> active
>> but this is a user-visible change and may break things like stepping 
>> out of
>> signal handlers. I would prefer that GDB manages the SS bit explicitly in
>> this scenario, by setting it to 1 when restoring the old state in the
>> reverse step, a bit like when it disables the old breakpoint. You can
>> emulate this by doing:
> 
> I think we could let GDB control this when required, but I'm trying to 
> understand the ramifications of letting GDB do so.
> 
> For example, what if the user decides to alter the PC here and there, 
> for debugging purposes. That is a use case that happens often, in order 
> to go back or skip some parts of the code.
> 
> Would we need to pay attention to the SS bit in those cases as well?
> 
>>
>>     (gdb) set $cpsr |= (1<<21)
> 
> In particular, what does the switching of this bit accomplishes in the 
> kernel? Would we be better off forcing the SS bit every time we do a 
> single-step operation, for example?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2019-11-26 16:35     ` Luis Machado
@ 2019-12-10 20:00       ` Luis Machado
  2020-02-13 12:01         ` Will Deacon
  2020-01-13 18:13       ` Luis Machado
  1 sibling, 1 reply; 17+ messages in thread
From: Luis Machado @ 2019-12-10 20:00 UTC (permalink / raw)
  To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel

Will, Mark,

Do you have any input regarding this particular situation?

It would be nice to get this fixed before the release of another GDB 
version, if the fix is to live in GDB itself.

On 11/26/19 1:35 PM, Luis Machado wrote:
> ping?
> 
> On 11/18/19 11:54 AM, Luis Machado wrote:
>> Hi Will,
>>
>> Thanks for the thorough explanation.
>>
>> On 11/18/19 10:15 AM, Will Deacon wrote:
>>> Hi Luis,
>>>
>>> [+Mark for the valid_user_regs() part]
>>>
>>> On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote:
>>>> I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP
>>>> request by GDB won't execute the underlying instruction. As a 
>>>> consequence,
>>>> the PC doesn't move, but we return a SIGTRAP just like we would for a
>>>> regular successful PTRACE_SINGLESTEP request.
>>>>
>>>> Since there are no software breakpoints inserted at PC (we are actually
>>>> stepping over a breakpoint, so GDB removes the breakpoint at PC before
>>>> issuing a PTRACE_SINGLESTEP request), this is an odd behavior.
>>>>
>>>> Though not too harmful, i see this manifesting in the GDB testsuite
>>>> (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB
>>>> think it is further in the instruction stream than it really is. In 
>>>> fact, we
>>>> get lucky here and no FAIL's show up, only many more spurious PASSes.
>>>
>>> I managed to reproduce this locally and I think I've figured out what's
>>> going on, although I'm not sure that the kernel is the best place to fix
>>> it.
>>>
>>> Looking at the specific reproducer:
>>>
>>>> Execute gdb like so:
>>>>
>>>> gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex 
>>>> "record" -ex
>>>> "si" -ex "rsi" -ex "record stop" insn-reverse
>>>
>>> So we've got a couple of instructions as follows (it doesn't actually 
>>> matter
>>> what they are, so I've changed the LD1 in your binary for a NOP in 
>>> order to
>>> avoid confusion with the "load" label not actually pointing at a load):
>>>
>>>     0x7b8:        mov    // "load"
>>>     0x7bc:        nop
>>>
>>> "b load" places a breakpoint at 0x7b8:
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0
>>>
>>> We run to a software breakpoint on "load" (the mov instruction). We take
>>> the trap and try to execute the "si", which means we need to remove the
>>> breakpoint while we step over it:
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
>>>     [...]
>>>     ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0)  = 0
>>>
>>> This causes the kernel to arm the single-step state machine so that
>>> MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). 
>>> Running
>>> an instruction in userspace will transition to MDSCR_EL1.SS ==1 and
>>> SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the 
>>> trap to
>>> trigger, at which point gdb puts the breakpoint instruction back 
>>> since the
>>> step is complete:
>>
>> So, just to confirm my understanding, we have a couple bits 
>> controlling single-stepping in the kernel, one in MDSCR_EL1 and 
>> another in SPSR_EL1. GDB doesn't have direct access to any of those, 
>> correct?
>>
>> Instead, GDB has access to a SS bit in the reserved 21~22 range of CPSR.
>>
>> The transition from active-not-pending to active-pending takes place 
>> via a single PTRACE_SINGLESTEP request? Is that correct?
>>
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0
>>>
>>> This is where things start to go wrong. The "rsi" command attempts to
>>> perform a reverse step, which means restoring the old state when we were
>>> previously executing at 0x7b8. It starts by removing the breakpoint 
>>> again,
>>> since we've already hit that:
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
>>>
>>> and then resets the CPU registers to their old values:
>>>
>>>     (I don't know why it does this three times)
>>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>>
>>> The problem with this is that we have moved the PC back to 0x7b8 but 
>>> we have
>>> also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen 
>>> stepping
>>> get disabled (this usually happens by PTRACE_CONT calling
>>> user_disable_single_step()) which means that MDSCR_EL1.SS remains set 
>>> to 1
>>> and we're in the active-pending state! Consequently, we immediately 
>>> take a
>>> step exception if a step operation is attempted >
>>
>> While trying to reproduce this, i was paying attention to the SS bit 
>> coming and going. But in the particular sequence of si/rsi, within the 
>> record boundaries, i see GDB just restored the original CPSR value to 
>> what it was before we processed the si command.
>>
>>  From GDB's POV all state was restore to the way it was before and 
>> we're good to go.
>>
>> Is this not enough to restore state kernel-wise?
>>
>>> Now, we *could* consider hacking the TIF_SINGLESTEP check in
>>> valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is 
>>> active
>>> but this is a user-visible change and may break things like stepping 
>>> out of
>>> signal handlers. I would prefer that GDB manages the SS bit 
>>> explicitly in
>>> this scenario, by setting it to 1 when restoring the old state in the
>>> reverse step, a bit like when it disables the old breakpoint. You can
>>> emulate this by doing:
>>
>> I think we could let GDB control this when required, but I'm trying to 
>> understand the ramifications of letting GDB do so.
>>
>> For example, what if the user decides to alter the PC here and there, 
>> for debugging purposes. That is a use case that happens often, in 
>> order to go back or skip some parts of the code.
>>
>> Would we need to pay attention to the SS bit in those cases as well?
>>
>>>
>>>     (gdb) set $cpsr |= (1<<21)
>>
>> In particular, what does the switching of this bit accomplishes in the 
>> kernel? Would we be better off forcing the SS bit every time we do a 
>> single-step operation, for example?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2019-11-26 16:35     ` Luis Machado
  2019-12-10 20:00       ` Luis Machado
@ 2020-01-13 18:13       ` Luis Machado
  1 sibling, 0 replies; 17+ messages in thread
From: Luis Machado @ 2020-01-13 18:13 UTC (permalink / raw)
  Cc: mark.rutland, linux-arm-kernel

Ping?

On 11/26/19 1:35 PM, Luis Machado wrote:
> ping?
> 
> On 11/18/19 11:54 AM, Luis Machado wrote:
>> Hi Will,
>>
>> Thanks for the thorough explanation.
>>
>> On 11/18/19 10:15 AM, Will Deacon wrote:
>>> Hi Luis,
>>>
>>> [+Mark for the valid_user_regs() part]
>>>
>>> On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote:
>>>> I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP
>>>> request by GDB won't execute the underlying instruction. As a 
>>>> consequence,
>>>> the PC doesn't move, but we return a SIGTRAP just like we would for a
>>>> regular successful PTRACE_SINGLESTEP request.
>>>>
>>>> Since there are no software breakpoints inserted at PC (we are actually
>>>> stepping over a breakpoint, so GDB removes the breakpoint at PC before
>>>> issuing a PTRACE_SINGLESTEP request), this is an odd behavior.
>>>>
>>>> Though not too harmful, i see this manifesting in the GDB testsuite
>>>> (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB
>>>> think it is further in the instruction stream than it really is. In 
>>>> fact, we
>>>> get lucky here and no FAIL's show up, only many more spurious PASSes.
>>>
>>> I managed to reproduce this locally and I think I've figured out what's
>>> going on, although I'm not sure that the kernel is the best place to fix
>>> it.
>>>
>>> Looking at the specific reproducer:
>>>
>>>> Execute gdb like so:
>>>>
>>>> gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex 
>>>> "record" -ex
>>>> "si" -ex "rsi" -ex "record stop" insn-reverse
>>>
>>> So we've got a couple of instructions as follows (it doesn't actually 
>>> matter
>>> what they are, so I've changed the LD1 in your binary for a NOP in 
>>> order to
>>> avoid confusion with the "load" label not actually pointing at a load):
>>>
>>>     0x7b8:        mov    // "load"
>>>     0x7bc:        nop
>>>
>>> "b load" places a breakpoint at 0x7b8:
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0
>>>
>>> We run to a software breakpoint on "load" (the mov instruction). We take
>>> the trap and try to execute the "si", which means we need to remove the
>>> breakpoint while we step over it:
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
>>>     [...]
>>>     ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0)  = 0
>>>
>>> This causes the kernel to arm the single-step state machine so that
>>> MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). 
>>> Running
>>> an instruction in userspace will transition to MDSCR_EL1.SS ==1 and
>>> SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the 
>>> trap to
>>> trigger, at which point gdb puts the breakpoint instruction back 
>>> since the
>>> step is complete:
>>
>> So, just to confirm my understanding, we have a couple bits 
>> controlling single-stepping in the kernel, one in MDSCR_EL1 and 
>> another in SPSR_EL1. GDB doesn't have direct access to any of those, 
>> correct?
>>
>> Instead, GDB has access to a SS bit in the reserved 21~22 range of CPSR.
>>
>> The transition from active-not-pending to active-pending takes place 
>> via a single PTRACE_SINGLESTEP request? Is that correct?
>>
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0
>>>
>>> This is where things start to go wrong. The "rsi" command attempts to
>>> perform a reverse step, which means restoring the old state when we were
>>> previously executing at 0x7b8. It starts by removing the breakpoint 
>>> again,
>>> since we've already hit that:
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
>>>
>>> and then resets the CPU registers to their old values:
>>>
>>>     (I don't know why it does this three times)
>>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>>
>>> The problem with this is that we have moved the PC back to 0x7b8 but 
>>> we have
>>> also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen 
>>> stepping
>>> get disabled (this usually happens by PTRACE_CONT calling
>>> user_disable_single_step()) which means that MDSCR_EL1.SS remains set 
>>> to 1
>>> and we're in the active-pending state! Consequently, we immediately 
>>> take a
>>> step exception if a step operation is attempted >
>>
>> While trying to reproduce this, i was paying attention to the SS bit 
>> coming and going. But in the particular sequence of si/rsi, within the 
>> record boundaries, i see GDB just restored the original CPSR value to 
>> what it was before we processed the si command.
>>
>>  From GDB's POV all state was restore to the way it was before and 
>> we're good to go.
>>
>> Is this not enough to restore state kernel-wise?
>>
>>> Now, we *could* consider hacking the TIF_SINGLESTEP check in
>>> valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is 
>>> active
>>> but this is a user-visible change and may break things like stepping 
>>> out of
>>> signal handlers. I would prefer that GDB manages the SS bit 
>>> explicitly in
>>> this scenario, by setting it to 1 when restoring the old state in the
>>> reverse step, a bit like when it disables the old breakpoint. You can
>>> emulate this by doing:
>>
>> I think we could let GDB control this when required, but I'm trying to 
>> understand the ramifications of letting GDB do so.
>>
>> For example, what if the user decides to alter the PC here and there, 
>> for debugging purposes. That is a use case that happens often, in 
>> order to go back or skip some parts of the code.
>>
>> Would we need to pay attention to the SS bit in those cases as well?
>>
>>>
>>>     (gdb) set $cpsr |= (1<<21)
>>
>> In particular, what does the switching of this bit accomplishes in the 
>> kernel? Would we be better off forcing the SS bit every time we do a 
>> single-step operation, for example?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2019-12-10 20:00       ` Luis Machado
@ 2020-02-13 12:01         ` Will Deacon
  2020-02-13 17:07           ` Luis Machado
  2020-02-20 13:02           ` Mark Rutland
  0 siblings, 2 replies; 17+ messages in thread
From: Will Deacon @ 2020-02-13 12:01 UTC (permalink / raw)
  To: Luis Machado; +Cc: mark.rutland, linux-arm-kernel

Hi Luis,

Sorry for the very slow reply. I talked to Mark about this a bit but it
seems that we never followed up here.

On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote:
> Do you have any input regarding this particular situation?
> 
> It would be nice to get this fixed before the release of another GDB
> version, if the fix is to live in GDB itself.

Basically, I'm very nervous about fixing this in the kernel because
whatever we do will be visible to userspace. On the other hand, this
part of the ptrace interface is only seriously used by GDB and we should
make sure that it works well.

Does the diff below solve the problem? If so, can you confirm that it
doesn't appear to regress anything else for GDB?

Cheers,

Will

--->8

diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
index 7619f473155f..d825e3585e28 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -109,6 +109,8 @@ void disable_debug_monitors(enum dbg_active_el el);
 
 void user_rewind_single_step(struct task_struct *task);
 void user_fastforward_single_step(struct task_struct *task);
+void user_regs_reset_single_step(struct user_pt_regs *regs,
+				 struct task_struct *task);
 
 void kernel_enable_single_step(struct pt_regs *regs);
 void kernel_disable_single_step(void);
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 48222a4760c2..7569deb1eac1 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -141,17 +141,20 @@ postcore_initcall(debug_monitors_init);
 /*
  * Single step API and exception handling.
  */
-static void set_regs_spsr_ss(struct pt_regs *regs)
+static void set_user_regs_spsr_ss(struct user_pt_regs *regs)
 {
 	regs->pstate |= DBG_SPSR_SS;
 }
-NOKPROBE_SYMBOL(set_regs_spsr_ss);
+NOKPROBE_SYMBOL(set_user_regs_spsr_ss);
 
-static void clear_regs_spsr_ss(struct pt_regs *regs)
+static void clear_user_regs_spsr_ss(struct user_pt_regs *regs)
 {
 	regs->pstate &= ~DBG_SPSR_SS;
 }
-NOKPROBE_SYMBOL(clear_regs_spsr_ss);
+NOKPROBE_SYMBOL(clear_user_regs_spsr_ss);
+
+#define set_regs_spsr_ss(r)	set_user_regs_spsr_ss(&(r)->user_regs)
+#define clear_regs_spsr_ss(r)	clear_user_regs_spsr_ss(&(r)->user_regs)
 
 static DEFINE_SPINLOCK(debug_hook_lock);
 static LIST_HEAD(user_step_hook);
@@ -404,6 +407,15 @@ void user_fastforward_single_step(struct task_struct *task)
 		clear_regs_spsr_ss(task_pt_regs(task));
 }
 
+void user_regs_reset_single_step(struct user_pt_regs *regs,
+				 struct task_struct *task)
+{
+	if (test_tsk_thread_flag(task, TIF_SINGLESTEP))
+		set_user_regs_spsr_ss(regs);
+	else
+		clear_user_regs_spsr_ss(regs);
+}
+
 /* Kernel API */
 void kernel_enable_single_step(struct pt_regs *regs)
 {
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index cd6e5fa48b9c..d479fbcbd0d2 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs)
  */
 int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task)
 {
-	if (!test_tsk_thread_flag(task, TIF_SINGLESTEP))
-		regs->pstate &= ~DBG_SPSR_SS;
+	/* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */
+	user_regs_reset_single_step(regs, task);
 
 	if (is_compat_thread(task_thread_info(task)))
 		return valid_compat_regs(regs);
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 339882db5a91..bc54bdbfd760 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs,
 	forget_syscall(regs);
 
 	err |= !valid_user_regs(&regs->user_regs, current);
-	if (err == 0)
+
+	if (err == 0) {
+		/* Make it look like we stepped the sigreturn system call */
+		user_fastforward_single_step(current);
 		err = parse_user_sigframe(&user, sf);
+	}
 
 	if (err == 0 && system_supports_fpsimd()) {
 		if (!user.fpsimd)

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2020-02-13 12:01         ` Will Deacon
@ 2020-02-13 17:07           ` Luis Machado
  2020-02-14 15:45             ` Luis Machado
  2020-02-20 13:02           ` Mark Rutland
  1 sibling, 1 reply; 17+ messages in thread
From: Luis Machado @ 2020-02-13 17:07 UTC (permalink / raw)
  To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel

Hi Will,

On 2/13/20 9:01 AM, Will Deacon wrote:
> Hi Luis,
> 
> Sorry for the very slow reply. I talked to Mark about this a bit but it
> seems that we never followed up here.

No worries.

> 
> On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote:
>> Do you have any input regarding this particular situation?
>>
>> It would be nice to get this fixed before the release of another GDB
>> version, if the fix is to live in GDB itself.
> 
> Basically, I'm very nervous about fixing this in the kernel because
> whatever we do will be visible to userspace. On the other hand, this
> part of the ptrace interface is only seriously used by GDB and we should
> make sure that it works well.
> 
> Does the diff below solve the problem? If so, can you confirm that it
> doesn't appear to regress anything else for GDB?

Thanks for the patch. I'll exercise this in various ways to see if 
anything breaks.

> 
> Cheers,
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
> index 7619f473155f..d825e3585e28 100644
> --- a/arch/arm64/include/asm/debug-monitors.h
> +++ b/arch/arm64/include/asm/debug-monitors.h
> @@ -109,6 +109,8 @@ void disable_debug_monitors(enum dbg_active_el el);
>   
>   void user_rewind_single_step(struct task_struct *task);
>   void user_fastforward_single_step(struct task_struct *task);
> +void user_regs_reset_single_step(struct user_pt_regs *regs,
> +				 struct task_struct *task);
>   
>   void kernel_enable_single_step(struct pt_regs *regs);
>   void kernel_disable_single_step(void);
> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
> index 48222a4760c2..7569deb1eac1 100644
> --- a/arch/arm64/kernel/debug-monitors.c
> +++ b/arch/arm64/kernel/debug-monitors.c
> @@ -141,17 +141,20 @@ postcore_initcall(debug_monitors_init);
>   /*
>    * Single step API and exception handling.
>    */
> -static void set_regs_spsr_ss(struct pt_regs *regs)
> +static void set_user_regs_spsr_ss(struct user_pt_regs *regs)
>   {
>   	regs->pstate |= DBG_SPSR_SS;
>   }
> -NOKPROBE_SYMBOL(set_regs_spsr_ss);
> +NOKPROBE_SYMBOL(set_user_regs_spsr_ss);
>   
> -static void clear_regs_spsr_ss(struct pt_regs *regs)
> +static void clear_user_regs_spsr_ss(struct user_pt_regs *regs)
>   {
>   	regs->pstate &= ~DBG_SPSR_SS;
>   }
> -NOKPROBE_SYMBOL(clear_regs_spsr_ss);
> +NOKPROBE_SYMBOL(clear_user_regs_spsr_ss);
> +
> +#define set_regs_spsr_ss(r)	set_user_regs_spsr_ss(&(r)->user_regs)
> +#define clear_regs_spsr_ss(r)	clear_user_regs_spsr_ss(&(r)->user_regs)
>   
>   static DEFINE_SPINLOCK(debug_hook_lock);
>   static LIST_HEAD(user_step_hook);
> @@ -404,6 +407,15 @@ void user_fastforward_single_step(struct task_struct *task)
>   		clear_regs_spsr_ss(task_pt_regs(task));
>   }
>   
> +void user_regs_reset_single_step(struct user_pt_regs *regs,
> +				 struct task_struct *task)
> +{
> +	if (test_tsk_thread_flag(task, TIF_SINGLESTEP))
> +		set_user_regs_spsr_ss(regs);
> +	else
> +		clear_user_regs_spsr_ss(regs);
> +}
> +
>   /* Kernel API */
>   void kernel_enable_single_step(struct pt_regs *regs)
>   {
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index cd6e5fa48b9c..d479fbcbd0d2 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs)
>    */
>   int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task)
>   {
> -	if (!test_tsk_thread_flag(task, TIF_SINGLESTEP))
> -		regs->pstate &= ~DBG_SPSR_SS;
> +	/* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */
> +	user_regs_reset_single_step(regs, task);
>   
>   	if (is_compat_thread(task_thread_info(task)))
>   		return valid_compat_regs(regs);
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index 339882db5a91..bc54bdbfd760 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs,
>   	forget_syscall(regs);
>   
>   	err |= !valid_user_regs(&regs->user_regs, current);
> -	if (err == 0)
> +
> +	if (err == 0) {
> +		/* Make it look like we stepped the sigreturn system call */
> +		user_fastforward_single_step(current);
>   		err = parse_user_sigframe(&user, sf);
> +	}
>   
>   	if (err == 0 && system_supports_fpsimd()) {
>   		if (!user.fpsimd)
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2020-02-13 17:07           ` Luis Machado
@ 2020-02-14 15:45             ` Luis Machado
  2020-02-18  8:44               ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Luis Machado @ 2020-02-14 15:45 UTC (permalink / raw)
  To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel

Will,

On 2/13/20 2:07 PM, Luis Machado wrote:
> Hi Will,
> 
> On 2/13/20 9:01 AM, Will Deacon wrote:
>> Hi Luis,
>>
>> Sorry for the very slow reply. I talked to Mark about this a bit but it
>> seems that we never followed up here.
> 
> No worries.
> 
>>
>> On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote:
>>> Do you have any input regarding this particular situation?
>>>
>>> It would be nice to get this fixed before the release of another GDB
>>> version, if the fix is to live in GDB itself.
>>
>> Basically, I'm very nervous about fixing this in the kernel because
>> whatever we do will be visible to userspace. On the other hand, this
>> part of the ptrace interface is only seriously used by GDB and we should
>> make sure that it works well.
>>
>> Does the diff below solve the problem? If so, can you confirm that it
>> doesn't appear to regress anything else for GDB?
> 
> Thanks for the patch. I'll exercise this in various ways to see if 
> anything breaks.
> 

I gave this a try with the particular test in GDB's testsuite that 
exposed the problem. It is working as expected now, so we're 
single-stepping past the instruction correctly instead of getting a 
spurious SIGTRAP.

I managed to run a few other tests related to syscalls and signals and 
they also executed as expected. But this was inside QEMU.

Do you see any potential scenarios where this change may break things? 
Other things i should try to exercise?

Given we need to be careful with a kernel patch at this stage, i just 
want to make sure I covered all/most the possible cases.

Otherwise, I'm happy with this change. Thanks for putting it together!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2020-02-14 15:45             ` Luis Machado
@ 2020-02-18  8:44               ` Will Deacon
  2020-02-18 10:33                 ` Luis Machado
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2020-02-18  8:44 UTC (permalink / raw)
  To: Luis Machado; +Cc: mark.rutland, linux-arm-kernel

On Fri, Feb 14, 2020 at 12:45:31PM -0300, Luis Machado wrote:
> On 2/13/20 2:07 PM, Luis Machado wrote:
> > On 2/13/20 9:01 AM, Will Deacon wrote:
> > > Sorry for the very slow reply. I talked to Mark about this a bit but it
> > > seems that we never followed up here.
> > 
> > No worries.
> > 
> > > 
> > > On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote:
> > > > Do you have any input regarding this particular situation?
> > > > 
> > > > It would be nice to get this fixed before the release of another GDB
> > > > version, if the fix is to live in GDB itself.
> > > 
> > > Basically, I'm very nervous about fixing this in the kernel because
> > > whatever we do will be visible to userspace. On the other hand, this
> > > part of the ptrace interface is only seriously used by GDB and we should
> > > make sure that it works well.
> > > 
> > > Does the diff below solve the problem? If so, can you confirm that it
> > > doesn't appear to regress anything else for GDB?
> > 
> > Thanks for the patch. I'll exercise this in various ways to see if
> > anything breaks.
> > 
> 
> I gave this a try with the particular test in GDB's testsuite that exposed
> the problem. It is working as expected now, so we're single-stepping past
> the instruction correctly instead of getting a spurious SIGTRAP.
> 
> I managed to run a few other tests related to syscalls and signals and they
> also executed as expected. But this was inside QEMU.
> 
> Do you see any potential scenarios where this change may break things? Other
> things i should try to exercise?

Could you run the entire testsuite please and check there aren't any
regressions? Hardware would be best, but QEMU is still useful.

> Given we need to be careful with a kernel patch at this stage, i just want
> to make sure I covered all/most the possible cases.
> 
> Otherwise, I'm happy with this change. Thanks for putting it together!

I'll add your Tested-by, but I'd still like review from Mark.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2020-02-18  8:44               ` Will Deacon
@ 2020-02-18 10:33                 ` Luis Machado
  2020-02-26 13:01                   ` Luis Machado
  0 siblings, 1 reply; 17+ messages in thread
From: Luis Machado @ 2020-02-18 10:33 UTC (permalink / raw)
  To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel

On 2/18/20 5:44 AM, Will Deacon wrote:
> On Fri, Feb 14, 2020 at 12:45:31PM -0300, Luis Machado wrote:
>> On 2/13/20 2:07 PM, Luis Machado wrote:
>>> On 2/13/20 9:01 AM, Will Deacon wrote:
>>>> Sorry for the very slow reply. I talked to Mark about this a bit but it
>>>> seems that we never followed up here.
>>>
>>> No worries.
>>>
>>>>
>>>> On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote:
>>>>> Do you have any input regarding this particular situation?
>>>>>
>>>>> It would be nice to get this fixed before the release of another GDB
>>>>> version, if the fix is to live in GDB itself.
>>>>
>>>> Basically, I'm very nervous about fixing this in the kernel because
>>>> whatever we do will be visible to userspace. On the other hand, this
>>>> part of the ptrace interface is only seriously used by GDB and we should
>>>> make sure that it works well.
>>>>
>>>> Does the diff below solve the problem? If so, can you confirm that it
>>>> doesn't appear to regress anything else for GDB?
>>>
>>> Thanks for the patch. I'll exercise this in various ways to see if
>>> anything breaks.
>>>
>>
>> I gave this a try with the particular test in GDB's testsuite that exposed
>> the problem. It is working as expected now, so we're single-stepping past
>> the instruction correctly instead of getting a spurious SIGTRAP.
>>
>> I managed to run a few other tests related to syscalls and signals and they
>> also executed as expected. But this was inside QEMU.
>>
>> Do you see any potential scenarios where this change may break things? Other
>> things i should try to exercise?
> 
> Could you run the entire testsuite please and check there aren't any
> regressions? Hardware would be best, but QEMU is still useful.
> 

I'll try to get a hold of hardware to do this. QEMU will be too slow and 
we'll likely see some failures due to running things in QEMU as well.

I'll let you know.

>> Given we need to be careful with a kernel patch at this stage, i just want
>> to make sure I covered all/most the possible cases.
>>
>> Otherwise, I'm happy with this change. Thanks for putting it together!
> 
> I'll add your Tested-by, but I'd still like review from Mark.
> 
> Will
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2020-02-13 12:01         ` Will Deacon
  2020-02-13 17:07           ` Luis Machado
@ 2020-02-20 13:02           ` Mark Rutland
  2020-02-20 13:29             ` Will Deacon
  1 sibling, 1 reply; 17+ messages in thread
From: Mark Rutland @ 2020-02-20 13:02 UTC (permalink / raw)
  To: Will Deacon; +Cc: Luis Machado, linux-arm-kernel

Hi Will, Luis,

On Thu, Feb 13, 2020 at 12:01:16PM +0000, Will Deacon wrote:
> Sorry for the very slow reply. I talked to Mark about this a bit but it
> seems that we never followed up here.
> 
> On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote:
> > Do you have any input regarding this particular situation?
> > 
> > It would be nice to get this fixed before the release of another GDB
> > version, if the fix is to live in GDB itself.
> 
> Basically, I'm very nervous about fixing this in the kernel because
> whatever we do will be visible to userspace. On the other hand, this
> part of the ptrace interface is only seriously used by GDB and we should
> make sure that it works well.
> 
> Does the diff below solve the problem? If so, can you confirm that it
> doesn't appear to regress anything else for GDB?
> 
> Cheers,
> 
> Will

> 
> --->8
> 
> diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
> index 7619f473155f..d825e3585e28 100644
> --- a/arch/arm64/include/asm/debug-monitors.h
> +++ b/arch/arm64/include/asm/debug-monitors.h
> @@ -109,6 +109,8 @@ void disable_debug_monitors(enum dbg_active_el el);
>  
>  void user_rewind_single_step(struct task_struct *task);
>  void user_fastforward_single_step(struct task_struct *task);
> +void user_regs_reset_single_step(struct user_pt_regs *regs,
> +				 struct task_struct *task);
>  
>  void kernel_enable_single_step(struct pt_regs *regs);
>  void kernel_disable_single_step(void);
> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
> index 48222a4760c2..7569deb1eac1 100644
> --- a/arch/arm64/kernel/debug-monitors.c
> +++ b/arch/arm64/kernel/debug-monitors.c
> @@ -141,17 +141,20 @@ postcore_initcall(debug_monitors_init);
>  /*
>   * Single step API and exception handling.
>   */
> -static void set_regs_spsr_ss(struct pt_regs *regs)
> +static void set_user_regs_spsr_ss(struct user_pt_regs *regs)
>  {
>  	regs->pstate |= DBG_SPSR_SS;
>  }
> -NOKPROBE_SYMBOL(set_regs_spsr_ss);
> +NOKPROBE_SYMBOL(set_user_regs_spsr_ss);
>  
> -static void clear_regs_spsr_ss(struct pt_regs *regs)
> +static void clear_user_regs_spsr_ss(struct user_pt_regs *regs)
>  {
>  	regs->pstate &= ~DBG_SPSR_SS;
>  }
> -NOKPROBE_SYMBOL(clear_regs_spsr_ss);
> +NOKPROBE_SYMBOL(clear_user_regs_spsr_ss);
> +
> +#define set_regs_spsr_ss(r)	set_user_regs_spsr_ss(&(r)->user_regs)
> +#define clear_regs_spsr_ss(r)	clear_user_regs_spsr_ss(&(r)->user_regs)
>  
>  static DEFINE_SPINLOCK(debug_hook_lock);
>  static LIST_HEAD(user_step_hook);
> @@ -404,6 +407,15 @@ void user_fastforward_single_step(struct task_struct *task)
>  		clear_regs_spsr_ss(task_pt_regs(task));
>  }
>  
> +void user_regs_reset_single_step(struct user_pt_regs *regs,
> +				 struct task_struct *task)
> +{
> +	if (test_tsk_thread_flag(task, TIF_SINGLESTEP))
> +		set_user_regs_spsr_ss(regs);
> +	else
> +		clear_user_regs_spsr_ss(regs);
> +}
> +
>  /* Kernel API */
>  void kernel_enable_single_step(struct pt_regs *regs)
>  {
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index cd6e5fa48b9c..d479fbcbd0d2 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs)
>   */
>  int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task)
>  {
> -	if (!test_tsk_thread_flag(task, TIF_SINGLESTEP))
> -		regs->pstate &= ~DBG_SPSR_SS;
> +	/* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */
> +	user_regs_reset_single_step(regs, task);

I think this change means we do the right thing for signal entry/return
and ptrace messing with the regs. Instruction emulation seems to do the
right thing via skip_faulting_instruction().

I think there are a few more single-step edge cases lying around (e.g.
uprobes, rseq), but it looks like those have to be fixed separately. I
fear fixing uprobes might require a largler structural change to single
step, but ignoring uprobes the changes above seem to be sound.

If userspace doesn't consume the SS value today, I wonder if we should
hide it when dumping the SPSR to userspace, so that userspace has a
consistent view regardless of whether it's being stepped.

I'll try to dig into the uprobes stuff this afternoon, just in case that
needs us to do something substantially different.

The existing logic in valid_user_regs() doesn't make sense to me, given
SPSR_EL1.SS is immaterial unless MSCDR_EL1.SS == 1. I'm not sure if that
was overzealous or I've forgotten an edge case that we cared about in
the past.

>  
>  	if (is_compat_thread(task_thread_info(task)))
>  		return valid_compat_regs(regs);
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index 339882db5a91..bc54bdbfd760 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs,
>  	forget_syscall(regs);
>  
>  	err |= !valid_user_regs(&regs->user_regs, current);
> -	if (err == 0)
> +
> +	if (err == 0) {
> +		/* Make it look like we stepped the sigreturn system call */
> +		user_fastforward_single_step(current);
>  		err = parse_user_sigframe(&user, sf);
> +	}

I don't understand this. AFAICT  we don't likewise for other SVCs, so
either I'm missing that, or there's something else I'm missing.

Why do we need to step sigreturn but not SVC generally?

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2020-02-20 13:02           ` Mark Rutland
@ 2020-02-20 13:29             ` Will Deacon
  2020-02-21 11:16               ` Mark Rutland
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2020-02-20 13:29 UTC (permalink / raw)
  To: Mark Rutland; +Cc: Luis Machado, linux-arm-kernel

Hi Mark,

Thanks for having a look.

On Thu, Feb 20, 2020 at 01:02:22PM +0000, Mark Rutland wrote:
> On Thu, Feb 13, 2020 at 12:01:16PM +0000, Will Deacon wrote:
> > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> > index cd6e5fa48b9c..d479fbcbd0d2 100644
> > --- a/arch/arm64/kernel/ptrace.c
> > +++ b/arch/arm64/kernel/ptrace.c
> > @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs)
> >   */
> >  int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task)
> >  {
> > -	if (!test_tsk_thread_flag(task, TIF_SINGLESTEP))
> > -		regs->pstate &= ~DBG_SPSR_SS;
> > +	/* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */
> > +	user_regs_reset_single_step(regs, task);
> 
> I think this change means we do the right thing for signal entry/return
> and ptrace messing with the regs. Instruction emulation seems to do the
> right thing via skip_faulting_instruction().
> 
> I think there are a few more single-step edge cases lying around (e.g.
> uprobes, rseq), but it looks like those have to be fixed separately. I
> fear fixing uprobes might require a largler structural change to single
> step, but ignoring uprobes the changes above seem to be sound.

Rseq should just abort when delivering the step signal and I'm not sure I
see the issue with uprobes. Can you elaborate on your concerns a bit,
please?

> If userspace doesn't consume the SS value today, I wonder if we should
> hide it when dumping the SPSR to userspace, so that userspace has a
> consistent view regardless of whether it's being stepped.

You can't really hide it though, because '0' has a meaning so I don't think
it gains us a lot other than increasing the scope of the change.

> I'll try to dig into the uprobes stuff this afternoon, just in case that
> needs us to do something substantially different.

Thanks.

> The existing logic in valid_user_regs() doesn't make sense to me, given
> SPSR_EL1.SS is immaterial unless MSCDR_EL1.SS == 1. I'm not sure if that
> was overzealous or I've forgotten an edge case that we cared about in
> the past.

I think it was just part of sanitising the registers to a consistent value,
but I agree that it wouldn't have a functional impact.

> > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> > index 339882db5a91..bc54bdbfd760 100644
> > --- a/arch/arm64/kernel/signal.c
> > +++ b/arch/arm64/kernel/signal.c
> > @@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs,
> >  	forget_syscall(regs);
> >  
> >  	err |= !valid_user_regs(&regs->user_regs, current);
> > -	if (err == 0)
> > +
> > +	if (err == 0) {
> > +		/* Make it look like we stepped the sigreturn system call */
> > +		user_fastforward_single_step(current);
> >  		err = parse_user_sigframe(&user, sf);
> > +	}
> 
> I don't understand this. AFAICT  we don't likewise for other SVCs, so
> either I'm missing that, or there's something else I'm missing.
> 
> Why do we need to step sigreturn but not SVC generally?

Because we restore the SPSR from the sigframe during sigreturn, so we will
end up with PSTATE.SS set when it should be cleared.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2020-02-20 13:29             ` Will Deacon
@ 2020-02-21 11:16               ` Mark Rutland
  2020-05-27 14:39                 ` Luis Machado
  2020-05-31  9:52                 ` Will Deacon
  0 siblings, 2 replies; 17+ messages in thread
From: Mark Rutland @ 2020-02-21 11:16 UTC (permalink / raw)
  To: Will Deacon; +Cc: Luis Machado, linux-arm-kernel

On Thu, Feb 20, 2020 at 01:29:42PM +0000, Will Deacon wrote:
> Hi Mark,
> 
> Thanks for having a look.
> 
> On Thu, Feb 20, 2020 at 01:02:22PM +0000, Mark Rutland wrote:
> > On Thu, Feb 13, 2020 at 12:01:16PM +0000, Will Deacon wrote:
> > > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> > > index cd6e5fa48b9c..d479fbcbd0d2 100644
> > > --- a/arch/arm64/kernel/ptrace.c
> > > +++ b/arch/arm64/kernel/ptrace.c
> > > @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs)
> > >   */
> > >  int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task)
> > >  {
> > > -	if (!test_tsk_thread_flag(task, TIF_SINGLESTEP))
> > > -		regs->pstate &= ~DBG_SPSR_SS;
> > > +	/* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */
> > > +	user_regs_reset_single_step(regs, task);
> > 
> > I think this change means we do the right thing for signal entry/return
> > and ptrace messing with the regs. Instruction emulation seems to do the
> > right thing via skip_faulting_instruction().
> > 
> > I think there are a few more single-step edge cases lying around (e.g.
> > uprobes, rseq), but it looks like those have to be fixed separately. I
> > fear fixing uprobes might require a largler structural change to single
> > step, but ignoring uprobes the changes above seem to be sound.
> 
> Rseq should just abort when delivering the step signal and I'm not sure I
> see the issue with uprobes. Can you elaborate on your concerns a bit,
> please?

For rseq I wasn't sure what state PSTATE.SS should be when we head to
the abort handler -- I think the sensible thing would be that it
immediately triggers a single-step exception, but I don't see where we'd
clear PSTATE.SS to ensure that.

For uprobes I fear that the uprobes xol single-stepping might end up
conflicting with the regular ptrace single-stepping, and that the
uprobes emulation might not always advance the state machine correctly.

> > If userspace doesn't consume the SS value today, I wonder if we should
> > hide it when dumping the SPSR to userspace, so that userspace has a
> > consistent view regardless of whether it's being stepped.
> 
> You can't really hide it though, because '0' has a meaning so I don't think
> it gains us a lot other than increasing the scope of the change.

I think that it reduces the likelihood that single-stepping a program
changes its behaviour unexpectedly. This patch makes the kernel
disregard the PSTATE.SS value provided by userspace, so what is gained
by exposing PSTATE.SS to userspace at all?

I do agree that there are potentially subtle landmines here; I just
can't see a legitimate reason for userspace to need the value.

> > I'll try to dig into the uprobes stuff this afternoon, just in case
> > that
> > needs us to do something substantially different.
> 
> Thanks.

I didn't get the chance to do this yesterday, but I did think of another
potential problem.

I *think* that when attempting to single-step a syscall, if prior to
return from the syscall the tracer messed with the tracee's regs (e.g.
to mess with arguments or the retun value) then valid_user_regs() will
set the SS bit, and upon return from the syscall the next instruction
would be executed rather than first raising a single-step exception.

This patch relies on valid_user_regs() being a signal that PSTATE.SS is
stale, but that's not always the case. To handle that generally I
suspect we need two bits of state rather than just TIF_SINGLESTEP.

> > The existing logic in valid_user_regs() doesn't make sense to me, given
> > SPSR_EL1.SS is immaterial unless MSCDR_EL1.SS == 1. I'm not sure if that
> > was overzealous or I've forgotten an edge case that we cared about in
> > the past.
> 
> I think it was just part of sanitising the registers to a consistent value,
> but I agree that it wouldn't have a functional impact.

Thanks for confirming my understanding. I guess this may have minimized
the cases where userspace saw PSTATE.SS set.

> > > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> > > index 339882db5a91..bc54bdbfd760 100644
> > > --- a/arch/arm64/kernel/signal.c
> > > +++ b/arch/arm64/kernel/signal.c
> > > @@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs,
> > >  	forget_syscall(regs);
> > >  
> > >  	err |= !valid_user_regs(&regs->user_regs, current);
> > > -	if (err == 0)
> > > +
> > > +	if (err == 0) {
> > > +		/* Make it look like we stepped the sigreturn system call */
> > > +		user_fastforward_single_step(current);
> > >  		err = parse_user_sigframe(&user, sf);
> > > +	}
> > 
> > I don't understand this. AFAICT  we don't likewise for other SVCs, so
> > either I'm missing that, or there's something else I'm missing.
> > 
> > Why do we need to step sigreturn but not SVC generally?
> 
> Because we restore the SPSR from the sigframe during sigreturn, so we will
> end up with PSTATE.SS set when it should be cleared.

Ah, I see. As above, I think we can hit a similar case when
single-stepping an SVC for a regular syscall.

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2020-02-18 10:33                 ` Luis Machado
@ 2020-02-26 13:01                   ` Luis Machado
  0 siblings, 0 replies; 17+ messages in thread
From: Luis Machado @ 2020-02-26 13:01 UTC (permalink / raw)
  To: Will Deacon; +Cc: mark.rutland, linux-arm-kernel

Hi,

On 2/18/20 7:33 AM, Luis Machado wrote:
> On 2/18/20 5:44 AM, Will Deacon wrote:
>> On Fri, Feb 14, 2020 at 12:45:31PM -0300, Luis Machado wrote:
>>> On 2/13/20 2:07 PM, Luis Machado wrote:
>>>> On 2/13/20 9:01 AM, Will Deacon wrote:
>>>>> Sorry for the very slow reply. I talked to Mark about this a bit 
>>>>> but it
>>>>> seems that we never followed up here.
>>>>
>>>> No worries.
>>>>
>>>>>
>>>>> On Tue, Dec 10, 2019 at 05:00:18PM -0300, Luis Machado wrote:
>>>>>> Do you have any input regarding this particular situation?
>>>>>>
>>>>>> It would be nice to get this fixed before the release of another GDB
>>>>>> version, if the fix is to live in GDB itself.
>>>>>
>>>>> Basically, I'm very nervous about fixing this in the kernel because
>>>>> whatever we do will be visible to userspace. On the other hand, this
>>>>> part of the ptrace interface is only seriously used by GDB and we 
>>>>> should
>>>>> make sure that it works well.
>>>>>
>>>>> Does the diff below solve the problem? If so, can you confirm that it
>>>>> doesn't appear to regress anything else for GDB?
>>>>
>>>> Thanks for the patch. I'll exercise this in various ways to see if
>>>> anything breaks.
>>>>
>>>
>>> I gave this a try with the particular test in GDB's testsuite that 
>>> exposed
>>> the problem. It is working as expected now, so we're single-stepping 
>>> past
>>> the instruction correctly instead of getting a spurious SIGTRAP.
>>>
>>> I managed to run a few other tests related to syscalls and signals 
>>> and they
>>> also executed as expected. But this was inside QEMU.
>>>
>>> Do you see any potential scenarios where this change may break 
>>> things? Other
>>> things i should try to exercise?
>>
>> Could you run the entire testsuite please and check there aren't any
>> regressions? Hardware would be best, but QEMU is still useful.
>>
> 
> I'll try to get a hold of hardware to do this. QEMU will be too slow and 
> we'll likely see some failures due to running things in QEMU as well.
> 
> I'll let you know.

So i managed to do a complete GDB testsuite run inside a system mode 
QEMU, with both the patched and unpatched kernel.

I did not see any regressions. I also noticed the particular testcase 
where we were having the single-stepping hiccup is running as it should now.

So, from GDB's perspective, this patch looks good.

Let me know if there are any corner cases i should exercise (maybe by hand).

Thanks,
Luis

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2020-02-21 11:16               ` Mark Rutland
@ 2020-05-27 14:39                 ` Luis Machado
  2020-05-31  9:52                 ` Will Deacon
  1 sibling, 0 replies; 17+ messages in thread
From: Luis Machado @ 2020-05-27 14:39 UTC (permalink / raw)
  To: Mark Rutland, Will Deacon; +Cc: linux-arm-kernel

Hi,

On 2/21/20 8:16 AM, Mark Rutland wrote:
> On Thu, Feb 20, 2020 at 01:29:42PM +0000, Will Deacon wrote:
>> Hi Mark,
>>
>> Thanks for having a look.
>>
>> On Thu, Feb 20, 2020 at 01:02:22PM +0000, Mark Rutland wrote:
>>> On Thu, Feb 13, 2020 at 12:01:16PM +0000, Will Deacon wrote:
>>>> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
>>>> index cd6e5fa48b9c..d479fbcbd0d2 100644
>>>> --- a/arch/arm64/kernel/ptrace.c
>>>> +++ b/arch/arm64/kernel/ptrace.c
>>>> @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs)
>>>>    */
>>>>   int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task)
>>>>   {
>>>> -	if (!test_tsk_thread_flag(task, TIF_SINGLESTEP))
>>>> -		regs->pstate &= ~DBG_SPSR_SS;
>>>> +	/* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */
>>>> +	user_regs_reset_single_step(regs, task);
>>>
>>> I think this change means we do the right thing for signal entry/return
>>> and ptrace messing with the regs. Instruction emulation seems to do the
>>> right thing via skip_faulting_instruction().
>>>
>>> I think there are a few more single-step edge cases lying around (e.g.
>>> uprobes, rseq), but it looks like those have to be fixed separately. I
>>> fear fixing uprobes might require a largler structural change to single
>>> step, but ignoring uprobes the changes above seem to be sound.
>>
>> Rseq should just abort when delivering the step signal and I'm not sure I
>> see the issue with uprobes. Can you elaborate on your concerns a bit,
>> please?
> 
> For rseq I wasn't sure what state PSTATE.SS should be when we head to
> the abort handler -- I think the sensible thing would be that it
> immediately triggers a single-step exception, but I don't see where we'd
> clear PSTATE.SS to ensure that.
> 
> For uprobes I fear that the uprobes xol single-stepping might end up
> conflicting with the regular ptrace single-stepping, and that the
> uprobes emulation might not always advance the state machine correctly.
> 
>>> If userspace doesn't consume the SS value today, I wonder if we should
>>> hide it when dumping the SPSR to userspace, so that userspace has a
>>> consistent view regardless of whether it's being stepped.
>>
>> You can't really hide it though, because '0' has a meaning so I don't think
>> it gains us a lot other than increasing the scope of the change.
> 
> I think that it reduces the likelihood that single-stepping a program
> changes its behaviour unexpectedly. This patch makes the kernel
> disregard the PSTATE.SS value provided by userspace, so what is gained
> by exposing PSTATE.SS to userspace at all?
> 
> I do agree that there are potentially subtle landmines here; I just
> can't see a legitimate reason for userspace to need the value.
> 
>>> I'll try to dig into the uprobes stuff this afternoon, just in case
>>> that
>>> needs us to do something substantially different.
>>
>> Thanks.
> 
> I didn't get the chance to do this yesterday, but I did think of another
> potential problem.
> 
> I *think* that when attempting to single-step a syscall, if prior to
> return from the syscall the tracer messed with the tracee's regs (e.g.
> to mess with arguments or the retun value) then valid_user_regs() will
> set the SS bit, and upon return from the syscall the next instruction
> would be executed rather than first raising a single-step exception.
> 
> This patch relies on valid_user_regs() being a signal that PSTATE.SS is
> stale, but that's not always the case. To handle that generally I
> suspect we need two bits of state rather than just TIF_SINGLESTEP.
> 
>>> The existing logic in valid_user_regs() doesn't make sense to me, given
>>> SPSR_EL1.SS is immaterial unless MSCDR_EL1.SS == 1. I'm not sure if that
>>> was overzealous or I've forgotten an edge case that we cared about in
>>> the past.
>>
>> I think it was just part of sanitising the registers to a consistent value,
>> but I agree that it wouldn't have a functional impact.
> 
> Thanks for confirming my understanding. I guess this may have minimized
> the cases where userspace saw PSTATE.SS set.
> 
>>>> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
>>>> index 339882db5a91..bc54bdbfd760 100644
>>>> --- a/arch/arm64/kernel/signal.c
>>>> +++ b/arch/arm64/kernel/signal.c
>>>> @@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs,
>>>>   	forget_syscall(regs);
>>>>   
>>>>   	err |= !valid_user_regs(&regs->user_regs, current);
>>>> -	if (err == 0)
>>>> +
>>>> +	if (err == 0) {
>>>> +		/* Make it look like we stepped the sigreturn system call */
>>>> +		user_fastforward_single_step(current);
>>>>   		err = parse_user_sigframe(&user, sf);
>>>> +	}
>>>
>>> I don't understand this. AFAICT  we don't likewise for other SVCs, so
>>> either I'm missing that, or there's something else I'm missing.
>>>
>>> Why do we need to step sigreturn but not SVC generally?
>>
>> Because we restore the SPSR from the sigframe during sigreturn, so we will
>> end up with PSTATE.SS set when it should be cleared.
> 
> Ah, I see. As above, I think we can hit a similar case when
> single-stepping an SVC for a regular syscall.
> 
> Thanks,
> Mark.
> 

Did we have any further developments on this front? Has a patch made its 
way upstream for review?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction
  2020-02-21 11:16               ` Mark Rutland
  2020-05-27 14:39                 ` Luis Machado
@ 2020-05-31  9:52                 ` Will Deacon
  1 sibling, 0 replies; 17+ messages in thread
From: Will Deacon @ 2020-05-31  9:52 UTC (permalink / raw)
  To: Mark Rutland; +Cc: Luis Machado, linux-arm-kernel

Hi folks,

Sorry, I wrote a reply to this on a plane (so you can tell how long ago that
was!) and then forgot about it.

On Fri, Feb 21, 2020 at 11:16:53AM +0000, Mark Rutland wrote:
> On Thu, Feb 20, 2020 at 01:29:42PM +0000, Will Deacon wrote:
> > On Thu, Feb 20, 2020 at 01:02:22PM +0000, Mark Rutland wrote:
> > > On Thu, Feb 13, 2020 at 12:01:16PM +0000, Will Deacon wrote:
> > > > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> > > > index cd6e5fa48b9c..d479fbcbd0d2 100644
> > > > --- a/arch/arm64/kernel/ptrace.c
> > > > +++ b/arch/arm64/kernel/ptrace.c
> > > > @@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs)
> > > >   */
> > > >  int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task)
> > > >  {
> > > > -	if (!test_tsk_thread_flag(task, TIF_SINGLESTEP))
> > > > -		regs->pstate &= ~DBG_SPSR_SS;
> > > > +	/* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */
> > > > +	user_regs_reset_single_step(regs, task);
> > > 
> > > I think this change means we do the right thing for signal entry/return
> > > and ptrace messing with the regs. Instruction emulation seems to do the
> > > right thing via skip_faulting_instruction().
> > > 
> > > I think there are a few more single-step edge cases lying around (e.g.
> > > uprobes, rseq), but it looks like those have to be fixed separately. I
> > > fear fixing uprobes might require a largler structural change to single
> > > step, but ignoring uprobes the changes above seem to be sound.
> > 
> > Rseq should just abort when delivering the step signal and I'm not sure I
> > see the issue with uprobes. Can you elaborate on your concerns a bit,
> > please?
> 
> For rseq I wasn't sure what state PSTATE.SS should be when we head to
> the abort handler -- I think the sensible thing would be that it
> immediately triggers a single-step exception, but I don't see where we'd
> clear PSTATE.SS to ensure that.
> 
> For uprobes I fear that the uprobes xol single-stepping might end up
> conflicting with the regular ptrace single-stepping, and that the
> uprobes emulation might not always advance the state machine correctly.
> 
> > > If userspace doesn't consume the SS value today, I wonder if we should
> > > hide it when dumping the SPSR to userspace, so that userspace has a
> > > consistent view regardless of whether it's being stepped.
> > 
> > You can't really hide it though, because '0' has a meaning so I don't think
> > it gains us a lot other than increasing the scope of the change.
> 
> I think that it reduces the likelihood that single-stepping a program
> changes its behaviour unexpectedly. This patch makes the kernel
> disregard the PSTATE.SS value provided by userspace, so what is gained
> by exposing PSTATE.SS to userspace at all?
> 
> I do agree that there are potentially subtle landmines here; I just
> can't see a legitimate reason for userspace to need the value.
> 
> > > I'll try to dig into the uprobes stuff this afternoon, just in case
> > > that
> > > needs us to do something substantially different.
> > 
> > Thanks.
> 
> I didn't get the chance to do this yesterday, but I did think of another
> potential problem.
> 
> I *think* that when attempting to single-step a syscall, if prior to
> return from the syscall the tracer messed with the tracee's regs (e.g.
> to mess with arguments or the retun value) then valid_user_regs() will
> set the SS bit, and upon return from the syscall the next instruction
> would be executed rather than first raising a single-step exception.

I don't actually think that's a problem: if the tracer has taken control by
e.g. PTRACE_SYSCALL and modified the registers on the syscall return path,
then it has to resume execution of the tracee somehow. There's nothing like
a "PTRACE_RESUME_SINGLESTEP" request, so it would need to issue something
like PTRACE_CONT (which disables stepping altogether) or PTRACE_SINGLESTEP,
which would step over the first instruction after the SVC. That's the same
as the behaviour today.

> This patch relies on valid_user_regs() being a signal that PSTATE.SS is
> stale, but that's not always the case. To handle that generally I
> suspect we need two bits of state rather than just TIF_SINGLESTEP.

Added another bit of state feels like we'll open up another can of worms.
Given that I don't think we need it for ptrace, I'll write this up as a
proper patch.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-05-31  9:52 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-12 23:22 [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction Luis Machado
2019-11-18 13:15 ` Will Deacon
2019-11-18 14:54   ` Luis Machado
2019-11-26 16:35     ` Luis Machado
2019-12-10 20:00       ` Luis Machado
2020-02-13 12:01         ` Will Deacon
2020-02-13 17:07           ` Luis Machado
2020-02-14 15:45             ` Luis Machado
2020-02-18  8:44               ` Will Deacon
2020-02-18 10:33                 ` Luis Machado
2020-02-26 13:01                   ` Luis Machado
2020-02-20 13:02           ` Mark Rutland
2020-02-20 13:29             ` Will Deacon
2020-02-21 11:16               ` Mark Rutland
2020-05-27 14:39                 ` Luis Machado
2020-05-31  9:52                 ` Will Deacon
2020-01-13 18:13       ` Luis Machado

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).