linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* arm64: Register modification during syscall entry/exit stop
@ 2020-05-19  1:05 Keno Fischer
  2020-05-19  8:15 ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Keno Fischer @ 2020-05-19  1:05 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Catalin Marinas, Oleg Nesterov,
	Linux Kernel Mailing List, Kyle Huey

Continuing my theme of "weird things I encounter
while trying to use ptrace on arm64", I ran into the
effect of the following code in the syscall entry/exit
reporting:

```
/*
* A scratch register (ip(r12) on AArch32, x7 on AArch64) is
* used to denote syscall entry/exit:
*/
regno = (is_compat_task() ? 12 : 7);
saved_reg = regs->regs[regno];
regs->regs[regno] = dir;
```

This seems very weird to me. I can't think of any
other architecture that does something similar
(other than unicore32 apparently, but the ptrace
support there seems like it might have just been
copied from ARM). I'm able to work around this
in my application, but it adds another stumbling block.
Some examples of things that happen:
- Writes to x7 during syscall exit stops are ignored, so
  if the ptracer tries to emulate a setjmp-type thing, it
  might miss this register (ptracers sometimes like to do
  this to manually serialize execution between different
  threads, puppeteering a single thread of execution
  between different register states).
- Reads from x7 are incorrect, so if the ptracer saves
  a register state and later tries to set it back to the task,
  it may get x7 incorrect, but user space may be expecting
  the register to be preserved (when might this happen? -
  consider a ptracer that wants to modify some syscall
  arguments, it modifies the arguments, restarts the syscall
  but then incurs a signal, so it tries to restore the original
  registers to let userspace deal with the signal without
  being confused - expect signal traps don't ignore x7
  modifications, so x7 may have been unexpectedly
  modified).
- We now have seccomp traps, which kind of look and
  act like syscall-entry traps, but don't have this behavior,
  so it's not particularly reliable for ptracers to use.

Furthermore, it seems unnecessary to me on modern
kernels. We now have PTRACE_GET_SYSCALL_INFO,
which exposes this information without lying to the ptracer
about the tracee's registers.

I understand, we can't just change this, since people may
be relying on it, but I would like to propose adding a ptrace
option (PTRACE_O_ARM_REGSGOOD?) that turns this
behavior off. Now, I don't think we currently have any other
arch-specific ptrace options, so maybe there is a different
option that would be preferable (e.g. could be a different
regset), but I do think it would be good to have a way to
operate on the real x7 value. As I said, I can work around it,
but hopefully I will be able to save a future implementer
some headache.

Keno

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-19  1:05 arm64: Register modification during syscall entry/exit stop Keno Fischer
@ 2020-05-19  8:15 ` Will Deacon
  2020-05-19  8:37   ` Keno Fischer
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2020-05-19  8:15 UTC (permalink / raw)
  To: Keno Fischer
  Cc: linux-arm-kernel, Catalin Marinas, Oleg Nesterov,
	Linux Kernel Mailing List, Kyle Huey

Hi Keno,

On Mon, May 18, 2020 at 09:05:30PM -0400, Keno Fischer wrote:
> Continuing my theme of "weird things I encounter
> while trying to use ptrace on arm64", I ran into the
> effect of the following code in the syscall entry/exit
> reporting:
> 
> ```
> /*
> * A scratch register (ip(r12) on AArch32, x7 on AArch64) is
> * used to denote syscall entry/exit:
> */
> regno = (is_compat_task() ? 12 : 7);
> saved_reg = regs->regs[regno];
> regs->regs[regno] = dir;
> ```
> 
> This seems very weird to me. I can't think of any
> other architecture that does something similar
> (other than unicore32 apparently, but the ptrace
> support there seems like it might have just been
> copied from ARM). I'm able to work around this
> in my application, but it adds another stumbling block.

Yes, we inherited this from ARM and I think strace relies on it. In
hindsight, it is a little odd, although x7 is a parameter register in the
PCS and so it won't be live on entry to a system call.

> Some examples of things that happen:
> - Writes to x7 during syscall exit stops are ignored, so
>   if the ptracer tries to emulate a setjmp-type thing, it
>   might miss this register (ptracers sometimes like to do
>   this to manually serialize execution between different
>   threads, puppeteering a single thread of execution
>   between different register states).
> - Reads from x7 are incorrect, so if the ptracer saves
>   a register state and later tries to set it back to the task,
>   it may get x7 incorrect, but user space may be expecting
>   the register to be preserved (when might this happen? -
>   consider a ptracer that wants to modify some syscall
>   arguments, it modifies the arguments, restarts the syscall
>   but then incurs a signal, so it tries to restore the original
>   registers to let userspace deal with the signal without
>   being confused - expect signal traps don't ignore x7
>   modifications, so x7 may have been unexpectedly
>   modified).
> - We now have seccomp traps, which kind of look and
>   act like syscall-entry traps, but don't have this behavior,
>   so it's not particularly reliable for ptracers to use.
> 
> Furthermore, it seems unnecessary to me on modern
> kernels. We now have PTRACE_GET_SYSCALL_INFO,
> which exposes this information without lying to the ptracer
> about the tracee's registers.
> 
> I understand, we can't just change this, since people may
> be relying on it, but I would like to propose adding a ptrace
> option (PTRACE_O_ARM_REGSGOOD?) that turns this
> behavior off. Now, I don't think we currently have any other
> arch-specific ptrace options, so maybe there is a different
> option that would be preferable (e.g. could be a different
> regset), but I do think it would be good to have a way to
> operate on the real x7 value. As I said, I can work around it,
> but hopefully I will be able to save a future implementer
> some headache.

I'm not opposed to extending ptrace so that we can try to wean people off
this interface, but I think we need some concrete situations where the
current behaviour actually causes a problem. Although the examples you've
listed above are interesting, I don't see why x7 is important in any of
them (and we only support up to 6 system call arguments).

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-19  8:15 ` Will Deacon
@ 2020-05-19  8:37   ` Keno Fischer
  2020-05-20 17:41     ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Keno Fischer @ 2020-05-19  8:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, Catalin Marinas, Oleg Nesterov,
	Linux Kernel Mailing List, Kyle Huey

Hi Will,

> Yes, we inherited this from ARM and I think strace relies on it. In
> hindsight, it is a little odd, although x7 is a parameter register in the
> PCS and so it won't be live on entry to a system call.

I'm not familiar with the PCS acronym, but I assume you mean the
calling convention? You have more faith in userspace than I do ;). For
example, cursory googling brought up this arm64 syscall definition in musl:

https://github.com/bminor/musl/blob/593caa456309714402ca4cb77c3770f4c24da9da/arch/aarch64/syscall_arch.h

The constraints on those asm blocks allow the compiler to assume that
x7 is preserved across the syscall. If a ptracer accidentally modified it
(which is easy to do in the situations that I mentioned), it could
absolutely cause incorrect execution of the userspace program.

> Although the examples you've
> listed above are interesting, I don't see why x7 is important in any of
> them (and we only support up to 6 system call arguments).

It's not so much that x7 is important, it's that lying to the ptracer is
problematic, because it might remember that lie and act on it later.
I did run into exactly this problem, where my ptracer accidentally
changed the value of x7 and caused incorrect execution in the tracee
(now that incorrect execution happened to be an assertion, because
my application is paranoid about these kinds of issues, but it was
incorrect nevertheless)

If it would be helpful, I can code up the syscall entry -> signal trap example
ptracer to have a concrete example.

Keno

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-19  8:37   ` Keno Fischer
@ 2020-05-20 17:41     ` Will Deacon
  2020-05-23  5:35       ` Keno Fischer
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2020-05-20 17:41 UTC (permalink / raw)
  To: Keno Fischer
  Cc: linux-arm-kernel, Catalin Marinas, Oleg Nesterov,
	Linux Kernel Mailing List, Kyle Huey

Hi Keno,

On Tue, May 19, 2020 at 04:37:34AM -0400, Keno Fischer wrote:
> > Yes, we inherited this from ARM and I think strace relies on it. In
> > hindsight, it is a little odd, although x7 is a parameter register in the
> > PCS and so it won't be live on entry to a system call.
> 
> I'm not familiar with the PCS acronym, but I assume you mean the
> calling convention? You have more faith in userspace than I do ;). For
> example, cursory googling brought up this arm64 syscall definition in musl:
> 
> https://github.com/bminor/musl/blob/593caa456309714402ca4cb77c3770f4c24da9da/arch/aarch64/syscall_arch.h

Hmm, does that actually result in the SVC instruction getting inlined? I
think that's quite dangerous, since we document that we can trash the SVE
register state on a system call, for example. I'm also surprised that
the register variables are honoured by compilers if that inlining can occur.

> The constraints on those asm blocks allow the compiler to assume that
> x7 is preserved across the syscall. If a ptracer accidentally modified it
> (which is easy to do in the situations that I mentioned), it could
> absolutely cause incorrect execution of the userspace program.
> 
> > Although the examples you've
> > listed above are interesting, I don't see why x7 is important in any of
> > them (and we only support up to 6 system call arguments).
> 
> It's not so much that x7 is important, it's that lying to the ptracer is
> problematic, because it might remember that lie and act on it later.
> I did run into exactly this problem, where my ptracer accidentally
> changed the value of x7 and caused incorrect execution in the tracee
> (now that incorrect execution happened to be an assertion, because
> my application is paranoid about these kinds of issues, but it was
> incorrect nevertheless)
> 
> If it would be helpful, I can code up the syscall entry -> signal trap example
> ptracer to have a concrete example.

I guess I'm more interested in situations where the compiler thinks x7 is
live, yet we clobber it.

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-20 17:41     ` Will Deacon
@ 2020-05-23  5:35       ` Keno Fischer
  2020-05-24  6:56         ` Keno Fischer
  0 siblings, 1 reply; 17+ messages in thread
From: Keno Fischer @ 2020-05-23  5:35 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, Catalin Marinas, Oleg Nesterov,
	Linux Kernel Mailing List, Kyle Huey

I got bitten by this again, so I decided to write up a simple example
that shows the problem:

https://gist.github.com/Keno/cde691b26e32373307fb7449ad305739

This runs the same child twice. First vanilla where it prints "Hello world".
The second time, using a textbook ptrace example, to only print "world".
The problem here is that by the time the ptracer gets around to restoring
the registers, it's no longer in a syscall stop, so the write to x7 does not
get ignored and the correct value of x7 gets clobbered.
I copied the syscall definition from musl, so the compiler thinks x7 is
live, and we can see an assertion.

Output on my machine (will depend on compiler version, etc.):
```
$ gcc -g3 -O3 ptrace_lies.c
$ ./a.out
Hello World
World
a.out: ptrace_lies.c:49: do_child: Assertion `v3 == values[2]' failed.
a.out: ptrace_lies.c:134: main: Assertion `WIFEXITED(status) &&
WEXITSTATUS(status) == 0' failed.
Aborted (core dumped)
```

However, I don't think that whether or not the compiler thinks that x7 is
live is the problem here. The problem is entirely that this mechanism
prevents the ptracer from precisely controlling the register state. While
basic ptracers don't need this feature (strace),
more advanced ptracers (think criu, etc.) absolutely do want to precisely
control what the register state is.
The ptracer I'm working on (https://rr-project.org/)
happens to be an extreme case of this, where it wants *bitwise* equivalent
register states such that it can run the same code many times and get
the exact same results.

Also, if the issue was just that the kernel clobbered x7, that would be fine
we could deal with that no problem. However, it's much worse than that,
because the behavior of the kernel with respect to x7 depends on what
kind of ptrace stop we're in and even worse, in some kinds of stop,
there's absolutely no way to get at the actual value of x7.

> Hmm, does that actually result in the SVC instruction getting inlined? I
> think that's quite dangerous, since we document that we can trash the SVE
> register state on a system call, for example. I'm also surprised that
> the register variables are honoured by compilers if that inlining can occur.

I haven't gotten to trying SVE yet, so I appreciate the warning :). That said,
deterministic clobbering of registers is fine. Even changing the registers to
random junk is fine. We're happy to read those registers through ptrace.
The problem here is that the kernel lies about what the contents of the x7
register is and discards any writes to it.

I really hope we can come up with a solution here, I'm already dreading
the next time I unexpectedly run into this and have to add yet
another special case :(.

Keno

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-23  5:35       ` Keno Fischer
@ 2020-05-24  6:56         ` Keno Fischer
  2020-05-27  9:55           ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Keno Fischer @ 2020-05-24  6:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, Catalin Marinas, Oleg Nesterov,
	Linux Kernel Mailing List, Kyle Huey

Just ran into this issue again, with what I think may be most compelling
example yet why this is problematic:

The tracee incurred a signal, we PTRACE_SYSEMU'd to the rt_sigreturn,
which the tracer tried to emulate by applying the state from the signal frame.
However, the PTRACE_SYSEMU stop is a syscall-stop, so the tracer's write
to x7 was ignored and x7 retained the value it had in the signal handler,
which broke the tracee.

Keno

On Sat, May 23, 2020 at 1:35 AM Keno Fischer <keno@juliacomputing.com> wrote:
>
> I got bitten by this again, so I decided to write up a simple example
> that shows the problem:
>
> https://gist.github.com/Keno/cde691b26e32373307fb7449ad305739
>
> This runs the same child twice. First vanilla where it prints "Hello world".
> The second time, using a textbook ptrace example, to only print "world".
> The problem here is that by the time the ptracer gets around to restoring
> the registers, it's no longer in a syscall stop, so the write to x7 does not
> get ignored and the correct value of x7 gets clobbered.
> I copied the syscall definition from musl, so the compiler thinks x7 is
> live, and we can see an assertion.
>
> Output on my machine (will depend on compiler version, etc.):
> ```
> $ gcc -g3 -O3 ptrace_lies.c
> $ ./a.out
> Hello World
> World
> a.out: ptrace_lies.c:49: do_child: Assertion `v3 == values[2]' failed.
> a.out: ptrace_lies.c:134: main: Assertion `WIFEXITED(status) &&
> WEXITSTATUS(status) == 0' failed.
> Aborted (core dumped)
> ```
>
> However, I don't think that whether or not the compiler thinks that x7 is
> live is the problem here. The problem is entirely that this mechanism
> prevents the ptracer from precisely controlling the register state. While
> basic ptracers don't need this feature (strace),
> more advanced ptracers (think criu, etc.) absolutely do want to precisely
> control what the register state is.
> The ptracer I'm working on (https://rr-project.org/)
> happens to be an extreme case of this, where it wants *bitwise* equivalent
> register states such that it can run the same code many times and get
> the exact same results.
>
> Also, if the issue was just that the kernel clobbered x7, that would be fine
> we could deal with that no problem. However, it's much worse than that,
> because the behavior of the kernel with respect to x7 depends on what
> kind of ptrace stop we're in and even worse, in some kinds of stop,
> there's absolutely no way to get at the actual value of x7.
>
> > Hmm, does that actually result in the SVC instruction getting inlined? I
> > think that's quite dangerous, since we document that we can trash the SVE
> > register state on a system call, for example. I'm also surprised that
> > the register variables are honoured by compilers if that inlining can occur.
>
> I haven't gotten to trying SVE yet, so I appreciate the warning :). That said,
> deterministic clobbering of registers is fine. Even changing the registers to
> random junk is fine. We're happy to read those registers through ptrace.
> The problem here is that the kernel lies about what the contents of the x7
> register is and discards any writes to it.
>
> I really hope we can come up with a solution here, I'm already dreading
> the next time I unexpectedly run into this and have to add yet
> another special case :(.
>
> Keno

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-24  6:56         ` Keno Fischer
@ 2020-05-27  9:55           ` Will Deacon
  2020-05-27 10:19             ` Dave Martin
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2020-05-27  9:55 UTC (permalink / raw)
  To: Keno Fischer
  Cc: linux-arm-kernel, Catalin Marinas, Oleg Nesterov,
	Linux Kernel Mailing List, Kyle Huey

On Sun, May 24, 2020 at 02:56:35AM -0400, Keno Fischer wrote:
> Just ran into this issue again, with what I think may be most compelling
> example yet why this is problematic:
> 
> The tracee incurred a signal, we PTRACE_SYSEMU'd to the rt_sigreturn,
> which the tracer tried to emulate by applying the state from the signal frame.
> However, the PTRACE_SYSEMU stop is a syscall-stop, so the tracer's write
> to x7 was ignored and x7 retained the value it had in the signal handler,
> which broke the tracee.

Yeah, that sounds like a good justification to add a way to stop this. Could
you send a patch, please?

Interestingly, I *thought* the current behaviour was needed by strace, but I
can't find anything there that seems to require it. Oh well, we're stuck
with it anyway.

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-27  9:55           ` Will Deacon
@ 2020-05-27 10:19             ` Dave Martin
  2020-05-31  9:33               ` Will Deacon
  2020-05-31 16:20               ` Keno Fischer
  0 siblings, 2 replies; 17+ messages in thread
From: Dave Martin @ 2020-05-27 10:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Keno Fischer, Catalin Marinas, Kyle Huey, Oleg Nesterov,
	linux-arm-kernel, Linux Kernel Mailing List

On Wed, May 27, 2020 at 10:55:29AM +0100, Will Deacon wrote:
> On Sun, May 24, 2020 at 02:56:35AM -0400, Keno Fischer wrote:
> > Just ran into this issue again, with what I think may be most compelling
> > example yet why this is problematic:
> > 
> > The tracee incurred a signal, we PTRACE_SYSEMU'd to the rt_sigreturn,
> > which the tracer tried to emulate by applying the state from the signal frame.
> > However, the PTRACE_SYSEMU stop is a syscall-stop, so the tracer's write
> > to x7 was ignored and x7 retained the value it had in the signal handler,
> > which broke the tracee.
> 
> Yeah, that sounds like a good justification to add a way to stop this. Could
> you send a patch, please?
> 
> Interestingly, I *thought* the current behaviour was needed by strace, but I
> can't find anything there that seems to require it. Oh well, we're stuck
> with it anyway.

The fact that PTRACE_SYSEMU is only implemented for a few arches makes
we wonder whether it was a misguided addition that should not be ported
to new arches... i.e., why does hardly anyone need it?  But I haven't
attempted to understand the history.

Can't PTRACE_SYSEMU be emulated by using PTRACE_SYSCALL, cancelling the
syscall at the syscall enter stop, then modifying the regs at the
syscall exit stop?


If SYSEMU was obviously always broken, perhaps we can withdraw support
for it.  Assuming nobody is crazy enough to try to emulate execve() I
can't see anything other than sigreturn that would be affected by this
issue though.  So maybe SYSEMU isn't broken enough to justify
withdrawal.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-27 10:19             ` Dave Martin
@ 2020-05-31  9:33               ` Will Deacon
  2020-05-31 16:13                 ` Keno Fischer
  2020-05-31 16:20               ` Keno Fischer
  1 sibling, 1 reply; 17+ messages in thread
From: Will Deacon @ 2020-05-31  9:33 UTC (permalink / raw)
  To: Dave Martin
  Cc: Keno Fischer, Catalin Marinas, Kyle Huey, Oleg Nesterov,
	linux-arm-kernel, Linux Kernel Mailing List

On Wed, May 27, 2020 at 11:19:29AM +0100, Dave Martin wrote:
> On Wed, May 27, 2020 at 10:55:29AM +0100, Will Deacon wrote:
> > On Sun, May 24, 2020 at 02:56:35AM -0400, Keno Fischer wrote:
> > > Just ran into this issue again, with what I think may be most compelling
> > > example yet why this is problematic:
> > > 
> > > The tracee incurred a signal, we PTRACE_SYSEMU'd to the rt_sigreturn,
> > > which the tracer tried to emulate by applying the state from the signal frame.
> > > However, the PTRACE_SYSEMU stop is a syscall-stop, so the tracer's write
> > > to x7 was ignored and x7 retained the value it had in the signal handler,
> > > which broke the tracee.
> > 
> > Yeah, that sounds like a good justification to add a way to stop this. Could
> > you send a patch, please?
> > 
> > Interestingly, I *thought* the current behaviour was needed by strace, but I
> > can't find anything there that seems to require it. Oh well, we're stuck
> > with it anyway.
> 
> The fact that PTRACE_SYSEMU is only implemented for a few arches makes
> we wonder whether it was a misguided addition that should not be ported
> to new arches... i.e., why does hardly anyone need it?  But I haven't
> attempted to understand the history.
> 
> Can't PTRACE_SYSEMU be emulated by using PTRACE_SYSCALL, cancelling the
> syscall at the syscall enter stop, then modifying the regs at the
> syscall exit stop?
> 
> 
> If SYSEMU was obviously always broken, perhaps we can withdraw support
> for it.  Assuming nobody is crazy enough to try to emulate execve() I
> can't see anything other than sigreturn that would be affected by this
> issue though.  So maybe SYSEMU isn't broken enough to justify
> withdrawal.

Indeed, my preference on another thread [1] was to remove it, but it would
need agreement from Arm, since it was added by them (Sudeep).

But setting that aside, Keno has convinced me that the clobbering of x7
on syscall enter/exit can cause some problems for userspace, so I think
that having a way to disable that seems like a good idea. We can't change
the current default behaviour, but having an explicit opt-in seems
worthwhile.

Keno -- are you planning to send out a patch? You previously spoke about
implementing this using PTRACE_SETOPTIONS.

Will

[1] https://lore.kernel.org/r/20200515121346.GA22919@willie-the-truck

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-31  9:33               ` Will Deacon
@ 2020-05-31 16:13                 ` Keno Fischer
  2020-06-01  9:14                   ` Dave Martin
  0 siblings, 1 reply; 17+ messages in thread
From: Keno Fischer @ 2020-05-31 16:13 UTC (permalink / raw)
  To: Will Deacon
  Cc: Dave Martin, Catalin Marinas, Kyle Huey, Oleg Nesterov,
	linux-arm-kernel, Linux Kernel Mailing List

> Keno -- are you planning to send out a patch? You previously spoke about
> implementing this using PTRACE_SETOPTIONS.

Yes, I'll have a patch for you. Though I've come to the conclusion
that introducing a new regset is probably a better way to solve it.
We can then also expose orig_x0 at the same time and give it sane semantics
(there's some problems with the way it works currently - I'll write it up
together with the patch).


Keno

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-27 10:19             ` Dave Martin
  2020-05-31  9:33               ` Will Deacon
@ 2020-05-31 16:20               ` Keno Fischer
  2020-06-01  9:23                 ` Dave Martin
  1 sibling, 1 reply; 17+ messages in thread
From: Keno Fischer @ 2020-05-31 16:20 UTC (permalink / raw)
  To: Dave Martin
  Cc: Will Deacon, Catalin Marinas, Kyle Huey, Oleg Nesterov,
	linux-arm-kernel, Linux Kernel Mailing List

> Can't PTRACE_SYSEMU be emulated by using PTRACE_SYSCALL, cancelling the
> syscall at the syscall enter stop, then modifying the regs at the
> syscall exit stop?

Yes, it can. The idea behind SYSEMU is to be able to save half the
ptrace traps that would require, in theory making the ptracer
a decent amount faster. That said, the x7 issue is orthogonal to
SYSEMU, you'd have the same issues if you used PTRACE_SYSCALL.


Keno

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-31 16:13                 ` Keno Fischer
@ 2020-06-01  9:14                   ` Dave Martin
  2020-06-01  9:23                     ` Keno Fischer
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Martin @ 2020-06-01  9:14 UTC (permalink / raw)
  To: Keno Fischer
  Cc: Will Deacon, Kyle Huey, Catalin Marinas,
	Linux Kernel Mailing List, Oleg Nesterov, linux-arm-kernel

On Sun, May 31, 2020 at 12:13:18PM -0400, Keno Fischer wrote:
> > Keno -- are you planning to send out a patch? You previously spoke about
> > implementing this using PTRACE_SETOPTIONS.
> 
> Yes, I'll have a patch for you. Though I've come to the conclusion
> that introducing a new regset is probably a better way to solve it.
> We can then also expose orig_x0 at the same time and give it sane semantics
> (there's some problems with the way it works currently - I'll write it up
> together with the patch).

I'd worry that having a new ptrace option would be useless bug-
compatibility that is just going to bitrot.

Can you explain why userspace would write a changed value for x7
but at the same time need that new to be thrown away?

That sounds like a nonsensical thing for userspace to be doing.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-06-01  9:14                   ` Dave Martin
@ 2020-06-01  9:23                     ` Keno Fischer
  2020-06-01  9:52                       ` Dave Martin
  0 siblings, 1 reply; 17+ messages in thread
From: Keno Fischer @ 2020-06-01  9:23 UTC (permalink / raw)
  To: Dave Martin
  Cc: Will Deacon, Kyle Huey, Catalin Marinas,
	Linux Kernel Mailing List, Oleg Nesterov, linux-arm-kernel

On Mon, Jun 1, 2020 at 5:14 AM Dave Martin <Dave.Martin@arm.com> wrote:
> Can you explain why userspace would write a changed value for x7
> but at the same time need that new to be thrown away?

The discarding behavior is the primary reason things aren't completely
broken at the moment. If it read the wrong x7 value and didn't know about
the Aarch64 quirk, it's often just trying to write that same wrong
value back during the next stop, so if that's just ignored,
that's probably fine in 99% of cases, since the value in the
tracee will be undisturbed.

I don't think there's a sane way to change the aarch64 NT_PRSTATUS
semantics without just completely removing the x7 behavior, but of course
people may be relying on that (I think somebody said upthread that strace does?)

Keno

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-05-31 16:20               ` Keno Fischer
@ 2020-06-01  9:23                 ` Dave Martin
  2020-06-01  9:40                   ` Keno Fischer
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Martin @ 2020-06-01  9:23 UTC (permalink / raw)
  To: Keno Fischer
  Cc: Kyle Huey, Catalin Marinas, Oleg Nesterov,
	Linux Kernel Mailing List, Will Deacon, linux-arm-kernel

On Sun, May 31, 2020 at 12:20:51PM -0400, Keno Fischer wrote:
> > Can't PTRACE_SYSEMU be emulated by using PTRACE_SYSCALL, cancelling the
> > syscall at the syscall enter stop, then modifying the regs at the
> > syscall exit stop?
> 
> Yes, it can. The idea behind SYSEMU is to be able to save half the
> ptrace traps that would require, in theory making the ptracer
> a decent amount faster. That said, the x7 issue is orthogonal to
> SYSEMU, you'd have the same issues if you used PTRACE_SYSCALL.

Right, I just wondered whether there was some deeper difference between
the two approaches.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-06-01  9:23                 ` Dave Martin
@ 2020-06-01  9:40                   ` Keno Fischer
  2020-06-01  9:59                     ` Dave Martin
  0 siblings, 1 reply; 17+ messages in thread
From: Keno Fischer @ 2020-06-01  9:40 UTC (permalink / raw)
  To: Dave Martin
  Cc: Kyle Huey, Catalin Marinas, Oleg Nesterov,
	Linux Kernel Mailing List, Will Deacon, linux-arm-kernel

On Mon, Jun 1, 2020 at 5:23 AM Dave Martin <Dave.Martin@arm.com> wrote:
> > > Can't PTRACE_SYSEMU be emulated by using PTRACE_SYSCALL, cancelling the
> > > syscall at the syscall enter stop, then modifying the regs at the
> > > syscall exit stop?
> >
> > Yes, it can. The idea behind SYSEMU is to be able to save half the
> > ptrace traps that would require, in theory making the ptracer
> > a decent amount faster. That said, the x7 issue is orthogonal to
> > SYSEMU, you'd have the same issues if you used PTRACE_SYSCALL.
>
> Right, I just wondered whether there was some deeper difference between
> the two approaches.

You're asking about a new regset vs trying to do it via ptrace option?
I don't think there's anything a ptrace option can do that a new regset
that replicates the same registers (I'm gonna propose adding orig_x0,
while we're at it and changing the x0 semantics a bit, will have
those details with the patch) wouldn't be able to do . The reason I
originally thought it might have to be a ptrace option is because
the register modification currently gets applied in the syscall entry
code to the actual regs struct, so I thought you might have to know
to preserve those registers. However, then I realized that you could
just change the regset accessors to emulate the old behavior, since
we do already store all the required information (what kind of stop
we're currently at) in order to be able to answer the ptrace
informational queries. So doing that it probably just all around
easier. I guess NT_PRSTATUS might also rot, but I guess strace
doesn't really have to stop using it, since it doesn't care about
the x7 value nor does it need to modify it.

Keno

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-06-01  9:23                     ` Keno Fischer
@ 2020-06-01  9:52                       ` Dave Martin
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Martin @ 2020-06-01  9:52 UTC (permalink / raw)
  To: Keno Fischer
  Cc: Kyle Huey, Catalin Marinas, Linux Kernel Mailing List,
	Oleg Nesterov, Will Deacon, linux-arm-kernel

On Mon, Jun 01, 2020 at 05:23:01AM -0400, Keno Fischer wrote:
> On Mon, Jun 1, 2020 at 5:14 AM Dave Martin <Dave.Martin@arm.com> wrote:
> > Can you explain why userspace would write a changed value for x7
> > but at the same time need that new to be thrown away?
> 
> The discarding behavior is the primary reason things aren't completely
> broken at the moment. If it read the wrong x7 value and didn't know about
> the Aarch64 quirk, it's often just trying to write that same wrong
> value back during the next stop, so if that's just ignored,
> that's probably fine in 99% of cases, since the value in the
> tracee will be undisturbed.

I guess that's my question: when is x7 "disturbed".

Other than sigreturn, I can't think of a case.

I'm likely missing some aspect of what you're trying to do.

> I don't think there's a sane way to change the aarch64 NT_PRSTATUS
> semantics without just completely removing the x7 behavior, but of course
> people may be relying on that (I think somebody said upthread that strace does?)

Since rt_sigreturn emulation was always broken, can we just say
that the effect of updating any reg other than x0 is unspecified in this
case?

Even fixing the x7 issue won't magically teach your tracer how to
deal with unrecognised data in the signal frame, so new hardware or
a new kernel could cause your tracer to become subtly broken.  Would you
be better off tweaking the real signal frame as desired and doing a real
rt_sigreturn for example, instead of attempting to emulate it?


I'm somewhat playing devil's advocate here...

Cheers
---Dave

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: arm64: Register modification during syscall entry/exit stop
  2020-06-01  9:40                   ` Keno Fischer
@ 2020-06-01  9:59                     ` Dave Martin
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Martin @ 2020-06-01  9:59 UTC (permalink / raw)
  To: Keno Fischer
  Cc: Kyle Huey, Catalin Marinas, Oleg Nesterov,
	Linux Kernel Mailing List, Will Deacon, linux-arm-kernel

On Mon, Jun 01, 2020 at 05:40:28AM -0400, Keno Fischer wrote:
> On Mon, Jun 1, 2020 at 5:23 AM Dave Martin <Dave.Martin@arm.com> wrote:
> > > > Can't PTRACE_SYSEMU be emulated by using PTRACE_SYSCALL, cancelling the
> > > > syscall at the syscall enter stop, then modifying the regs at the
> > > > syscall exit stop?
> > >
> > > Yes, it can. The idea behind SYSEMU is to be able to save half the
> > > ptrace traps that would require, in theory making the ptracer
> > > a decent amount faster. That said, the x7 issue is orthogonal to
> > > SYSEMU, you'd have the same issues if you used PTRACE_SYSCALL.
> >
> > Right, I just wondered whether there was some deeper difference between
> > the two approaches.
> 
> You're asking about a new regset vs trying to do it via ptrace option?

I meant SYSEMU versus SYSCALL + cancellation and emulating the syscall
at the syscall exit stop.

i.e., I was trying to understand whether SYSEMU is just a convenience,
or does some magic that can't be reproduced by other means.

> I don't think there's anything a ptrace option can do that a new regset
> that replicates the same registers (I'm gonna propose adding orig_x0,
> while we're at it and changing the x0 semantics a bit, will have
> those details with the patch) wouldn't be able to do . The reason I
> originally thought it might have to be a ptrace option is because
> the register modification currently gets applied in the syscall entry
> code to the actual regs struct, so I thought you might have to know
> to preserve those registers. However, then I realized that you could
> just change the regset accessors to emulate the old behavior, since
> we do already store all the required information (what kind of stop
> we're currently at) in order to be able to answer the ptrace
> informational queries. So doing that it probably just all around
> easier. I guess NT_PRSTATUS might also rot, but I guess strace
> doesn't really have to stop using it, since it doesn't care about
> the x7 value nor does it need to modify it.

I think NT_PRSTATUS probably doesn't need to change.

Having a duplicate regset feels like a worse outcome that having a new
ptrace option.  Undocumentedly different things already happen to the
regs depending on how the tracee stopped, so adding a new special case
doesn't seem to justify creating a new regset.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-06-01  9:59 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-19  1:05 arm64: Register modification during syscall entry/exit stop Keno Fischer
2020-05-19  8:15 ` Will Deacon
2020-05-19  8:37   ` Keno Fischer
2020-05-20 17:41     ` Will Deacon
2020-05-23  5:35       ` Keno Fischer
2020-05-24  6:56         ` Keno Fischer
2020-05-27  9:55           ` Will Deacon
2020-05-27 10:19             ` Dave Martin
2020-05-31  9:33               ` Will Deacon
2020-05-31 16:13                 ` Keno Fischer
2020-06-01  9:14                   ` Dave Martin
2020-06-01  9:23                     ` Keno Fischer
2020-06-01  9:52                       ` Dave Martin
2020-05-31 16:20               ` Keno Fischer
2020-06-01  9:23                 ` Dave Martin
2020-06-01  9:40                   ` Keno Fischer
2020-06-01  9:59                     ` Dave Martin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).