linux-kernel.vger.kernel.org archive mirror
* ptrace bugs and related problems
@ 2006-07-27  6:55 Albert Cahalan
  2006-07-27  7:19 ` David Miller
  2006-07-27 20:31 ` Daniel Jacobowitz
  0 siblings, 2 replies; 15+ messages in thread
From: Albert Cahalan @ 2006-07-27  6:55 UTC (permalink / raw)
  To: torvalds, alan, ak, mingo, arjan, akpm, linux-kernel, roland

Many of these bugs are generic, some are pure i386, some are for
i386 binaries on the x86-64 kernel, and some apply to a bit more.
Some bugs may involve race conditions: I use a 2-core AMD system.
Kernels vary, but are generally quite recent. (stock 2.6.17.7,
FC5's latest update, etc.)

There is a ptrace option to follow vfork, and an option to get a
message when the parent is released by the child. In kernel/fork.c
there is a bad attempt at optimization which prevents the release
message (PTRACE_EVENT_VFORK_DONE) from being sent unless the ptrace
user also chose the option to follow the vfork child.
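
For reference, a minimal sketch of the debugger side of this, assuming
the PTRACE_O_* and PTRACE_EVENT_* constants from a glibc-style
<sys/ptrace.h>. The names VFORK_DONE_ONLY and ptrace_event_of are made
up for illustration. It shows the option mask that asks for the release
message without following the child, and how the event code is picked
out of a waitpid status.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Ask for the "vfork parent released" event WITHOUT asking to follow
 * the vfork child. Per the report above, this is the combination that
 * misbehaves. (Illustrative name, not from any real debugger.) */
enum { VFORK_DONE_ONLY = PTRACE_O_TRACEVFORKDONE };

/* Return the PTRACE_EVENT_* code carried by a waitpid status, or 0 if
 * the status is not a SIGTRAP stop with event bits set. */
int ptrace_event_of(int status)
{
	if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
		return 0;
	return (status >> 16) & 0xff;
}
```

A debugger would pass VFORK_DONE_ONLY as the data argument of
ptrace(PTRACE_SETOPTIONS, ...) and then compare ptrace_event_of(status)
against PTRACE_EVENT_VFORK_DONE after each waitpid.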

System call restart does not appear to nest. It keeps its restart
state in the thread info rather than on the user stack, so there is
only one slot per thread.

Both i386 and x86-64 PTRACE_SINGLESTEP only check for popf, not iret.
Yes, really, iret can be used by normal apps. There is also no check
for failure, as when the popf or iret takes an alignment exception
or hits an unmapped page. The signal handler could fix that up, but
the kernel still thinks that the popf must have set TF in eflags and
thus writes a messed-up eflags into the sigcontext.

There is the pushf problem. Single-stepping this simple code
does not work:   pushf ; popf

A debugger can set or get the siginfo. Great. Signal handlers also
have sigcontext/ucontext data. Besides being generally very useful,
this is the only place where the cr2 register and trap error data
can be found. Looking on the stack only works once the signal is
allowed to be delivered, which may be too late for the debugger.

x86-64 has big problems single-stepping in the vdso's signal
return path. Suppose I breakpoint the pop. (this is in the path
that goes pop,mov,syscall or pop,mov,sysenter) If I then try to
single step, the process runs free. The i386 arch works fine.

I can't even set the hardware breakpoints:

(gdb) hbreak __kernel_sigreturn
Hardware assisted breakpoint 1 at 0xffffe500
(gdb) hbreak __kernel_rt_sigreturn
Hardware assisted breakpoint 2 at 0xffffe600
(gdb) continue
Continuing.
Couldn't write debug register: Input/output error.

The debugger has no way to reliably stop a process without causing
confusion. The SIGSTOP signal is not queued. The app under debug might
use SIGSTOP and rely on SIGSTOP to work. The debugger can't steal this.
Any signal that could be queued can also be blocked. The debugger has
no way to get notice when a signal has merely been queued, can not
see into the queue, and can not reasonably adjust the signal mask.

The is_at_popf function on x86-64 fails to account for instruction
set differences. Many prefixes are only valid in 32-bit mode, and
many others are only valid in 64-bit mode. The name is of course
wrong too; see above note about iret and other problems.
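
A sketch of what a mode-aware check could look like, for discussion
only. This is not the kernel's is_at_popf: the function name, the
long_mode parameter, and the decision to count iret are my assumptions.
It treats 0x40..0x4f as REX prefixes only in 64-bit code, leaves out
the lock prefix (0xf0), and never scans past the 15-byte x86
instruction length limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define X86_MAX_INSN_LEN 15	/* architectural instruction length limit */

/* True if the instruction at insn[0..len) is popf (0x9d) or iret (0xcf),
 * i.e. one that loads eflags, including TF, from the stack.
 * long_mode: the task is executing 64-bit code. */
bool restores_flags_from_stack(const unsigned char *insn, size_t len,
			       bool long_mode)
{
	size_t i;

	if (len > X86_MAX_INSN_LEN)
		len = X86_MAX_INSN_LEN;

	for (i = 0; i < len; i++) {
		unsigned char op = insn[i];

		switch (op) {
		/* operand/address size, segment overrides, repne/rep */
		case 0x66: case 0x67:
		case 0x26: case 0x2e: case 0x36: case 0x3e:
		case 0x64: case 0x65:
		case 0xf2: case 0xf3:
			continue;	/* next byte of the for loop */
		case 0x9d:		/* popf */
		case 0xcf:		/* iret */
			return true;
		default:
			/* REX is a prefix only in 64-bit code; in 32-bit
			 * code 0x40..0x4f are inc/dec instructions. */
			if (long_mode && op >= 0x40 && op <= 0x4f)
				continue;
			return false;
		}
	}
	return false;	/* ran out of bytes without finding an opcode */
}
```

The caller would have to decide long_mode from the target's code
segment, which is exactly the part that differs between an i386 binary
and a native one on the x86-64 kernel.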

The PTRACE_EVENT_EXEC messages are just plain unreliable. They don't
always arrive. Things get especially ugly when a non-leader task
does an execve.

A debugger has little reasonable access to x86 segment info.
Given an arbitrary segment number, I can not generally look it
up in the context of the target process. I can special case
the typical ones, separately for i386 and x86-64. I can "know"
that specific segments are the context switched ones, then ask
the kernel about those.

A debugger needs to read the vdso page. A debugger might want to use
either /proc/*/mem or PTRACE_PEEK. One of the architectures can't do
both. If I remember right, x86-64 can't PTRACE_PEEK.

Suppose my debugger has a few threads. PTRACE_ATTACH will not share.
All ptrace calls fail for all threads other than the one that attached.
It really sucks to have to funnel everything through one thread.

BTW, not bugs exactly, but... Getting ptrace events via waitpid is
horrible. Events arrive in some arbitrary order, with no peeking ahead
either within a single target process or even across multiple target
processes. Messages from successful clone/fork/exec may arrive before
or after the child stops, making for some lovely non-deterministic
behavior. Also, it's no fun to mix waitpid with signals or select.
Writing a reliable debugger with ptrace on Linux is absurdly painful.
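
One way a debugger can cope with the either-order arrival, sketched
under my own assumptions: remember each half (the parent's
fork/vfork/clone event naming the child, and the child's own first
stop) and treat the child as ready only once both have been seen. The
table size and every name below are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_PENDING 64	/* arbitrary size for the sketch */

struct pending_child {
	pid_t pid;		/* 0 = slot unused */
	bool announced;		/* parent's fork/vfork/clone event seen */
	bool stopped;		/* child's own initial stop seen */
};

static struct pending_child pending[MAX_PENDING];

/* Find the slot for pid, claiming a free one if there is none yet. */
static struct pending_child *slot_for(pid_t pid)
{
	struct pending_child *free_slot = NULL;

	assert(pid > 0);
	for (int i = 0; i < MAX_PENDING; i++) {
		if (pending[i].pid == pid)
			return &pending[i];
		if (!free_slot && pending[i].pid == 0)
			free_slot = &pending[i];
	}
	if (free_slot)
		free_slot->pid = pid;
	return free_slot;	/* NULL if the table is full */
}

/* Record one half; return true (and free the slot) once both are in. */
static bool record(pid_t pid, bool is_announcement)
{
	struct pending_child *c = slot_for(pid);

	if (!c)
		return false;
	if (is_announcement)
		c->announced = true;
	else
		c->stopped = true;
	if (c->announced && c->stopped) {
		c->pid = 0;
		c->announced = c->stopped = false;
		return true;
	}
	return false;
}

bool child_announced(pid_t pid) { return record(pid, true); }
bool child_stopped(pid_t pid)   { return record(pid, false); }
```

Whichever call returns true is the point where the debugger can start
treating the new task as fully known. This does nothing for ordering
across unrelated tasks.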

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ptrace bugs and related problems
  2006-07-27  6:55 ptrace bugs and related problems Albert Cahalan
@ 2006-07-27  7:19 ` David Miller
  2006-07-27 20:31 ` Daniel Jacobowitz
  1 sibling, 0 replies; 15+ messages in thread
From: David Miller @ 2006-07-27  7:19 UTC (permalink / raw)
  To: acahalan; +Cc: torvalds, alan, ak, mingo, arjan, akpm, linux-kernel, roland

From: "Albert Cahalan" <acahalan@gmail.com>
Date: Thu, 27 Jul 2006 02:55:17 -0400

> Writing a reliable debugger with ptrace on Linux is absurdly painful.

This is why people like Roland are working on utrace, it seems to
provide ways to deal with most of the limitations you mention,
especially the signal ones.

* Re: ptrace bugs and related problems
  2006-07-27  6:55 ptrace bugs and related problems Albert Cahalan
  2006-07-27  7:19 ` David Miller
@ 2006-07-27 20:31 ` Daniel Jacobowitz
  2006-07-28  1:17   ` Albert Cahalan
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Jacobowitz @ 2006-07-27 20:31 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: torvalds, alan, ak, mingo, arjan, akpm, linux-kernel, roland

On Thu, Jul 27, 2006 at 02:55:17AM -0400, Albert Cahalan wrote:
> Many of these bugs are generic, some are pure i386, some are for
> i386 binaries on the x86-64 kernel, and some apply to a bit more.
> Some bugs may involve race conditions: I use a 2-core AMD system.
> Kernels vary, but are generally quite recent. (stock 2.6.17.7,
> FC5's latest update, etc.)

Reporting bugs individually, and with a bit more detail, has the
advantage that people can actually keep track of them and
recognize them; I highly recommend it.  And how are we supposed to
answer bugs that apply individually to kernels of unspecified origin?

> There is a ptrace option to follow vfork, and an option to get a
> message when the parent is released by the child. In kernel/fork.c
> there is a bad attempt at optimization which prevents the release
> message (PTRACE_EVENT_VFORK_DONE) from being sent unless the ptrace
> user also chose the option to follow the vfork child.

This doesn't make sense.  Example?

     wait_for_completion(&vfork);
     if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE))
            ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);

When the parent's vfork is done, the parent's debugger gets a
notification.

> The debugger has no way to reliably stop a process without causing
> confusion. The SIGSTOP signal is not queued. The app under debug might
> use SIGSTOP and rely on SIGSTOP to work. The debugger can't steal this.
> Any signal that could be queued can also be blocked. The debugger has
> no way to get notice when a signal has merely been queued, can not
> see into the queue, and can not reasonably adjust the signal mask.

See utrace.  This problem is roughly not solvable using ptrace.

> The PTRACE_EVENT_EXEC messages are just plain unreliable. They don't
> always arrive. Things get especially ugly when a non-leader task
> does an execve.

This is what I meant by vague bug reports.  The code for sending this
event is quite simple.  Things do get ugly when non-leader tasks exec;
I don't know whether the forced exits of other threads are clearly
visible from the debugger.

> A debugger needs to read the vdso page. A debugger might want to use
> either /proc/*/mem or PTRACE_PEEK. One of the architectures can't do
> both. If I remember right, x86-64 can't PTRACE_PEEK.

As far as I know I don't have this problem, on x86_64.

> Suppose my debugger has a few threads. PTRACE_ATTACH will not share.
> All ptrace calls fail for all threads other than the one that attached.
> It really sucks to have to funnel everything through one thread.

This is a known limit of ptrace.  It's discussed periodically.

> BTW, not bugs exactly, but... Getting ptrace events via waitpid is
> horrible. Events arrive in some arbitrary order, with no peeking ahead
> either within a single target process or even across multiple target
> processes. Messages from successful clone/fork/exec may arrive before
> or after the child stops, making for some lovely non-deterministic
> behavior. Also, it's no fun to mix waitpid with signals or select.
> Writing a reliable debugger with ptrace on Linux is absurdly painful.

See utrace.

-- 
Daniel Jacobowitz
CodeSourcery

* Re: ptrace bugs and related problems
  2006-07-27 20:31 ` Daniel Jacobowitz
@ 2006-07-28  1:17   ` Albert Cahalan
  2006-07-28  3:47     ` Daniel Jacobowitz
  0 siblings, 1 reply; 15+ messages in thread
From: Albert Cahalan @ 2006-07-28  1:17 UTC (permalink / raw)
  To: Albert Cahalan, torvalds, alan, ak, mingo, arjan, akpm,
	linux-kernel, roland

On 7/27/06, Daniel Jacobowitz <dan@debian.org> wrote:
> On Thu, Jul 27, 2006 at 02:55:17AM -0400, Albert Cahalan wrote:
> > Many of these bugs are generic, some are pure i386, some are for
> > i386 binaries on the x86-64 kernel, and some apply to a bit more.
> > Some bugs may involve race conditions: I use a 2-core AMD system.
> > Kernels vary, but are generally quite recent. (stock 2.6.17.7,
> > FC5's latest update, etc.)
>
> Reporting bugs individually, and with a bit more detail, has the
> advantage that people can actually keep track of them and
> recognize them; I highly recommend it.  And how are we supposed to
> answer bugs that apply individually to kernels of unspecified origin?

I think the detail is enough to be useful, which is better
than no bug reports at all.

I've been taking notes as I encounter the bugs at work.
I just recently got an OK to post them. While trying to
find workarounds so I can ship a product, I certainly did
not have an excess of time to play with the bugs.

> > There is a ptrace option to follow vfork, and an option to get a
> > message when the parent is released by the child. In kernel/fork.c
> > there is a bad attempt at optimization which prevents the release
> > message (PTRACE_EVENT_VFORK_DONE) from being sent unless the ptrace
> > user also chose the option to follow the vfork child.
>
> This doesn't make sense.  Example?
>
>      wait_for_completion(&vfork);
>      if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE))
>             ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
>
> When the parent's vfork is done, the parent's debugger gets a
> notification.

Minor correction: the message is sent with bad data.
Here at home I happen to have 2.6.17-rc5, so
looking in the kernel/fork.c file there:

The fork_traceflag function looks only at the flags
used to follow processes, including PT_TRACE_VFORK.

In do_fork, the result of fork_traceflag is assigned
to the "trace" variable. Note that PT_TRACE_VFORK_DONE
does not cause "trace" to be non-zero.

Then we hit this code:

                if (unlikely (trace)) {
                        current->ptrace_message = nr;
                        ptrace_notify ((trace << 8) | SIGTRAP);
                }

That doesn't run. The ptrace_message is thus not set when
ptrace_notify is called to send the PTRACE_EVENT_VFORK_DONE
message. You get random stale data from a previous message.

> > The PTRACE_EVENT_EXEC messages are just plain unreliable. They don't
> > always arrive. Things get especially ugly when a non-leader task
> > does an execve.
>
> This is what I meant by vague bug reports.  The code for sending this
> event is quite simple.  Things do get ugly when non-leader tasks exec;
> I don't know whether the forced exits of other threads are clearly
> visible from the debugger.

The forced exits show up, oddly. I see one for each task,
except for the task which called execve(). The task calling
execve() will silently go away. The leader task, despite
being reported as dead, returns from execve. Ouch. It would
be much more friendly to have the task calling execve()
send a (new) PTRACE_EVENT_TID_CHANGE message with the new ID
as the ptrace_message. If this is the very last message sent
by the task doing execve and is made to arrive in proper order,
the debugger can renumber the structures it uses to track tasks.

I don't get WIFEXITED with waitpid for any of this.

> > Suppose my debugger has a few threads. PTRACE_ATTACH will not share.
> > All ptrace calls fail for all threads other than the one that attached.
> > It really sucks to have to funnel everything through one thread.
>
> This is a known limit of ptrace.  It's discussed periodically.

It's a known bug. More trouble:

Note that the new unshare() system call will need to send
ptrace events for all tasks affected. Sending the event from
one task is no good because the event might arrive after the
debugger has responded to some other task. Consider breakpoints
in a shared mm, with the mm suddenly becoming unshared.

There is also no way to find all the tasks which share an mm.
This is needed so that tasks don't die if the debugger attaches
to a pre-existing task and sets a breakpoint.

The /proc/*/auxv files don't work immediately after starting
a process via the usual fork,PTRACE_TRACEME,exec method.
One has to wait some undetermined amount of time.

PTRACE_GETSIGINFO has 0x0605 as si_code when a process exits.
This is not defined anywhere.
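
An observation rather than a fix: 0x0605 has the same shape as the
(event << 8) | SIGTRAP value passed to ptrace_notify in the fork.c
snippet above, with event 6 (PTRACE_EVENT_EXIT) and signal 5 (SIGTRAP).
The sketch below only splits the value. That the kernel means si_code
to carry this encoding is my assumption, not something documented, and
the helper names are made up.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>

/* Split an si_code of the form (event << 8) | signal. */
int si_code_event(int si_code)
{
	return (si_code >> 8) & 0xff;
}

int si_code_signal(int si_code)
{
	return si_code & 0xff;
}
```

With 0x0605 as input the first helper gives PTRACE_EVENT_EXIT and the
second gives SIGTRAP.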

* Re: ptrace bugs and related problems
  2006-07-28  1:17   ` Albert Cahalan
@ 2006-07-28  3:47     ` Daniel Jacobowitz
  2006-07-28 22:28       ` Albert Cahalan
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Jacobowitz @ 2006-07-28  3:47 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: torvalds, alan, ak, mingo, arjan, akpm, linux-kernel, roland

On Thu, Jul 27, 2006 at 09:17:48PM -0400, Albert Cahalan wrote:
> Minor correction: the message is sent with bad data.
> Here at home I happen to have 2.6.17-rc5, so
> looking in the kernel/fork.c file there:
> 
> The fork_traceflag function looks only at the flags
> used to follow processes, including PT_TRACE_VFORK.
> 
> In do_fork, the result of fork_traceflag is assigned
> to the "trace" variable. Note that PT_TRACE_VFORK_DONE
> does not cause "trace" to be non-zero.
> 
> Then we hit this code:
> 
>                if (unlikely (trace)) {
>                        current->ptrace_message = nr;
>                        ptrace_notify ((trace << 8) | SIGTRAP);
>                }
> 
> That doesn't run. The ptrace_message is thus not set when
> ptrace_notify is called to send the PTRACE_EVENT_VFORK_DONE
> message. You get random stale data from a previous message.

Why do you want the message data anyway?

FORK/VFORK/CLONE events have a message: it says what the new process's
PID is.  VFORK_DONE doesn't have a message, because it only indicates
that the current process is about to resume; it's an event that only
has one process associated with it.

I really don't think this is a bug.
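
Put as code, a debugger following this reading would only ask for the
message on events that name a new task. A sketch, assuming the
PTRACE_EVENT_* constants from <sys/ptrace.h>; the helper name is made
up.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/ptrace.h>

/* True if PTRACE_GETEVENTMSG is expected to return a new task's ID
 * for this event, per the description above. */
bool event_names_new_task(int event)
{
	switch (event) {
	case PTRACE_EVENT_FORK:
	case PTRACE_EVENT_VFORK:
	case PTRACE_EVENT_CLONE:
		return true;
	default:
		/* Includes PTRACE_EVENT_VFORK_DONE: on the kernels
		 * discussed here no message is set for it, so asking
		 * would return whatever the previous event left behind. */
		return false;
	}
}
```

The caller would guard its ptrace(PTRACE_GETEVENTMSG, ...) call with
this check and ignore the message otherwise.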

> The forced exits show up, oddly. I see one for each task,
> except for the task which called execve(). The task calling
> execve() will silently go away. The leader task, despite
> being reported as dead, returns from execve. Ouch. It would
> be much more friendly to have the task calling execve()
> send a (new) PTRACE_EVENT_TID_CHANGE message with the new ID
> as the ptrace_message. If this is the very last message sent
> by the task doing execve and is made to arrive in proper order,
> the debugger can renumber the structures it uses to track tasks.

Or just present things as if the leader task did the execve, which is
effectively what happens, and what I thought would happen for ptrace
too.

> Note that the new unshare() system call will need to send
> ptrace events for all tasks affected. Sending the event from
> one task is no good because the event might arrive after the
> debugger has responded to some other task. Consider breakpoints
> in a shared mm, with the mm suddenly becoming unshared.

The interface was never designed to handle unsharing.  I don't really
think it should be extended to; whoever needs this functionality should
design something cleaner for utrace.

> There is also no way to find all the tasks which share an mm.
> This is needed so that tasks don't die if the debugger attaches
> to a pre-existing task and sets a breakpoint.

Ditto.  In practice, thread groups or LinuxThreads libthread_db suffice
for daily use.

> The /proc/*/auxv files don't work immediately after starting
> a process via the usual fork,PTRACE_TRACEME,exec method.
> One has to wait some undetermined amount of time.

I have no idea what this refers to, sorry.

> PTRACE_GETSIGINFO has 0x0605 as si_code when a process exits.
> This is not defined anywhere.

It's garbage.  PTRACE_GETSIGINFO is only valid after the process stops
with a signal.

-- 
Daniel Jacobowitz
CodeSourcery

* Re: ptrace bugs and related problems
  2006-07-28  3:47     ` Daniel Jacobowitz
@ 2006-07-28 22:28       ` Albert Cahalan
  2006-07-28 22:36         ` David Miller
  2006-07-31 19:00         ` Daniel Jacobowitz
  0 siblings, 2 replies; 15+ messages in thread
From: Albert Cahalan @ 2006-07-28 22:28 UTC (permalink / raw)
  To: Albert Cahalan, torvalds, alan, ak, mingo, arjan, akpm,
	linux-kernel, roland

On 7/27/06, Daniel Jacobowitz <dan@debian.org> wrote:
> On Thu, Jul 27, 2006 at 09:17:48PM -0400, Albert Cahalan wrote:
> > Minor correction: the message is sent with bad data.
> > Here at home I happen to have 2.6.17-rc5, so
> > looking in the kernel/fork.c file there:
> >
> > The fork_traceflag function looks only at the flags
> > used to follow processes, including PT_TRACE_VFORK.
> >
> > In do_fork, the result of fork_traceflag is assigned
> > to the "trace" variable. Note that PT_TRACE_VFORK_DONE
> > does not cause "trace" to be non-zero.
> >
> > Then we hit this code:
> >
> >                if (unlikely (trace)) {
> >                        current->ptrace_message = nr;
> >                        ptrace_notify ((trace << 8) | SIGTRAP);
> >                }
> >
> > That doesn't run. The ptrace_message is thus not set when
> > ptrace_notify is called to send the PTRACE_EVENT_VFORK_DONE
> > message. You get random stale data from a previous message.
>
> Why do you want the message data anyway?

I was using the data to look up which task just got split away
from the parent. Judging by Chuck Ebbert's email, I'm not the
only person to expect the data to be valid.

> > The forced exits show up, oddly. I see one for each task,
> > except for the task which called execve(). The task calling
> > execve() will silently go away. The leader task, despite
> > being reported as dead, returns from execve. Ouch. It would
> > be much more friendly to have the task calling execve()
> > send a (new) PTRACE_EVENT_TID_CHANGE message with the new ID
> > as the ptrace_message. If this is the very last message sent
> > by the task doing execve and is made to arrive in proper order,
> > the debugger can renumber the structures it uses to track tasks.
>
> Or just present things as if the leader task did the execve, which is
> effectively what happens, and what I thought would happen for ptrace
> too.

That makes things even weirder. A successful execve done in one
thread appears to be done by another (which might not be
traced if the debugger was a bit odd), while a failing execve
appears... where?

> > Note that the new unshare() system call will need to send
> > ptrace events for all tasks affected. Sending the event from
> > one task is no good because the event might arrive after the
> > debugger has responded to some other task. Consider breakpoints
> > in a shared mm, with the mm suddenly becoming unshared.
>
> The interface was never designed to handle unsharing.  I don't really
> think it should be extended to; whoever needs this functionality should
> design something cleaner for utrace.

I'm not sure utrace will be accepted. (many ptrace alternatives
have been born and died over the years) Even if utrace does get
accepted, initially we only get:

1. a clean-up that provides hope for the future
2. a hopefully-compatible ptrace on top of utrace
3. some sort of demo interface

That alone won't replace ptrace.

> > There is also no way to find all the tasks which share an mm.
> > This is needed so that tasks don't die if the debugger attaches
> > to a pre-existing task and sets a breakpoint.
>
> Ditto.  In practice, thread groups or LinuxThreads libthread_db suffice
> for daily use.
>
> > The /proc/*/auxv files don't work immediately after starting
> > a process via the usual fork,PTRACE_TRACEME,exec method.
> > One has to wait some undetermined amount of time.
>
> I have no idea what this refers to, sorry.

Never mind. I need to use vfork to ensure that the debugger
does not run until the child has done execve. Hopefully the
vfork wake-up won't happen before /proc/*/auxv is ready.
(doing a wait is awkward: I want the data before I enter my
main debugger loop, but the main debugger loop is based on
waitpid and will thus hang if the event is eaten early)

> > PTRACE_GETSIGINFO has 0x0605 as si_code when a process exits.
> > This is not defined anywhere.
>
> It's garbage.  PTRACE_GETSIGINFO is only valid after the process stops
> with a signal.

The process does indeed stop with a signal. It gets SIGTRAP
as part of sending the ptrace event.

* Re: ptrace bugs and related problems
  2006-07-28 22:28       ` Albert Cahalan
@ 2006-07-28 22:36         ` David Miller
  2006-07-31 19:00         ` Daniel Jacobowitz
  1 sibling, 0 replies; 15+ messages in thread
From: David Miller @ 2006-07-28 22:36 UTC (permalink / raw)
  To: acahalan; +Cc: torvalds, alan, ak, mingo, arjan, akpm, linux-kernel, roland

From: "Albert Cahalan" <acahalan@gmail.com>
Date: Fri, 28 Jul 2006 18:28:34 -0400

> I'm not sure utrace will be accepted.

I'm highly confident it will be.

* Re: ptrace bugs and related problems
  2006-07-28 22:28       ` Albert Cahalan
  2006-07-28 22:36         ` David Miller
@ 2006-07-31 19:00         ` Daniel Jacobowitz
  2006-08-01  0:08           ` Albert Cahalan
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Jacobowitz @ 2006-07-31 19:00 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: torvalds, alan, ak, mingo, arjan, akpm, linux-kernel, roland

On Fri, Jul 28, 2006 at 06:28:34PM -0400, Albert Cahalan wrote:
> I was using the data to look up which task just got split away
> from the parent. Judging by Chuck Ebbert's email, I'm not the
> only person to expect the data to be valid.

So it seems!  It seems a reasonable addition if anyone wants to submit
it.

> >Or just present things as if the leader task did the execve, which is
> >effectively what happens, and what I thought would happen for ptrace
> >too.
> 
> That makes things even weirder. A successful execve done in one
> thread appears to be done by another (which might not be
> traced if the debugger was a bit odd), while a failing execve
> appears... where?

Not at all, unless you're doing syscall tracing, I don't think.  The
exec notification is after the mm is replaced.

> >The interface was never designed to handle unsharing.  I don't really
> >think it should be extended to; whoever needs this functionality should
> >design something cleaner for utrace.
> 
> I'm not sure utrace will be accepted. (many ptrace alternatives
> have been born and died over the years) Even if utrace does get
> accepted, initially we only get:
> 
> 1. a clean-up that provides hope for the future
> 2. a hopefully-compatible ptrace on top of utrace
> 3. some sort of demo interface
> 
> That alone won't replace ptrace.

That's why I suggested someone design a cleaner debugging interface to
be implemented on top of utrace - which is how it's supposed to be
used.  Like David, I am confident that this is the future direction of
Linux debugging.

> >> PTRACE_GETSIGINFO has 0x0605 as si_code when a process exits.
> >> This is not defined anywhere.
> >
> >It's garbage.  PTRACE_GETSIGINFO is only valid after the process stops
> >with a signal.
> 
> The process does indeed stop with a signal. It gets SIGTRAP
> as part of sending the ptrace event.

Sure, but you must know what I meant.  PTRACE_GETSIGINFO is only valid
when there is a real signal, i.e. generated by something other than
ptrace.  Which is true whenever wait reports a signal without any of
the special event bits set (except for the legacy SIGTRAP on execve).
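
As a sketch of that rule, assuming the usual Linux wait status layout
where ptrace event codes sit above the stop signal (the helper name is
made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <signal.h>
#include <sys/wait.h>

/* True if a waitpid status is a signal stop with no ptrace event bits
 * set. Note that this cannot tell the legacy SIGTRAP on execve apart
 * from a real SIGTRAP; the debugger has to track that case itself. */
bool stop_has_real_siginfo(int status)
{
	return WIFSTOPPED(status) && (status >> 16) == 0;
}
```

A debugger would call PTRACE_GETSIGINFO only when this returns true.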

-- 
Daniel Jacobowitz
CodeSourcery

* Re: ptrace bugs and related problems
  2006-07-31 19:00         ` Daniel Jacobowitz
@ 2006-08-01  0:08           ` Albert Cahalan
  2006-08-01  1:37             ` Daniel Jacobowitz
  0 siblings, 1 reply; 15+ messages in thread
From: Albert Cahalan @ 2006-08-01  0:08 UTC (permalink / raw)
  To: Albert Cahalan, torvalds, alan, ak, mingo, arjan, akpm,
	linux-kernel, roland

On 7/31/06, Daniel Jacobowitz <dan@debian.org> wrote:
> On Fri, Jul 28, 2006 at 06:28:34PM -0400, Albert Cahalan wrote:
> [somebody - lost ref]

> > >Or just present things as if the leader task did the execve, which is
> > >effectively what happens, and what I thought would happen for ptrace
> > >too.
> >
> > That makes things even weirder. A successful execve done in one
> > thread appears to be done by another (which might not be
> > traced if the debugger was a bit odd), while a failing execve
> > appears... where?
>
> Not at all, unless you're doing syscall tracing, I don't think.  The
> exec notification is after the mm is replaced.

Syscall tracing is pretty much a given I think.
There are numerous reasons to use it, not all
of which I remember. I think some of the reasons
are related to single-stepping over sysenter,
syscall, and int 0x80.

The execve event is unreliable anyway.
Thus, it is necessary to use syscall tracing.

So that leaves a debugger with the weirdness
of a system call that enters via one task and
then exits via a different task. That different
task might have been running (a syscall exit
without an entry within that task) or in some
unrelated syscall (whee... the syscall number
suddenly changed) or even racing in execve.

> > >> PTRACE_GETSIGINFO has 0x0605 as si_code when a process exits.
> > >> This is not defined anywhere.
> > >
> > >It's garbage.  PTRACE_GETSIGINFO is only valid after the process stops
> > >with a signal.
> >
> > The process does indeed stop with a signal. It gets SIGTRAP
> > as part of sending the ptrace event.
>
> Sure, but you must know what I meant.  PTRACE_GETSIGINFO is only valid
> when there is a real signal, i.e. generated by something other than
> ptrace.  Which is true whenever wait reports a signal without any of
> the special event bits set (except for the legacy SIGTRAP on execve).

That sucks. I like converting si_code to something
readable that I can present to the user. Well, it seems
to work anyway. The main failure is that an access
to unmapped memory does not give 3 distinct codes
for read/write/execute.

* Re: ptrace bugs and related problems
  2006-08-01  0:08           ` Albert Cahalan
@ 2006-08-01  1:37             ` Daniel Jacobowitz
  2006-08-01  5:22               ` Albert Cahalan
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Jacobowitz @ 2006-08-01  1:37 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: torvalds, alan, ak, mingo, arjan, akpm, linux-kernel, roland

On Mon, Jul 31, 2006 at 08:08:35PM -0400, Albert Cahalan wrote:
> The execve event is unreliable anyway.
> Thus, it is necessary to use syscall tracing.

You keep saying this "unreliable" thing, and I don't think it means
what you think it means.  It should always be delivered.  When it
isn't, there's a bug.  I don't know of any, unless you're talking about
the thread group issue you just reported.

-- 
Daniel Jacobowitz
CodeSourcery

* Re: ptrace bugs and related problems
  2006-08-01  1:37             ` Daniel Jacobowitz
@ 2006-08-01  5:22               ` Albert Cahalan
  0 siblings, 0 replies; 15+ messages in thread
From: Albert Cahalan @ 2006-08-01  5:22 UTC (permalink / raw)
  To: Albert Cahalan, ak, mingo, arjan, akpm, linux-kernel, roland

On 7/31/06, Daniel Jacobowitz <dan@debian.org> wrote:
> On Mon, Jul 31, 2006 at 08:08:35PM -0400, Albert Cahalan wrote:
> > The execve event is unreliable anyway.
> > Thus, it is necessary to use syscall tracing.
>
> You keep saying this "unreliable" thing, and I don't think it means
> what you think it means.  It should always be delivered.  When it
> isn't, there's a bug.  I don't know of any, unless you're talking about
> the thread group issue you just reported.

Yeah, I figure there is a bug.

It'd be great if you could reproduce the bug.
My setup:

2-core CPU
64-bit kernel (2.6.17 FC5, next-to-latest revision)
32-bit target app (assembly - no C library)
32-bit debugger

The target app does CLONE_THREAD. The child does
that again, then execve. The first and last threads spin
in a loop, either burning CPU time or doing the pause
system call. (the middle thread does the execve)

I see the messages just fine on many 32-bit non-SMP
systems that I tested with: Gentoo 2.6.16, Gentoo 2.6.13,
plain 2.6.16, maybe 2.6.17.7... mostly in VMware.
Perhaps it is the SMP, the 64-bit, or Fedora being broken.
I can not say, and most likely can not investigate more.

I hope that helps.

* Re: ptrace bugs and related problems
@ 2006-08-01  5:52 Chuck Ebbert
  0 siblings, 0 replies; 15+ messages in thread
From: Chuck Ebbert @ 2006-08-01  5:52 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: Roland Dreier, linux-kernel, Ingo Molnar, Andrew Morton,
	Arjan van de Ven, Andi Kleen, Alan Cox, Linus Torvalds

In-Reply-To: <787b0d920607311730s5a951a5cv38eea7db03c759c8@mail.gmail.com>

On Mon, 31 Jul 2006 20:30:07 -0400, Albert Cahalan wrote:
> 
> > > There is also no check
> > > for failure, as when the popf or iret takes an alignment exception
> > > or hits an unmapped page.
> >
> > Can that happen?
> 
> You're at a popf that can not complete.
> You single-step.
> The kernel sets TF.
> The kernel notes the popf.
> The kernel assumes that TF will be determined by the popf.
> The kernel tries to run the popf.
> The popf faults, leaving TF unmodified.
> The kernel fails to clear TF.

That can be fixed, but it won't be easy.

> > > There is the pushf problem. Single-stepping this simple code
> > > does not work:   pushf ; popf
> >
> > The debugger needs to mask TF in the pushed flags.  Read the comment
> > in is_at_popf().
> 
> I think the term is "known bug".

Well at least it's known. :)

> > > The is_at_popf function on x86-64 fails to account for instruction
> > > set differences. Many prefixes are only valid in 32-bit mode, and
> > > many others are only valid in 64-bit mode.
> 
> There is a problem with instruction length though.
> The buffer is 16 bytes long, but should be only 15.

OK.

> The 0xf0 (lock) prefix is not valid for popf or iret.

I think it is OK on really old processors (maybe only the 386?). If we
fix the above problem with faulting instructions, then the fault this
would cause on newer CPUs should not be a problem.
-- 
Chuck


* Re: ptrace bugs and related problems
  2006-07-31  6:21 Chuck Ebbert
@ 2006-08-01  0:30 ` Albert Cahalan
  0 siblings, 0 replies; 15+ messages in thread
From: Albert Cahalan @ 2006-08-01  0:30 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Linus Torvalds, Alan Cox, Andi Kleen, Arjan van de Ven,
	Andrew Morton, Ingo Molnar, linux-kernel, Roland Dreier

On 7/31/06, Chuck Ebbert <76306.1226@compuserve.com> wrote:
> On Thu, 27 Jul 2006 02:55:17 -0400, Albert Cahalan wrote:

> > There is also no check
> > for failure, as when the popf or iret takes an alignment exception
> > or hits an unmapped page.
>
> Can that happen?  Singlestep traps happen after the instruction has
> already executed.  Or are you talking about starting to singlestep
> after hitting a code breakpoint fault?

You're at a popf that can not complete.
You single-step.
The kernel sets TF.
The kernel notes the popf.
The kernel assumes that TF will be determined by the popf.
The kernel tries to run the popf.
The popf faults, leaving TF unmodified.
The kernel fails to clear TF.
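
As a rough illustration of the sequence above, here is a toy user-space
model in C. It is not kernel code: struct toy_task, toy_arm_singlestep()
and toy_finish_step() are made-up names, and the kernel_owns_tf flag is
an assumption standing in for however the kernel remembers that it set
TF itself and must clear it again.

```c
#include <stdbool.h>

#define X86_EFLAGS_TF 0x100UL   /* trap flag, bit 8 of eflags */

/* Hypothetical model of the tracee state relevant to single-stepping. */
struct toy_task {
    unsigned long eflags;   /* the tracee's flags register */
    bool kernel_owns_tf;    /* assumed bookkeeping: "I set TF, clear it later" */
};

/* Arm a single-step, following the steps listed above. */
void toy_arm_singlestep(struct toy_task *t, bool at_popf)
{
    t->eflags |= X86_EFLAGS_TF;     /* the kernel sets TF */
    /* At a popf the kernel assumes the instruction will determine TF,
     * so it does not record that TF needs clearing afterwards. */
    t->kernel_owns_tf = !at_popf;
}

/* Finish the step. 'popped' is the value popf would load into eflags;
 * 'faulted' says the instruction never completed. */
void toy_finish_step(struct toy_task *t, bool at_popf, bool faulted,
                     unsigned long popped)
{
    if (at_popf && !faulted)
        t->eflags = popped;         /* popf really did determine TF */

    if (t->kernel_owns_tf) {
        t->eflags &= ~X86_EFLAGS_TF;
        t->kernel_owns_tf = false;
    }
    /* If popf faulted, neither branch ran: TF stays set and would be
     * reported as if the program itself had set it. */
}
```

In this model, arming at a popf and finishing with faulted set leaves
X86_EFLAGS_TF in eflags, which is the stale state described above.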

> > There is the pushf problem. Single-stepping this simple code
> > does not work:   pushf ; popf
>
> The debugger needs to mask TF in the pushed flags.  Read the comment
> in is_at_popf().

I saw the comment. I don't consider that documentation.
Why even have single-step support if the debugger has to
mess with eflags manually anyway? I might as well just
exclusively use PTRACE_SYSCALL.

I think the term is "known bug".

> > The is_at_popf function on x86-64 fails to account for instruction
> > set differences. Many prefixes are only valid in 32-bit mode, and
> > many others are only valid in 64-bit mode.
>
> I only see one bug here: the REX prefixes are 'inc' instructions
> in compatibility mode.  Otherwise, prefixes that are only valid in
> 32-bit mode are ignored in 64-bit mode.

Oh, OK, I thought they faulted. (AMD botched this)

There is a problem with instruction length though.
The buffer is 16 bytes long, but should be only 15.

The 0xf0 (lock) prefix is not valid for popf or iret.
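
To make the points about instruction length and prefixes concrete, here
is a user-space sketch in C of a prefix scanner. It is not the kernel's
is_at_popf(); the function name at_flags_restoring_insn() and the
long_mode parameter are made up for illustration. It assumes the
15-byte limit mentioned above, treats 0x40-0x4f as prefixes only in
64-bit code, rejects the lock prefix, and also recognizes iret.

```c
#include <stddef.h>

#define X86_MAX_INSN_LEN 15     /* longest legal x86 instruction, in bytes */

/*
 * Hypothetical helper: return 1 if the bytes at 'code' start with a
 * popf (0x9d) or iret (0xcf), possibly preceded by prefixes.
 * 'long_mode' is nonzero for 64-bit code, zero for 32-bit code.
 */
int at_flags_restoring_insn(const unsigned char *code, size_t len,
                            int long_mode)
{
    size_t i;

    if (len > X86_MAX_INSN_LEN)
        len = X86_MAX_INSN_LEN;     /* never look past 15 bytes */

    for (i = 0; i < len; i++) {
        unsigned char b = code[i];

        if (b == 0x9d || b == 0xcf)     /* popf, iret */
            return 1;

        switch (b) {
        case 0x66: case 0x67:           /* operand and address size */
        case 0x26: case 0x2e:           /* segment overrides */
        case 0x36: case 0x3e:
        case 0x64: case 0x65:
        case 0xf2: case 0xf3:           /* repne, rep */
            continue;                   /* next byte of the for loop */
        default:
            break;
        }

        /* REX prefixes exist only in 64-bit code; in 32-bit code the
         * same bytes are one-byte inc/dec instructions. */
        if (long_mode && b >= 0x40 && b <= 0x4f)
            continue;

        /* Anything else ends the scan, including lock (0xf0) and
         * pushf (0x9c). */
        return 0;
    }
    return 0;
}
```

For example, the bytes 0x48 0x9d count as a popf when long_mode is
nonzero but not when it is zero, and 0xf0 0x9d is rejected in both
modes. Whether lock should be rejected is a judgment call in this
sketch.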


* Re: ptrace bugs and related problems
@ 2006-07-31  6:21 Chuck Ebbert
  2006-08-01  0:30 ` Albert Cahalan
  0 siblings, 1 reply; 15+ messages in thread
From: Chuck Ebbert @ 2006-07-31  6:21 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: Linus Torvalds, Alan Cox, Andi Kleen, Arjan van de Ven,
	Andrew Morton, Ingo Molnar, linux-kernel, Roland Dreier

In-Reply-To: <787b0d920607262355x3f669f0ap544e3166be2dca21@mail.gmail.com>

On Thu, 27 Jul 2006 02:55:17 -0400, Albert Cahalan wrote:

> Both i386 and x86-64 PTRACE_SINGLESTEP only check for popf, not iret.
> Yes, really, iret can be used by normal apps.

Well there's a FIXME in the x86_64 code for that, anyway. (lahf/sahf
can't cause problems, so iret is the only remaining problem.)

> There is also no check
> for failure, as when the popf or iret takes an alignment exception
> or hits an unmapped page.

Can that happen?  Singlestep traps happen after the instruction has
already executed.  Or are you talking about starting to singlestep
after hitting a code breakpoint fault?

> There is the pushf problem. Single-stepping this simple code
> does not work:   pushf ; popf

The debugger needs to mask TF in the pushed flags.  Read the comment
in is_at_popf().
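
For reference, the masking a debugger would have to do is small. The
following C sketch shows only the bit manipulation; mask_pushed_tf() and
its tracee_had_tf parameter are hypothetical names, and the debugger is
assumed to know whether the program had set TF on its own before the
step.

```c
#define X86_EFLAGS_TF 0x100UL   /* trap flag, bit 8 of eflags */

/*
 * Hypothetical helper: given the flags word that a single-stepped pushf
 * stored on the tracee's stack, return the value the tracee would have
 * pushed had it not been single-stepped.
 */
unsigned long mask_pushed_tf(unsigned long pushed_flags, int tracee_had_tf)
{
    if (tracee_had_tf)
        return pushed_flags;            /* TF was the program's own doing */
    return pushed_flags & ~X86_EFLAGS_TF;
}
```

A debugger could read the word at the tracee's stack pointer with
PTRACE_PEEKDATA after stepping over the pushf, pass it through a helper
like this, and write it back with PTRACE_POKEDATA.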

> The is_at_popf function on x86-64 fails to account for instruction
> set differences. Many prefixes are only valid in 32-bit mode, and
> many others are only valid in 64-bit mode.

I only see one bug here: the REX prefixes are 'inc' instructions
in compatibility mode.  Otherwise, prefixes that are only valid in
32-bit mode are ignored in 64-bit mode.

> The debugger has no way to reliably stop a process without causing
> confusion. The SIGSTOP signal is not queued. The app under debug might
> use SIGSTOP and rely on SIGSTOP to work. The debugger can't steal this.

I sort of got this working some time ago but I forget what the
problems were.  The idea was to decide whether a SIGSTOP was meant
for the debugger, and forward the unwanted ones to the app.  But
yeah, the interface really sucks and that probably can't be made to
work.
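
The message does not say how that attempt was implemented. One possible
shape, sketched in C under the assumption that the debugger counts the
SIGSTOPs it sends itself, is shown here. struct stop_tracker and both
functions are hypothetical names.

```c
#include <signal.h>

/* Hypothetical debugger-side bookkeeping. */
struct stop_tracker {
    int pending_debugger_stops;   /* SIGSTOPs sent by the debugger, not yet seen */
};

/* Call right after the debugger sends SIGSTOP to the tracee. */
void note_debugger_stop_sent(struct stop_tracker *t)
{
    t->pending_debugger_stops++;
}

/*
 * Call when the tracee reports a signal stop. Returns the signal number
 * to pass on when resuming the tracee, or 0 to suppress it.
 */
int signal_to_forward(struct stop_tracker *t, int sig)
{
    if (sig == SIGSTOP && t->pending_debugger_stops > 0) {
        t->pending_debugger_stops--;
        return 0;                 /* assume this one was ours: swallow it */
    }
    return sig;                   /* everything else belongs to the app */
}
```

The weakness is the one raised earlier in the thread: SIGSTOP is not
queued, so if the debugger and the app both send one before the tracee
stops, only a single stop is reported and a counter like this
attributes it to the debugger.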

What Linux needs is a fresh new design for a debugging interface to
sit on top of utrace, one that solves the current inherent problems.
Just making a list of these problems is probably the place to start.

-- 
Chuck



* Re: ptrace bugs and related problems
@ 2006-07-28 20:07 Chuck Ebbert
  0 siblings, 0 replies; 15+ messages in thread
From: Chuck Ebbert @ 2006-07-28 20:07 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Albert Cahalan, linux-kernel

In-Reply-To: <20060728034741.GA3372@nevyn.them.org>

(cc: trimmed)

On Thu, 27 Jul 2006 23:47:41 -0400, Daniel Jacobowitz wrote:
> 
> > In do_fork, the result of fork_traceflag is assigned
> > to the "trace" variable. Note that PT_TRACE_VFORK_DONE
> > does not cause "trace" to be non-zero.
> > 
> > Then we hit this code:
> > 
> >                if (unlikely (trace)) {
> >                        current->ptrace_message = nr;
> >                        ptrace_notify ((trace << 8) | SIGTRAP);
> >                }
> > 
> > That doesn't run. The ptrace_message is thus not set when
> > ptrace_notify is called to send the PTRACE_EVENT_VFORK_DONE
> > message. You get random stale data from a previous message.
> 
> Why do you want the message data anyway?
> 
> FORK/VFORK/CLONE events have a message: it says what the new process's
> PID is.  VFORK_DONE doesn't have a message, because it only indicates
> that the current process is about to resume; it's an event that only
> has one process associated with it.
> 
> I really don't think this is a bug.

Maybe not a bug, but this would be a nice enhancement.  It would cost
exactly one line of code.  I looked at user code I had written and it
assumed the message was available (it was, because I was also tracing
EVENT_VFORK and it happens to be left over from that.)  If we make this
a part of the API, future kernel changes wouldn't break this (erroneous)
assumption, which otherwise might give someone a nasty surprise in
currently-working code.

Otherwise we should zero it out and see what breaks. :)
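
For tracer authors, the distinction can be made explicit in user code
instead of relying on leftover data. The C sketch below decodes the
event number that ptrace_notify((trace << 8) | SIGTRAP) ends up
encoding in the wait status, and says whether that event's message is a
new PID. The EV_* names and both functions are hypothetical; the event
numbers match the kernel's PTRACE_EVENT_* values, and the status layout
assumed is the usual Linux one (0x7f in the low byte for a stop, the
stop signal in the next byte, the event above that).

```c
#include <signal.h>

/* Local copies of the kernel's PTRACE_EVENT_* numbers. */
enum {
    EV_FORK       = 1,
    EV_VFORK      = 2,
    EV_CLONE      = 3,
    EV_VFORK_DONE = 5
};

/* Return the ptrace event in a wait status, or 0 if there is none. */
int ptrace_event_of(int status)
{
    if ((status & 0xff) != 0x7f)            /* not a stop at all */
        return 0;
    if (((status >> 8) & 0xff) != SIGTRAP)  /* stopped by another signal */
        return 0;
    return (status >> 16) & 0xff;           /* 0 for a plain SIGTRAP */
}

/* Only fork, vfork and clone events carry the new process's PID. */
int event_message_is_new_pid(int event)
{
    return event == EV_FORK || event == EV_VFORK || event == EV_CLONE;
}
```

A tracer written this way would call PTRACE_GETEVENTMSG only when
event_message_is_new_pid() is true, and would keep working whether
VFORK_DONE later gains a message or has it zeroed.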

-- 
Chuck



Thread overview: 15+ messages
2006-07-27  6:55 ptrace bugs and related problems Albert Cahalan
2006-07-27  7:19 ` David Miller
2006-07-27 20:31 ` Daniel Jacobowitz
2006-07-28  1:17   ` Albert Cahalan
2006-07-28  3:47     ` Daniel Jacobowitz
2006-07-28 22:28       ` Albert Cahalan
2006-07-28 22:36         ` David Miller
2006-07-31 19:00         ` Daniel Jacobowitz
2006-08-01  0:08           ` Albert Cahalan
2006-08-01  1:37             ` Daniel Jacobowitz
2006-08-01  5:22               ` Albert Cahalan
2006-07-28 20:07 Chuck Ebbert
2006-07-31  6:21 Chuck Ebbert
2006-08-01  0:30 ` Albert Cahalan
2006-08-01  5:52 Chuck Ebbert
