Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
@ 2010-01-16 23:48 Jim Keniston
  2010-01-18  7:23 ` Peter Zijlstra
  2010-01-18 15:58 ` Masami Hiramatsu
  0 siblings, 2 replies; 84+ messages in thread
From: Jim Keniston @ 2010-01-16 23:48 UTC
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ananth N Mavinakayanahalli, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

Quoting Peter Zijlstra <peterz@infradead.org>:

> On Fri, 2010-01-15 at 16:58 -0800, Jim Keniston wrote:
>> But here are some things to keep in mind about the
>> various approaches:
>>
>> 1. Single-stepping inline is easiest: you need to know very little about
>> the instruction set you're probing.  But it's inadequate for
>> multithreaded apps.
>> 2. Single-stepping out of line solves the multithreading issue (as do #3
>> and #4), but requires more knowledge of the instruction set.  (In
>> particular, calls, jumps, and returns need special care; as do
>> rip-relative instructions in x86_64.)  I count 9 architectures that
>> support kprobes.  I think most of these do SSOL.
>> 3. "Boosted" probes (where an appended jump instruction removes the need
>> for the single-step trap on many instructions) require even more
>> knowledge of the instruction set, and like SSOL, require XOL slots.
>> Right now, as far as I know, x86 is the only architecture with boosted
>> kprobes.
>> 4. Emulation removes the need for the XOL area, but requires pretty much
>> total knowledge of the instruction set.  It's also a performance win for
>> architectures that can't do #3.  I see kvm implemented on 4
>> architectures (ia64, powerpc, s390, x86).  Coincidentally, those are the
>> architectures to which uprobes (old uprobes, with ubp and xol bundled
>> in) has already been ported (though Intel hasn't been maintaining their
>> ia64 port).
>
> Right, so I was thinking a combination of 4 and execute from kernel
> space would be feasible. I would think most regular instructions are
> runnable from kernel space given that we provide the proper pt_regs
> environment.
>
> Although I just realize we need to fully emulate the address computation
> step for all memory writes, otherwise a wild userspace pointer might end
> up writing in your kernel image.

Correct.
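
A minimal sketch of the check this implies, assuming an x86-style
base + index * scale + displacement operand.  USER_SPACE_END and the
helper names are hypothetical, chosen for illustration; none of this
is taken from the patch set:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound of user space; a real kernel has its own limit. */
#define USER_SPACE_END UINT64_C(0x0000800000000000)

/* Operands of an x86-style memory reference: base + index * scale + disp. */
struct mem_operand {
    uint64_t base;
    uint64_t index;
    uint8_t  scale;  /* 1, 2, 4 or 8 */
    int32_t  disp;   /* signed displacement */
};

/* Compute the effective address with 64-bit wrap-around arithmetic. */
static uint64_t effective_address(const struct mem_operand *op)
{
    return op->base + op->index * op->scale + (uint64_t)(int64_t)op->disp;
}

/* True only if the whole range [addr, addr + len) lies below USER_SPACE_END. */
static bool write_is_user_only(const struct mem_operand *op, uint64_t len)
{
    uint64_t addr = effective_address(op);

    if (len == 0 || len > USER_SPACE_END)
        return false;
    return addr <= USER_SPACE_END - len;
}
```

An emulator along these lines would have to run such a check on every
store before touching memory, which is why the address computation
itself cannot be left to the hardware.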

>
> Also, don't we already need full knowledge of the instruction set in
> order to decode the instruction stream and find instruction boundaries?

Not really.  For #3 (boosting), you need to know everything for #2,  
plus be able to compute the length of each instruction -- which we can  
now do for x86.  To emulate an instruction (#4), you need to replicate  
what it does, side-effects and all.  The x86 instruction set seems to  
be adding new floating-point instructions all the time, and I bet even  
Masami doesn't know what they all do, but so far, they all seem to  
adhere to the instruction-length rules encoded in Masami's instruction  
decoder.
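
To make the distinction concrete, here is a toy sketch of length-only
decoding.  It covers a handful of one-byte opcodes and ignores
prefixes, ModRM bytes and everything else the real decoder handles;
the function names are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy length lookup for a few one-byte x86 opcodes (no prefixes, no ModRM).
 * Returns 0 for any opcode outside this illustrative subset.
 */
static size_t toy_insn_length(uint8_t opcode)
{
    if (opcode == 0x90)                     /* nop */
        return 1;
    if (opcode == 0xC3)                     /* ret */
        return 1;
    if (opcode >= 0x50 && opcode <= 0x57)   /* push reg */
        return 1;
    if (opcode == 0xEB)                     /* jmp rel8 */
        return 2;
    if (opcode == 0xE9)                     /* jmp rel32 */
        return 5;
    if (opcode >= 0xB8 && opcode <= 0xBF)   /* mov reg, imm32 */
        return 5;
    return 0;
}

/*
 * Walk a buffer using length knowledge alone and count the instructions.
 * Returns -1 on an unknown opcode or a truncated instruction.
 */
static long toy_count_insns(const uint8_t *code, size_t size)
{
    size_t off = 0;
    long count = 0;

    while (off < size) {
        size_t len = toy_insn_length(code[off]);

        if (len == 0 || len > size - off)
            return -1;
        off += len;
        count++;
    }
    return count;
}
```

Finding boundaries needs only the table above.  Emulating the same six
opcodes would additionally need a handler per opcode that updates
registers, flags, the stack and the instruction pointer.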

As you may have noted before, I think FP would be a special problem  
for your approach.  I'm not sure how folks would react to the idea of  
executing FP instructions in kernel space.  But emulating them is also  
tough.  There's an IEEE FP emulation package somewhere in one of the  
Linux arch directories, but I'm not sure how precise it is, and  
dropping even 1 bit of precision is unacceptable for many  
applications, since such errors tend to grow in complex computations  
employing many FP instructions.
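
As an illustration of how a one-bit difference can grow, the sketch
below iterates x -> 4x(1-x) from two starting values that differ only
in the lowest mantissa bit.  The map is deliberately chosen to be
sensitive (many real computations are far more forgiving), and the
helper names are made up for this example:

```c
#include <stdint.h>
#include <string.h>

/* Return x with its lowest mantissa bit flipped (a one-ulp perturbation). */
static double flip_lowest_bit(double x)
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof bits);
    bits ^= 1;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/*
 * Iterate x -> 4x(1-x) from two starts one ulp apart and return the
 * largest gap between the two trajectories seen over `steps` iterations.
 */
static double max_divergence(double start, int steps)
{
    double a = start;
    double b = flip_lowest_bit(start);
    double worst = 0.0;

    for (int i = 0; i < steps; i++) {
        a = 4.0 * a * (1.0 - a);
        b = 4.0 * b * (1.0 - b);
        double gap = a > b ? a - b : b - a;
        if (gap > worst)
            worst = gap;
    }
    return worst;
}
```

Calling max_divergence(0.3, 10) gives a tiny gap, while a few hundred
steps are enough for the two trajectories to stop resembling each
other.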

Jim


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-16 23:48 [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) Jim Keniston
@ 2010-01-18  7:23 ` Peter Zijlstra
  2010-01-18 15:58 ` Masami Hiramatsu
  1 sibling, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-18  7:23 UTC
  To: Jim Keniston
  Cc: Srikar Dronamraju, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ananth N Mavinakayanahalli, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Sat, 2010-01-16 at 18:48 -0500, Jim Keniston wrote:

> As you may have noted before, I think FP would be a special problem  
> for your approach.  I'm not sure how folks would react to the idea of  
> executing FP instructions in kernel space.  But emulating them is also  
> tough.  There's an IEEE FP emulation package somewhere in one of the  
> Linux arch directories, but I'm not sure how precise it is, and  
> dropping even 1 bit of precision is unacceptable for many  
> applications, since such errors tend to grow in complex computations  
> employing many FP instructions.

Well, we have kernel space using FP/MMX/SSE-like things; it's not hard if
you really need it.  But in this case I think it's easier than normal,
because we'll just allow it to change the userspace state, since that
is exactly what we want it to do.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-16 23:48 [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) Jim Keniston
  2010-01-18  7:23 ` Peter Zijlstra
@ 2010-01-18 15:58 ` Masami Hiramatsu
  2010-01-18 19:21   ` Jim Keniston
  1 sibling, 1 reply; 84+ messages in thread
From: Masami Hiramatsu @ 2010-01-18 15:58 UTC
  To: Jim Keniston
  Cc: Peter Zijlstra, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ananth N Mavinakayanahalli,
	utrace-devel, Frederic Weisbecker, Maneesh Soni, Mark Wielaard,
	LKML

Jim Keniston wrote:
> Not really.  For #3 (boosting), you need to know everything for #2,  
> plus be able to compute the length of each instruction -- which we can  
> now do for x86.  To emulate an instruction (#4), you need to replicate  
> what it does, side-effects and all.  The x86 instruction set seems to  
> be adding new floating-point instructions all the time, and I bet even  
> Masami doesn't know what they all do, but so far, they all seem to  
> adhere to the instruction-length rules encoded in Masami's instruction  
> decoder.

Actually, the current x86 decoder doesn't support FP (x87) instructions (even
though it already supports AVX).  But I think it wouldn't be hard to add them.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 15:58 ` Masami Hiramatsu
@ 2010-01-18 19:21   ` Jim Keniston
  2010-01-18 21:20     ` Masami Hiramatsu
  0 siblings, 1 reply; 84+ messages in thread
From: Jim Keniston @ 2010-01-18 19:21 UTC
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ananth N Mavinakayanahalli,
	utrace-devel, Frederic Weisbecker, Maneesh Soni, Mark Wielaard,
	LKML

On Mon, 2010-01-18 at 10:58 -0500, Masami Hiramatsu wrote:
> Jim Keniston wrote:
> > Not really.  For #3 (boosting), you need to know everything for #2,  
> > plus be able to compute the length of each instruction -- which we can  
> > now do for x86.  To emulate an instruction (#4), you need to replicate  
> > what it does, side-effects and all.  The x86 instruction set seems to  
> > be adding new floating-point instructions all the time, and I bet even  
> > Masami doesn't know what they all do, but so far, they all seem to  
> > adhere to the instruction-length rules encoded in Masami's instruction  
> > decoder.
> 
> Actually, the current x86 decoder doesn't support FP (x87) instructions (even
> though it already supports AVX).  But I think it wouldn't be hard to add them.
> 

At one point I verified that it worked for all the x87 instructions in
libm:
https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html
I'm pretty sure I tested mmx instructions as well.  But I guess this was
before you rearranged the opcode tables.

Yeah, it wouldn't be hard to add back in, at least for purposes of
computing instruction lengths.

Jim



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 19:21   ` Jim Keniston
@ 2010-01-18 21:20     ` Masami Hiramatsu
  0 siblings, 0 replies; 84+ messages in thread
From: Masami Hiramatsu @ 2010-01-18 21:20 UTC
  To: Jim Keniston
  Cc: Peter Zijlstra, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ananth N Mavinakayanahalli,
	utrace-devel, Frederic Weisbecker, Maneesh Soni, Mark Wielaard,
	LKML

Jim Keniston wrote:
> On Mon, 2010-01-18 at 10:58 -0500, Masami Hiramatsu wrote:
>> Jim Keniston wrote:
>>> Not really.  For #3 (boosting), you need to know everything for #2,  
>>> plus be able to compute the length of each instruction -- which we can  
>>> now do for x86.  To emulate an instruction (#4), you need to replicate  
>>> what it does, side-effects and all.  The x86 instruction set seems to  
>>> be adding new floating-point instructions all the time, and I bet even  
>>> Masami doesn't know what they all do, but so far, they all seem to  
>>> adhere to the instruction-length rules encoded in Masami's instruction  
>>> decoder.
>>
>> Actually, the current x86 decoder doesn't support FP (x87) instructions (even
>> though it already supports AVX).  But I think it wouldn't be hard to add them.
>>
> 
> At one point I verified that it worked for all the x87 instructions in
> libm:
> https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html
> I'm pretty sure I tested mmx instructions as well.  But I guess this was
> before you rearranged the opcode tables.
> 
> Yeah, it wouldn't be hard to add back in, at least for purposes of
> computing instruction lengths.

objdump -d /lib/libm.so.6  | awk -f arch/x86/tools/distill.awk | ./test_get_len 
Succeed: decoded and checked 37198 instructions

Hmm, yeah, that's already supported :-D.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-27 10:23                                                                   ` Ingo Molnar
@ 2010-02-07 13:47                                                                     ` Avi Kivity
  0 siblings, 0 replies; 84+ messages in thread
From: Avi Kivity @ 2010-02-07 13:47 UTC
  To: Ingo Molnar
  Cc: Peter Zijlstra, Jim Keniston, Pekka Enberg, Srikar Dronamraju,
	ananth, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/27/2010 12:23 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>    

(back from vacation)

>>> If so then you ignore the obvious solution to _that_ problem: dont use
>>> INT3 at all, but rebuild (or re-JIT) your program with explicit callbacks.
>>> It's _MUCH_ faster than _any_ breakpoint based solution - literally just
>>> the cost of a function call (or not even that - i've written very fast
>>> inlined tracers - they do rock when it comes to performance). Problem
>>> solved and none of the INT3 details matters at all.
>>>        
>> However did I not think of that?  Yes, and let's rip off kprobes tracing
>> from the kernel, we can always rebuild it.
>>
>> Well, I'm observing an issue in a production system now.  I may not want to
>> take it down, or if I take it down I may not be able to observe it again as
>> the problem takes a couple of days to show up, or I may not have the full
>> source, or it takes 10 minutes to build and so an iterative edit/build/run
>> cycle can stretch for hours.
>>      
> You have somewhat misconstrued my argument. What i said above is that _if_ you
> need extreme levels of performance you always have the option to go even
> faster via specialized tracing solutions. I did not promote it as a
> replacement solution. Specialization obviously brings in a new set of
> problems: inflexibility and non-transparency, an example of which you gave
> above.
>
> Your proposed solution brings in precisely such kinds of issues, on a
> different level, just to improve performance at the cost of transparency and
> at the cost of features and robustness.
>    

We just disagree on the intrusiveness, then.  IMO it will be a very rare 
application that really suffers from a vma injection, since most apps 
don't manage their vmas directly but leave it to the kernel and ld.so.

> It's btw rather ironic as your arguments are somewhat similar to the Xen vs.
> KVM argument just turned around: KVM started out slower by relying on hardware
> implementation for virtualization while Xen relied on a clever but limiting
> hack. With each CPU generation the hardware got faster, while the various
> design limitations of Xen are hurting it and KVM is winning that race.
>
> A (partially) similar situation exists here: INT3 into ring 0 and handling it
> there in a protected environment might be more expensive, but _if_ it matters
> to performance it sure could be made faster in hardware (and in fact it will
> become faster with every new generation of hardware).
>    

Not at all.  For kvm, hardware eliminates exits completely, whereas pv Xen
tries to reduce their cost; but an INT3 will forever be much more
expensive than a jump.

You are right however that we should favour hardware support where 
available, and for high bandwidth tracing, it is available: branch trace 
store.  With that, it is easy to know how many times the processor 
passed through some code point as well as to reconstruct the entire call 
chain, basically what the function tracer does for the kernel.

Do we have facilities for exposing that to userspace?  It can also be 
very useful for the kernel.

It will still be slower if we only trace a few points, and it can't 
trace register and memory values, but it's a good tool to have IMO.

> Both Peter and I are telling you that we consider your solution too
> specialized, at the cost of flexibility, features and robustness.
>    

We'll agree to disagree on that then.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-27  9:25                                                                 ` Avi Kivity
@ 2010-01-27 10:23                                                                   ` Ingo Molnar
  2010-02-07 13:47                                                                     ` Avi Kivity
  0 siblings, 1 reply; 84+ messages in thread
From: Ingo Molnar @ 2010-01-27 10:23 UTC
  To: Avi Kivity
  Cc: Peter Zijlstra, Jim Keniston, Pekka Enberg, Srikar Dronamraju,
	ananth, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

* Avi Kivity <avi@redhat.com> wrote:

> > If so then you ignore the obvious solution to _that_ problem: dont use 
> > INT3 at all, but rebuild (or re-JIT) your program with explicit callbacks. 
> > It's _MUCH_ faster than _any_ breakpoint based solution - literally just 
> > the cost of a function call (or not even that - i've written very fast 
> > inlined tracers - they do rock when it comes to performance). Problem 
> > solved and none of the INT3 details matters at all.
> 
> However did I not think of that?  Yes, and let's rip off kprobes tracing 
> from the kernel, we can always rebuild it.
> 
> Well, I'm observing an issue in a production system now.  I may not want to 
> take it down, or if I take it down I may not be able to observe it again as 
> the problem takes a couple of days to show up, or I may not have the full 
> source, or it takes 10 minutes to build and so an iterative edit/build/run 
> cycle can stretch for hours.

You have somewhat misconstrued my argument. What i said above is that _if_ you 
need extreme levels of performance you always have the option to go even 
faster via specialized tracing solutions. I did not promote it as a 
replacement solution. Specialization obviously brings in a new set of 
problems: inflexibility and non-transparency, an example of which you gave
above.

Your proposed solution brings in precisely such kinds of issues, on a 
different level, just to improve performance at the cost of transparency and 
at the cost of features and robustness.

It's btw rather ironic as your arguments are somewhat similar to the Xen vs. 
KVM argument just turned around: KVM started out slower by relying on hardware 
implementation for virtualization while Xen relied on a clever but limiting 
hack. With each CPU generation the hardware got faster, while the various 
design limitations of Xen are hurting it and KVM is winning that race.

A (partially) similar situation exists here: INT3 into ring 0 and handling it 
there in a protected environment might be more expensive, but _if_ it matters 
to performance it sure could be made faster in hardware (and in fact it will 
become faster with every new generation of hardware).

Both Peter and I are telling you that we consider your solution too
specialized, at the cost of flexibility, features and robustness.

Thanks,

	Ingo


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-27  9:08                                                               ` Ingo Molnar
@ 2010-01-27  9:25                                                                 ` Avi Kivity
  2010-01-27 10:23                                                                   ` Ingo Molnar
  0 siblings, 1 reply; 84+ messages in thread
From: Avi Kivity @ 2010-01-27  9:25 UTC
  To: Ingo Molnar
  Cc: Peter Zijlstra, Jim Keniston, Pekka Enberg, Srikar Dronamraju,
	ananth, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/27/2010 11:08 AM, Ingo Molnar wrote:
>
>> I see it exactly the opposite.  Only a very small minority of cases will
>> have such severe memory corruption that tracing will fall apart because of
>> random writes to memory; especially on 64-bit where the address space is
>> sparse.  On the other hand, knowing that the cost is a few dozen cycles
>> rather than a thousand or so means that you can trace production servers
>> running full loads without worrying about whether tracing will affect
>> whatever it is you're trying to observe.
>>
>> I'm not against slow reliable tracing, but we shouldn't ignore the need for
>> speed.
>>      
> I haven't seen a concise summary of your points in this thread, so let me
> summarize it as i've understood them (hopefully not putting words into your
> mouth): AFAICS you are arguing for some crazy fragile architecture-specific
> solution that traps INT3 into ring3 just to shave off a few cycles, and then
> use user-space state to trace into.
>    


That's a good summary, except for the words "crazy fragile", "trap INT3 
into ring3" and "a few cycles".

Instead of using int 3, put a jump instruction in the program.  This 
shaves a lot more than a few cycles.
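
For comparison with the one-byte int 3 (0xCC), a sketch of what
patching in a five-byte jmp rel32 could look like.  The function name
is hypothetical and this is not code from any posted patch; it only
shows the encoding and its +/- 2 GiB reach:

```c
#include <stdbool.h>
#include <stdint.h>

#define JMP_REL32_LEN 5

/*
 * Encode "jmp rel32" placed at address `site` and targeting `target`.
 * Returns false if the target is outside the signed 32-bit reach.
 */
static bool encode_jmp_rel32(uint64_t site, uint64_t target,
                             uint8_t out[JMP_REL32_LEN])
{
    /* The displacement is relative to the end of the 5-byte instruction. */
    int64_t rel = (int64_t)(target - (site + JMP_REL32_LEN));

    if (rel < INT32_MIN || rel > INT32_MAX)
        return false;

    uint32_t u = (uint32_t)(int32_t)rel;

    out[0] = 0xE9;                      /* jmp rel32 opcode */
    out[1] = (uint8_t)(u & 0xff);       /* little-endian displacement */
    out[2] = (uint8_t)((u >> 8) & 0xff);
    out[3] = (uint8_t)((u >> 16) & 0xff);
    out[4] = (uint8_t)((u >> 24) & 0xff);
    return true;
}
```

Unlike int 3, the jump overwrites five bytes, so it can span more than
one original instruction, and the handler code has to sit within 2 GiB
of the probe point unless a longer jump sequence is used.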

> If so then you ignore the obvious solution to _that_ problem: dont use INT3 at
> all, but rebuild (or re-JIT) your program with explicit callbacks. It's _MUCH_
> faster than _any_ breakpoint based solution - literally just the cost of a
> function call (or not even that - i've written very fast inlined tracers -
> they do rock when it comes to performance). Problem solved and none of the
> INT3 details matters at all.
>    

However did I not think of that?  Yes, and let's rip off kprobes tracing 
from the kernel, we can always rebuild it.

Well, I'm observing an issue in a production system now.  I may not want 
to take it down, or if I take it down I may not be able to observe it 
again as the problem takes a couple of days to show up, or I may not 
have the full source, or it takes 10 minutes to build and so an 
iterative edit/build/run cycle can stretch for hours.

Adding a vma to a running program is very unlikely to affect it.  If the 
program makes random accesses to memory, it will likely segfault very 
quickly before we ever get to trace it.

> INT3 only matters to _transparent_ probing, and for that, the cost of INT3 is
> almost _by definition_ less important than the fact that we can do transparent
> tracing. If performance were the overriding issue they'd use dedicated
> callbacks - and the INT3 technique wouldnt matter at all.
>    

INT3 isn't transparent.  The only thing that comes close to full 
transparency is hardware breakpoints.  So we have a tradeoff between 
transparency and speed, and except for the weirdest bugs, this level of
transparency won't be needed.

> ( Also, just like we were able to extend the kprobes code with more and more
>    optimizations, the same can be done with any user-space probing as well, to
>    make it faster. But at the core of it has to be a sane design that is
>    transparent and controlled by the kernel, so that it has the option to apply
>    more and more optimizations - yours isn't such and its limitations are
>    designed-in.

No design is fully transparent, and I don't see why my design can't be 
controlled by the kernel?

> Which is neither smart nor useful. )
>    

This style of arguing is neither smart nor useful either.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-27  8:35                                                             ` Avi Kivity
@ 2010-01-27  9:08                                                               ` Ingo Molnar
  2010-01-27  9:25                                                                 ` Avi Kivity
  0 siblings, 1 reply; 84+ messages in thread
From: Ingo Molnar @ 2010-01-27  9:08 UTC
  To: Avi Kivity
  Cc: Peter Zijlstra, Jim Keniston, Pekka Enberg, Srikar Dronamraju,
	ananth, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

* Avi Kivity <avi@redhat.com> wrote:

> On 01/27/2010 10:24 AM, Ingo Molnar wrote:
> >
> >
> >>>Not to mention that that process could wreck the trace data rendering it
> >>>utterly unreliable.
> >>It could, but it also might not.  Are we going to deny high performance
> >>tracing to users just because it doesn't work in all cases?
> >Tracing and monitoring is foremost about being able to trust the instrument,
> >then about performance and usability. That's one of the big things about
> >ftrace and perf.
> >
> >By proposing 'user space tracing' you are missing two big aspects:
> >
> >  - That self-contained, kernel-driven tracing can be replicated in user-space.
> >    It cannot. Sharing and global state is much harder to maintain reliably,
> >    but the bigger problem is that user-space can stomp on its own tracing
> >    state and can make it unreliable. Tracing is often used to figure out bugs,
> >    and tracers will be trusted less if they can stomp on themselves.
> >
> >  - That somehow it's much faster and that this edge matters. It isnt and it
> >    doesnt matter. The few places that need very very fast tracing wont use any
> >    of these facilities - it will use something specialized.
> >
> >So you are creating a solution for special cases that dont need it, and you
> >are also ignoring prime qualities of a good tracing framework.
> 
> I see it exactly the opposite.  Only a very small minority of cases will 
> have such severe memory corruption that tracing will fall apart because of 
> random writes to memory; especially on 64-bit where the address space is 
> sparse.  On the other hand, knowing that the cost is a few dozen cycles 
> rather than a thousand or so means that you can trace production servers 
> running full loads without worrying about whether tracing will affect 
> whatever it is you're trying to observe.
> 
> I'm not against slow reliable tracing, but we shouldn't ignore the need for 
> speed.

I haven't seen a concise summary of your points in this thread, so let me
summarize it as i've understood them (hopefully not putting words into your 
mouth): AFAICS you are arguing for some crazy fragile architecture-specific 
solution that traps INT3 into ring3 just to shave off a few cycles, and then 
use user-space state to trace into.

If so then you ignore the obvious solution to _that_ problem: dont use INT3 at 
all, but rebuild (or re-JIT) your program with explicit callbacks. It's _MUCH_ 
faster than _any_ breakpoint based solution - literally just the cost of a 
function call (or not even that - i've written very fast inlined tracers - 
they do rock when it comes to performance). Problem solved and none of the 
INT3 details matters at all.

INT3 only matters to _transparent_ probing, and for that, the cost of INT3 is 
almost _by definition_ less important than the fact that we can do transparent 
tracing. If performance were the overriding issue they'd use dedicated 
callbacks - and the INT3 technique wouldnt matter at all.

( Also, just like we were able to extend the kprobes code with more and more
  optimizations, the same can be done with any user-space probing as well, to 
  make it faster. But at the core of it has to be a sane design that is 
  transparent and controlled by the kernel, so that it has the option to apply 
  more and more optimizations - yours isn't such and its limitations are
  designed-in. Which is neither smart nor useful. )

	Ingo


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-27  8:24                                                           ` Ingo Molnar
@ 2010-01-27  8:35                                                             ` Avi Kivity
  2010-01-27  9:08                                                               ` Ingo Molnar
  0 siblings, 1 reply; 84+ messages in thread
From: Avi Kivity @ 2010-01-27  8:35 UTC
  To: Ingo Molnar
  Cc: Peter Zijlstra, Jim Keniston, Pekka Enberg, Srikar Dronamraju,
	ananth, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/27/2010 10:24 AM, Ingo Molnar wrote:
>
>
>>> Not to mention that that process could wreck the trace data rendering it
>>> utterly unreliable.
>>>        
>> It could, but it also might not.  Are we going to deny high performance
>> tracing to users just because it doesn't work in all cases?
>>      
> Tracing and monitoring is foremost about being able to trust the instrument,
> then about performance and usability. That's one of the big things about
> ftrace and perf.
>
> By proposing 'user space tracing' you are missing two big aspects:
>
>   - That self-contained, kernel-driven tracing can be replicated in user-space.
>     It cannot. Sharing and global state is much harder to maintain reliably,
>     but the bigger problem is that user-space can stomp on its own tracing
>     state and can make it unreliable. Tracing is often used to figure out bugs,
>     and tracers will be trusted less if they can stomp on themselves.
>
>   - That somehow it's much faster and that this edge matters. It isnt and it
>     doesnt matter. The few places that need very very fast tracing wont use any
>     of these facilities - it will use something specialized.
>
> So you are creating a solution for special cases that dont need it, and you
> are also ignoring prime qualities of a good tracing framework.
>    

I see it exactly the opposite.  Only a very small minority of cases will 
have such severe memory corruption that tracing will fall apart because 
of random writes to memory; especially on 64-bit where the address space 
is sparse.  On the other hand, knowing that the cost is a few dozen 
cycles rather than a thousand or so means that you can trace production 
servers running full loads without worrying about whether tracing will 
affect whatever it is you're trying to observe.

I'm not against slow reliable tracing, but we shouldn't ignore the need 
for speed.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-20 12:22                                                         ` Avi Kivity
@ 2010-01-27  8:24                                                           ` Ingo Molnar
  2010-01-27  8:35                                                             ` Avi Kivity
  0 siblings, 1 reply; 84+ messages in thread
From: Ingo Molnar @ 2010-01-27  8:24 UTC
  To: Avi Kivity
  Cc: Peter Zijlstra, Jim Keniston, Pekka Enberg, Srikar Dronamraju,
	ananth, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML


* Avi Kivity <avi@redhat.com> wrote:

> On 01/20/2010 11:57 AM, Peter Zijlstra wrote:
> >On Wed, 2010-01-20 at 11:43 +0200, Avi Kivity wrote:
> >> 1. Write a trace entry into shared memory, trap into the kernel on 
> >>    overflow.
> >> 2. Trap if a condition is satisfied (fast watchpoint implementation).
> >
> > So now you want to consume more of a process' address space to store trace 
> > data as well?
> 
> Yes.  I know I'm bad.

No, you are just wrong.

> > Not to mention that that process could wreck the trace data rendering it 
> > utterly unreliable.
> 
> It could, but it also might not.  Are we going to deny high performance 
> tracing to users just because it doesn't work in all cases?

Tracing and monitoring is foremost about being able to trust the instrument, 
then about performance and usability. That's one of the big things about 
ftrace and perf.

By proposing 'user space tracing' you are missing two big aspects:

 - That self-contained, kernel-driven tracing can be replicated in user-space.
   It cannot. Sharing and global state is much harder to maintain reliably,
   but the bigger problem is that user-space can stomp on its own tracing
   state and can make it unreliable. Tracing is often used to figure out bugs,
   and tracers will be trusted less if they can stomp on themselves.

 - That somehow it's much faster and that this edge matters. It isnt and it
   doesnt matter. The few places that need very very fast tracing wont use any
   of these facilities - it will use something specialized.

So you are creating a solution for special cases that dont need it, and you 
are also ignoring prime qualities of a good tracing framework.

	Ingo


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-20 19:58                                                         ` Andi Kleen
@ 2010-01-20 20:28                                                           ` Jim Keniston
  0 siblings, 0 replies; 84+ messages in thread
From: Jim Keniston @ 2010-01-20 20:28 UTC
  To: Andi Kleen
  Cc: Avi Kivity, Pekka Enberg, Srikar Dronamraju, Peter Zijlstra,
	ananth, Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML


On Wed, 2010-01-20 at 20:58 +0100, Andi Kleen wrote:
> > Re: rewriting instructions that use rip-relative addressing.  We do that
> > now.  See handle_riprel_insn() in patch #2.  (As far as we can tell, it
> > works, but we'd appreciate your review of it.)
> 
> Yes, but how do you get within 2GB of it?

I'm not sure what you're asking.

To jump between the probed instruction stream and the XOL area, I've
proposed
  jmpq *(%rip)
  .quad next_insn
next_insn is a 64-bit address, which presumably allows you to jump to
anywhere in the address space.

To read/write the memory addressed by a rip-relative instruction, we
convert the rip-relative addressing to indirect addressing through a
64-bit scratch register (whose saved value we restore before returning
to the probed instruction stream).
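
For illustration only (this is not handle_riprel_insn() from patch #2): a
minimal C sketch of the address the scratch register would need to hold.
The helper name riprel_target() and its parameters are assumptions made
for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch set: compute the absolute
 * address referenced by a rip-relative operand. The CPU resolves such an
 * operand against the address of the *next* instruction, i.e. the probed
 * instruction's address plus its length, plus the signed 32-bit
 * displacement encoded in the instruction.
 */
static uint64_t riprel_target(uint64_t insn_addr, uint8_t insn_len,
			      int32_t disp32)
{
	/* An x86 instruction is never longer than 15 bytes. */
	assert(insn_len <= 15);

	/* Sign-extend the displacement before adding it. */
	return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

With that value loaded into the scratch register, the copy in the XOL slot
can in principle use plain register-indirect addressing, so it no longer
matters how far the slot is from the original instruction.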

> Add lots of holes
> in the address space? 

No.

> 
> > The instruction decoder is used only during instruction analysis, while
> > registering the probe -- i.e., in kernel space.
> 
> Registering the user probe? That means if there's a buffer overflow
> in there it would be exploitable.

Certainly a poorly written probe handler would be a problem.  Could you
explain further what you mean?  Are you talking about a buffer overflow
in the probed program?  in the probe handler?  in uprobes?

> 
> > > 
> > > In general the trend has been also to make traps faster in the CPU, make 
> > > sure you're not optimizing for some old CPU here.
> > 
> > I won't argue with that.  What Avi seems to be proposing buys us a
> > speedup, but at the cost of increased complexity -- among other things,
> > splitting the instrumentation code between user space (in the "XOL" area
> > -- which would then be used for much more than XOL instruction slots)
> 
> You can't have a single XOL area, at least not if you want to support
> shared libraries on 64bit & rip relative.

I disagree.  See above.

> 
> > and kernel space.  The splitting would presumably be handled by
> > higher-level code -- SystemTap, perf, or whatever.  It's a neat idea,
> > but it seems like a v2 kind of feature.
> 
> I'm not sure it can even work, unless you severely limited the allowed
> instructions.

I'm not sure it can work, either.  But I still believe that we've
addressed the known issues wrt the big x86_64 address space.

> 
> -Andi
> 

Thanks.
Jim


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-20 19:34                                                       ` Jim Keniston
@ 2010-01-20 19:58                                                         ` Andi Kleen
  2010-01-20 20:28                                                           ` Jim Keniston
  0 siblings, 1 reply; 84+ messages in thread
From: Andi Kleen @ 2010-01-20 19:58 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Andi Kleen, Avi Kivity, Pekka Enberg, Srikar Dronamraju,
	Peter Zijlstra, ananth, Ingo Molnar, Arnaldo Carvalho de Melo,
	utrace-devel, Frederic Weisbecker, Masami Hiramatsu,
	Maneesh Soni, Mark Wielaard, LKML

> Re: rewriting instructions that use rip-relative addressing.  We do that
> now.  See handle_riprel_insn() in patch #2.  (As far as we can tell, it
> works, but we'd appreciate your review of it.)

Yes, but how do you get within 2GB of it? Add lots of holes
in the address space? 

> The instruction decoder is used only during instruction analysis, while
> registering the probe -- i.e., in kernel space.

Registering the user probe? That means if there's a buffer overflow
in there it would be exploitable.

> > 
> > In general the trend has been also to make traps faster in the CPU, make 
> > sure you're not optimizing for some old CPU here.
> 
> I won't argue with that.  What Avi seems to be proposing buys us a
> speedup, but at the cost of increased complexity -- among other things,
> splitting the instrumentation code between user space (in the "XOL" area
> -- which would then be used for much more than XOL instruction slots)

You can't have a single XOL area, at least not if you want to support
shared libraries on 64bit & rip relative.

> and kernel space.  The splitting would presumably be handled by
> higher-level code -- SystemTap, perf, or whatever.  It's a neat idea,
> but it seems like a v2 kind of feature.

I'm not sure it can even work, unless you severely limited the allowed
instructions.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-20 18:31                                                     ` Andi Kleen
@ 2010-01-20 19:34                                                       ` Jim Keniston
  2010-01-20 19:58                                                         ` Andi Kleen
  0 siblings, 1 reply; 84+ messages in thread
From: Jim Keniston @ 2010-01-20 19:34 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Avi Kivity, Pekka Enberg, Srikar Dronamraju, Peter Zijlstra,
	ananth, Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML


On Wed, 2010-01-20 at 19:31 +0100, Andi Kleen wrote:
> Jim Keniston <jkenisto@us.ibm.com> writes:
> >
> > I don't know of any such plans, but I'd be interested to read more of
> > your thoughts here.  As I understand it, you've suggested replacing the
> > probed instruction with a jump into an instrumentation vma (the XOL
> > area, or something similar).  Masami has demonstrated -- through his
> > djprobes enhancement to kprobes -- that this can be done for many x86
> > instructions.
> 
> The big problem when doing this in user space is that for 64bit
> it has to be within 2GB of the probed code, otherwise you would
> need to rewrite the instruction to not use any rip relative addressing,
> which can be rather complicated (needs registers, but the instruction
> might already use them, so you would need a register allocator/spilling etc.)

I'm probably telling you stuff you already know, but...

Re: jumps longer than 2GB: The following 14-byte sequence seems to work:
  jmpq *(%rip)
  .quad next_insn
where next_insn is the address of the instruction to which we want to
jump.  We'd need this for boosting, anyway -- to jump from the XOL area
back to the probed instruction stream.
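
The byte layout of that sequence can be sketched in C as follows. This is
an illustration only, not code from the patch set; encode_abs_jmp() and
ABS_JMP_LEN are hypothetical names. It assumes the 64-bit encoding
ff 25 <disp32> for an indirect jump through a rip-relative memory operand,
with a zero displacement so that the 8-byte target is read from the bytes
immediately following the jump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 6 bytes for "jmpq *0(%rip)" plus the 8-byte target that follows it. */
#define ABS_JMP_LEN 14

/*
 * Hypothetical helper: write
 *     ff 25 00 00 00 00    jmpq *0(%rip)
 *     <8 bytes>            .quad target
 * into buf, which must have room for ABS_JMP_LEN bytes.
 * Returns the number of bytes written.
 */
static size_t encode_abs_jmp(uint8_t *buf, uint64_t target)
{
	static const uint8_t jmp_rip0[6] = {
		0xff, 0x25, 0x00, 0x00, 0x00, 0x00
	};
	int i;

	assert(buf != NULL);
	memcpy(buf, jmp_rip0, sizeof(jmp_rip0));

	/* x86 is little-endian: store the low byte of the target first. */
	for (i = 0; i < 8; i++)
		buf[6 + i] = (uint8_t)(target >> (8 * i));

	return ABS_JMP_LEN;
}
```

Because the target is a full 64-bit value, the sketch places no
restriction on the distance between the jump and its destination.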

I think djprobes inserts a 5-byte jump at the probepoint; I don't know
whether a 14-byte jump would introduce new difficulties.

Re: rewriting instructions that use rip-relative addressing.  We do that
now.  See handle_riprel_insn() in patch #2.  (As far as we can tell, it
works, but we'd appreciate your review of it.)

> 
> And that 2GB can be anywhere in the address space for shared
> libraries, which might well be already used. A lot of programs
> need large VM areas without holes.
> 
> Also I personally would be uncomfortable to let the instruction
> decoder be used by unprivileged code. Who knows how
> many buffer overflows it has?

The instruction decoder is used only during instruction analysis, while
registering the probe -- i.e., in kernel space.

> 
> In general the trend has been also to make traps faster in the CPU, make 
> sure you're not optimizing for some old CPU here.

I won't argue with that.  What Avi seems to be proposing buys us a
speedup, but at the cost of increased complexity -- among other things,
splitting the instrumentation code between user space (in the "XOL" area
-- which would then be used for much more than XOL instruction slots)
and kernel space.  The splitting would presumably be handled by
higher-level code -- SystemTap, perf, or whatever.  It's a neat idea,
but it seems like a v2 kind of feature.

> 
> -Andi

Jim


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-19 18:06                                                     ` Frederic Weisbecker
  2010-01-20  6:36                                                       ` Srikar Dronamraju
@ 2010-01-20 19:31                                                       ` Masami Hiramatsu
  1 sibling, 0 replies; 84+ messages in thread
From: Masami Hiramatsu @ 2010-01-20 19:31 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jim Keniston, Avi Kivity, Pekka Enberg, Srikar Dronamraju,
	Peter Zijlstra, ananth, Ingo Molnar, Arnaldo Carvalho de Melo,
	utrace-devel, Maneesh Soni, Mark Wielaard, LKML

Frederic Weisbecker wrote:
> On Tue, Jan 19, 2010 at 09:47:45AM -0800, Jim Keniston wrote:
>>> Do you have plans for a variant 
>>> that's completely in userspace?
>>
>> I don't know of any such plans, but I'd be interested to read more of
>> your thoughts here.  As I understand it, you've suggested replacing the
>> probed instruction with a jump into an instrumentation vma (the XOL
>> area, or something similar).  Masami has demonstrated -- through his
>> djprobes enhancement to kprobes -- that this can be done for many x86
>> instructions.
>>
>> What does the code in the jumped-to vma do?  Is the instrumentation code
>> that corresponds to the uprobe handlers encoded in an ad hoc .so?
> 
> 
> Once the instrumentation is requested by a process that is not the
> instrumented one, it looks impossible to set a uprobe without a
> minimal voluntary collaboration from the instrumented process
> (events sent through IPC or whatever). So that looks too limited;
> this is no longer a true dynamic uprobe.

Agreed. Since a uprobe's handler must run in the kernel,
we need to jump into kernel space anyway. "Booster" (which just skips
the single-step trap exception) may be useful for
improving uprobe performance.

And also, as Andi said, using a jump instead of int3 in userspace
has a 2GB address space limitation. It's not a problem inside the
kernel, but it's a big problem in userspace.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:14                                   ` Peter Zijlstra
  2010-01-18 12:37                                     ` Avi Kivity
@ 2010-01-20 18:32                                     ` Andi Kleen
  1 sibling, 0 replies; 84+ messages in thread
From: Andi Kleen @ 2010-01-20 18:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Avi Kivity, ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

Peter Zijlstra <peterz@infradead.org> writes:
>
> With CPL2 or RPL on user segments the protection issue seems to be
> manageable for running the instructions from kernel space. 

Nope -- it doesn't work on 64bit and even on 32bit can have large
costs on some CPUs.

Also designing 32bit only features in 2010 would seem rather ....
unfortunate.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-19 17:47                                                   ` Jim Keniston
  2010-01-19 18:06                                                     ` Frederic Weisbecker
  2010-01-20  9:43                                                     ` Avi Kivity
@ 2010-01-20 18:31                                                     ` Andi Kleen
  2010-01-20 19:34                                                       ` Jim Keniston
  2 siblings, 1 reply; 84+ messages in thread
From: Andi Kleen @ 2010-01-20 18:31 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Avi Kivity, Pekka Enberg, Srikar Dronamraju, Peter Zijlstra,
	ananth, Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

Jim Keniston <jkenisto@us.ibm.com> writes:
>
> I don't know of any such plans, but I'd be interested to read more of
> your thoughts here.  As I understand it, you've suggested replacing the
> probed instruction with a jump into an instrumentation vma (the XOL
> area, or something similar).  Masami has demonstrated -- through his
> djprobes enhancement to kprobes -- that this can be done for many x86
> instructions.

The big problem when doing this in user space is that for 64bit
it has to be within 2GB of the probed code, otherwise you would
need to rewrite the instruction to not use any rip relative addressing,
which can be rather complicated (needs registers, but the instruction
might already use them, so you would need a register allocator/spilling etc.)
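
As a rough illustration of the 2GB constraint (a sketch, not code from the
patch set; jmp_rel32_reaches() and JMP_REL32_LEN are hypothetical names):
a 5-byte "jmp rel32" encodes a signed 32-bit displacement measured from
the end of the jump, so the destination has to lie within about 2GB of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JMP_REL32_LEN 5	/* opcode e9 plus a 32-bit displacement */

/*
 * Hypothetical helper: return true if a 5-byte "jmp rel32" placed at
 * address from can reach address to. The displacement is relative to
 * the first byte after the jump instruction.
 */
static bool jmp_rel32_reaches(uint64_t from, uint64_t to)
{
	int64_t disp;

	/* The end of the jump must not wrap around the address space. */
	assert(from <= UINT64_MAX - JMP_REL32_LEN);

	disp = (int64_t)(to - (from + JMP_REL32_LEN));
	return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

A probe in a shared library mapped high in a 64-bit address space would
fail this check against an instrumentation area mapped near the main
executable, which is the situation described above.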

And that 2GB can be anywhere in the address space for shared
libraries, which might well be already used. A lot of programs
need large VM areas without holes.

Also I personally would be uncomfortable to let the instruction
decoder be used by unprivileged code. Who knows how
many buffer overflows it has?

In general the trend has been also to make traps faster in the CPU, make 
sure you're not optimizing for some old CPU here.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 13:15                                       ` Peter Zijlstra
  2010-01-18 13:33                                         ` Avi Kivity
  2010-01-18 13:34                                         ` K.Prasad
@ 2010-01-20 15:57                                         ` Mel Gorman
  2 siblings, 0 replies; 84+ messages in thread
From: Mel Gorman @ 2010-01-20 15:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Avi Kivity, ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Mon, Jan 18, 2010 at 02:15:51PM +0100, Peter Zijlstra wrote:
> On Mon, 2010-01-18 at 14:37 +0200, Avi Kivity wrote:
> > On 01/18/2010 02:14 PM, Peter Zijlstra wrote:
> > >
> > >> Well, the alternatives are very unappealing.  Emulation and
> > >> single-stepping are going to be very slow compared to a couple of jumps.
> > >>      
> > > With CPL2 or RPL on user segments the protection issue seems to be
> > > manageable for running the instructions from kernel space.
> > >    
> > 
> > CPL2 gives unrestricted access to the kernel address space; and RPL does 
> > not affect page level protection.  Segment limits don't work on x86-64.  
> > But perhaps I missed something - these things are tricky.
> 
> So setting RPL to 3 on the user segments allows access to kernel pages
> just fine? How useful.. :/
> 
> > It should be possible to translate the instruction into an address space 
> > check, followed by the action, but that's still slower due to privilege 
> > level switches.
> 
> Well, if you manage to do the address validation you don't need the priv
> level switch anymore, right?
> 

It also starts becoming very x86-centric though, doesn't it? It might
kick other ports later.

What is there at the moment is storing the copied instructions in a VMA.
The most unpalatable part of that to me is that it's visible to
userspace, probably via /proc/ and I didn't check, but I hope an
munmap() from userspace cannot delete it.

What the VMA has going for it is that it *appears* to be easier to port to
other architectures than the alternatives, certainly easier to handle than
instruction emulation.

> Are the ins encodings sane enough to recognize mem parameters without
> needing to know the actual ins?
> 
> How about using a hw-breakpoint to close the gap for the inline single
> step? You could even re-insert the int3 lazily when you need the
> hw-breakpoint again. It would consume one hw-breakpoint register for
> each task/cpu that has probes though..
> 

This feels very racy. Along with that, making these sort of changes
was considered a risky venture on x86 and needed strong verification from
elsewhere (http://lkml.org/lkml/2010/1/12/300). There are probably similar
concerns on other architectures that would make a reliable port difficult.

Right now the approach is with VMAs. The alternatives are

  1. reserved XOL page (similar disadvantages to the VMA)
  2. emulated instructions
	This is an emulation bug waiting to happen in my opinion and makes
	porting uprobes a significantly more difficult undertaking than
	either the XOL-VMA or XOL-page approach
  3. XOL page in kernel space available at a different CPL
	This assumes all target architectures have a usable privilege
	ring which may be the case. However, I would guess that it
	is going to perform worse than the current approach because
	of the change in privilege level. No idea what the cost of
	a privilege level change is, but I doubt it's free
  4. Boosted probes (arch-specific, apparently only x86 does this for
	kprobes)

As unpalatable as the VMA is, I am failing to see why it's not a
reasonable starting point with an understanding that 2 or 3 would be
implemented in the future after the other architecture ports are in
place and the reliability of the options as well as the performance can
be measured.

There would appear to be two classes of application that might suffer
from the VMA. The first needs absolutely every single ounce of address
space. The second introspects itself via /proc/self/maps and makes
decisions based on that. The first is unfortunate but should be a limited
number of use cases. The second could be fudged by simply not exporting the
information via /proc.

I'm of the opinion it would be reasonable to let the VMA go ahead, look
at the ports for the other architectures and revisit options 2 and 3 above
to see if the VMA can really be removed without a performance or reliability
penalty.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-17 15:01                   ` Peter Zijlstra
@ 2010-01-20 12:55                     ` Pavel Machek
  0 siblings, 0 replies; 84+ messages in thread
From: Pavel Machek @ 2010-01-20 12:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Avi Kivity, ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Sun 2010-01-17 16:01:46, Peter Zijlstra wrote:
> On Sun, 2010-01-17 at 16:56 +0200, Avi Kivity wrote:
> > On 01/17/2010 04:52 PM, Peter Zijlstra wrote:
> 
> > > Also, if it's fixed size you're imposing artificial limits on the number
> > > of possible probes.
> > >    
> > 
> > Obviously we'll need a limit, a uprobe will also take kernel memory, we 
> > can't allow people to exhaust it.
> 
> Only if it's unprivileged; kernel and root should be able to place as
> many probes as they like until the machine keels over.

Well, it is address space that limits you in both cases...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-20 10:45                                                       ` Srikar Dronamraju
@ 2010-01-20 12:23                                                         ` Avi Kivity
  0 siblings, 0 replies; 84+ messages in thread
From: Avi Kivity @ 2010-01-20 12:23 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Jim Keniston, Pekka Enberg, Peter Zijlstra, ananth, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On 01/20/2010 12:45 PM, Srikar Dronamraju wrote:
>>> What does the code in the jumped-to vma do?
>>>        
>> 1. Write a trace entry into shared memory, trap into the kernel on overflow.
>> 2. Trap if a condition is satisfied (fast watchpoint implementation).
>>      
> That looks to be a nice idea. We should certainly look into this
> possibility. However can we look at this option probably a little later?
>
> Our plan was to do one step at a time, i.e., get the basic uprobes in
> first and then target the booster (i.e., jump to the next instruction
> without the need for single-stepping).
>
> We could look at this option of using jump instead of int3 after we are
> done with the booster.  Hope that's okay.
>    

I'm all for incremental development and merging, as long as we keep the 
interfaces flexible enough for the future.

-- 
error compiling committee.c: too many arguments to function


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-20  9:57                                                       ` Peter Zijlstra
@ 2010-01-20 12:22                                                         ` Avi Kivity
  2010-01-27  8:24                                                           ` Ingo Molnar
  0 siblings, 1 reply; 84+ messages in thread
From: Avi Kivity @ 2010-01-20 12:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jim Keniston, Pekka Enberg, Srikar Dronamraju, ananth,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/20/2010 11:57 AM, Peter Zijlstra wrote:
> On Wed, 2010-01-20 at 11:43 +0200, Avi Kivity wrote:
>    
>> 1. Write a trace entry into shared memory, trap into the kernel on overflow.
>> 2. Trap if a condition is satisfied (fast watchpoint implementation).
>>      
> So now you want to consume more of a process' address space to store
> trace data as well?

Yes.  I know I'm bad.

> Not to mention that that process could wreck the
> trace data rendering it utterly unreliable.
>    

It could, but it also might not.  Are we going to deny high performance 
tracing to users just because it doesn't work in all cases?

Note this applies to any kind of monitoring or debugging technology.  A 
process can be influenced by the debugger and render any debug info you 
get out of it unreliable.  One non-timing example is a process using a 
checksum of its text as an input to some algorithm.

-- 
error compiling committee.c: too many arguments to function


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-20  6:36                                                       ` Srikar Dronamraju
@ 2010-01-20 10:51                                                         ` Frederic Weisbecker
  0 siblings, 0 replies; 84+ messages in thread
From: Frederic Weisbecker @ 2010-01-20 10:51 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Jim Keniston, Avi Kivity, Pekka Enberg, Peter Zijlstra, ananth,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Wed, Jan 20, 2010 at 12:06:20PM +0530, Srikar Dronamraju wrote:
> * Frederic Weisbecker <fweisbec@gmail.com> [2010-01-19 19:06:12]:
> 
> > On Tue, Jan 19, 2010 at 09:47:45AM -0800, Jim Keniston wrote:
> > > 
> > > What does the code in the jumped-to vma do?  Is the instrumentation code
> > > that corresponds to the uprobe handlers encoded in an ad hoc .so?
> > 
> > 
> > Once the instrumentation is requested by a process that is not the
> > instrumented one, it looks impossible to set a uprobe without a
> > minimal voluntary collaboration from the instrumented process
> > (events sent through IPC or whatever). So that looks too limited;
> > this is no longer a true dynamic uprobe.
> 
> I don't see a case where the thread being debugged refuses to place a
> probe unless the process is exiting. The traced process doesn't decide
> if it wants to be probed or not. There could be a slight delay from the
> time the tracer requested to the time the probe is placed. But this
> delay only affects the tracer and the tracee. This is in contrast
> to, say, stop_machine, where the threads of other applications are also
> affected.


I did not think about a kind of trace point inserted in shared memory.
I was just confused :)


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-20  9:43                                                     ` Avi Kivity
  2010-01-20  9:57                                                       ` Peter Zijlstra
@ 2010-01-20 10:45                                                       ` Srikar Dronamraju
  2010-01-20 12:23                                                         ` Avi Kivity
  1 sibling, 1 reply; 84+ messages in thread
From: Srikar Dronamraju @ 2010-01-20 10:45 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jim Keniston, Pekka Enberg, Peter Zijlstra, ananth, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

> >
> >What does the code in the jumped-to vma do?
> 
> 1. Write a trace entry into shared memory, trap into the kernel on overflow.
> 2. Trap if a condition is satisfied (fast watchpoint implementation).
> 
> >Is the instrumentation code
> >that corresponds to the uprobe handlers encoded in an ad hoc .so?
> 
> Looks like a good idea, but it doesn't matter much to me.
> 

That looks to be a nice idea. We should certainly look into this
possibility. However can we look at this option probably a little later?

Our plan was to do one step at a time, i.e., get the basic uprobes in
first and then target the booster (i.e., jump to the next instruction
without the need for single-stepping).

We could look at this option of using jump instead of int3 after we are
done with the booster.  Hope that's okay.

--
Thanks and Regards
Srikar

* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-20  9:43                                                     ` Avi Kivity
@ 2010-01-20  9:57                                                       ` Peter Zijlstra
  2010-01-20 12:22                                                         ` Avi Kivity
  2010-01-20 10:45                                                       ` Srikar Dronamraju
  1 sibling, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-20  9:57 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jim Keniston, Pekka Enberg, Srikar Dronamraju, ananth,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On Wed, 2010-01-20 at 11:43 +0200, Avi Kivity wrote:
> 1. Write a trace entry into shared memory, trap into the kernel on overflow.
> 2. Trap if a condition is satisfied (fast watchpoint implementation).

So now you want to consume more of a process' address space to store
trace data as well? Not to mention that that process could wreck the
trace data rendering it utterly unreliable.


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-19 17:47                                                   ` Jim Keniston
  2010-01-19 18:06                                                     ` Frederic Weisbecker
@ 2010-01-20  9:43                                                     ` Avi Kivity
  2010-01-20  9:57                                                       ` Peter Zijlstra
  2010-01-20 10:45                                                       ` Srikar Dronamraju
  2010-01-20 18:31                                                     ` Andi Kleen
  2 siblings, 2 replies; 84+ messages in thread
From: Avi Kivity @ 2010-01-20  9:43 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Pekka Enberg, Srikar Dronamraju, Peter Zijlstra, ananth,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/19/2010 07:47 PM, Jim Keniston wrote:
>
>> This is still with a kernel entry, yes?
>>      
> Yes, this involves setting a breakpoint and trapping into the kernel
> when it's hit.  The 6-7x figure is with the current 2-trap approach
> (breakpoint, single-step).  Boosting could presumably make that more
> like 12-14x.
>    

A trap is IIRC ~1000 cycles, we can reduce this to ~50 (totally 
negligible from the executed code's point of view).

>> Do you have plans for a variant
>> that's completely in userspace?
>>      
> I don't know of any such plans, but I'd be interested to read more of
> your thoughts here.  As I understand it, you've suggested replacing the
> probed instruction with a jump into an instrumentation vma (the XOL
> area, or something similar).  Masami has demonstrated -- through his
> djprobes enhancement to kprobes -- that this can be done for many x86
> instructions.
>
> What does the code in the jumped-to vma do?

1. Write a trace entry into shared memory, trap into the kernel on overflow.
2. Trap if a condition is satisfied (fast watchpoint implementation).
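
A minimal sketch of what item 1 could look like, purely for illustration:
the names trace_buf, trace_write() and trace_overflow() are hypothetical,
the buffer is deliberately tiny, and the "trap into the kernel" is stood in
for by a counter, since no interface for it has been specified in this
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_ENTRIES 4	/* deliberately tiny, for illustration */

/* Hypothetical layout of a trace area shared with the kernel. */
struct trace_buf {
	uint64_t entries[TRACE_ENTRIES];
	size_t used;		/* number of valid entries */
	unsigned int overflows;	/* stand-in for "trapped into the kernel" */
};

/*
 * Stand-in for the overflow trap: a real implementation would enter the
 * kernel so it can drain the buffer. Here we only count and reset.
 */
static void trace_overflow(struct trace_buf *tb)
{
	tb->overflows++;
	tb->used = 0;
}

/* Append one entry; take the slow path only when the buffer is full. */
static void trace_write(struct trace_buf *tb, uint64_t value)
{
	assert(tb != NULL);

	if (tb->used == TRACE_ENTRIES)
		trace_overflow(tb);
	tb->entries[tb->used++] = value;
}
```

In this sketch the common case is a bounds check and a store, and the
expensive path runs once per TRACE_ENTRIES writes.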

> Is the instrumentation code
> that corresponds to the uprobe handlers encoded in an ad hoc .so?
>    

Looks like a good idea, but it doesn't matter much to me.

-- 
error compiling committee.c: too many arguments to function


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-19 18:06                                                     ` Frederic Weisbecker
@ 2010-01-20  6:36                                                       ` Srikar Dronamraju
  2010-01-20 10:51                                                         ` Frederic Weisbecker
  2010-01-20 19:31                                                       ` Masami Hiramatsu
  1 sibling, 1 reply; 84+ messages in thread
From: Srikar Dronamraju @ 2010-01-20  6:36 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jim Keniston, Avi Kivity, Pekka Enberg, Peter Zijlstra, ananth,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

* Frederic Weisbecker <fweisbec@gmail.com> [2010-01-19 19:06:12]:

> On Tue, Jan 19, 2010 at 09:47:45AM -0800, Jim Keniston wrote:
> > 
> > What does the code in the jumped-to vma do?  Is the instrumentation code
> > that corresponds to the uprobe handlers encoded in an ad hoc .so?
> 
> 
> Once the instrumentation is requested by a process that is not the
> instrumented one, it looks impossible to set a uprobe without a
> minimal voluntary collaboration from the instrumented process
> (events sent through IPC or whatever). So that looks too limited;
> this is no longer a true dynamic uprobe.

I don't see a case where the thread being debugged refuses to place a
probe unless the process is exiting. The traced process doesn't decide
if it wants to be probed or not. There could be a slight delay from the
time the tracer requested to the time the probe is placed. But this
delay only affects the tracer and the tracee. This is in contrast
to, say, stop_machine, where the threads of other applications are also
affected.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-19 17:47                                                   ` Jim Keniston
@ 2010-01-19 18:06                                                     ` Frederic Weisbecker
  2010-01-20  6:36                                                       ` Srikar Dronamraju
  2010-01-20 19:31                                                       ` Masami Hiramatsu
  2010-01-20  9:43                                                     ` Avi Kivity
  2010-01-20 18:31                                                     ` Andi Kleen
  2 siblings, 2 replies; 84+ messages in thread
From: Frederic Weisbecker @ 2010-01-19 18:06 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Avi Kivity, Pekka Enberg, Srikar Dronamraju, Peter Zijlstra,
	ananth, Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Tue, Jan 19, 2010 at 09:47:45AM -0800, Jim Keniston wrote:
> > Do you have plans for a variant 
> > that's completely in userspace?
> 
> I don't know of any such plans, but I'd be interested to read more of
> your thoughts here.  As I understand it, you've suggested replacing the
> probed instruction with a jump into an instrumentation vma (the XOL
> area, or something similar).  Masami has demonstrated -- through his
> djprobes enhancement to kprobes -- that this can be done for many x86
> instructions.
> 
> What does the code in the jumped-to vma do?  Is the instrumentation code
> that corresponds to the uprobe handlers encoded in an ad hoc .so?


Once the instrumentation is requested by a process that is not the
instrumented one, it looks impossible to set a uprobe without at least
minimal voluntary collaboration from the instrumented process
(events sent through IPC or whatever). So that looks too limited;
this is no longer a truly dynamic uprobe.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-19  8:07                                                 ` Avi Kivity
@ 2010-01-19 17:47                                                   ` Jim Keniston
  2010-01-19 18:06                                                     ` Frederic Weisbecker
                                                                       ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Jim Keniston @ 2010-01-19 17:47 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Srikar Dronamraju, Peter Zijlstra, ananth,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On Tue, 2010-01-19 at 10:07 +0200, Avi Kivity wrote:
> On 01/19/2010 12:15 AM, Jim Keniston wrote:
> >
> >> I don't like the idea but if the performance benefits are real (are
> >> they?),
> >>      
> > Based on what seems to be the closest thing to an apples-to-apples
> > comparison -- counting the number of calls to a specified function --
> > uprobes is 6-7 times faster than the ptrace-based equivalent, ltrace -c.
> > And of course, uprobes provides much, much more flexibility, appears to
> > scale better, and works with multithreaded apps.
> >
> > Likewise, FWIW, utrace is more than 10x faster than strace -c in
> > counting system calls.
> >
> >    
> 
> This is still with a kernel entry, yes?

Yes, this involves setting a breakpoint and trapping into the kernel
when it's hit.  The 6-7x figure is with the current 2-trap approach
(breakpoint, single-step).  Boosting could presumably make that more
like 12-14x.
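
To spell out the arithmetic behind that guess, here is a back-of-the-envelope
sketch. It assumes the per-hit cost is dominated by the traps, so that cost
scales linearly with the number of traps per probe hit. The function name is
made up for illustration, and this is not a measurement.

```python
def projected_speedup(current_speedup, current_traps, new_traps):
    """Scale a measured speedup, assuming per-hit cost is proportional
    to the number of traps taken per probe hit (an assumption)."""
    return current_speedup * current_traps / new_traps


# 6-7x with two traps (breakpoint + single-step); boosting leaves one trap.
low = projected_speedup(6, current_traps=2, new_traps=1)
high = projected_speedup(7, current_traps=2, new_traps=1)
print(low, high)
```

Any fixed per-hit work outside the traps (handler execution, bookkeeping)
would pull the real figure below this estimate.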

> Do you have plans for a variant 
> that's completely in userspace?

I don't know of any such plans, but I'd be interested to read more of
your thoughts here.  As I understand it, you've suggested replacing the
probed instruction with a jump into an instrumentation vma (the XOL
area, or something similar).  Masami has demonstrated -- through his
djprobes enhancement to kprobes -- that this can be done for many x86
instructions.
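
For reference, a minimal sketch of the jump such a scheme would plant at the
probepoint, assuming the 5-byte x86 near jump (opcode 0xE9 followed by a
32-bit displacement measured from the end of the jump). The addresses are
made up, and this only encodes the bytes; it says nothing about which
instructions can safely be displaced.

```python
import struct


def encode_jmp_rel32(src, dst):
    """Encode 'jmp rel32' placed at address src and landing at dst.
    The displacement is relative to the byte after the 5-byte jump."""
    rel = dst - (src + 5)
    if not -2**31 <= rel < 2**31:
        raise ValueError("target out of rel32 range")
    return b"\xe9" + struct.pack("<i", rel)


# Hypothetical addresses: probepoint at 0x1000, instrumentation at 0x2000.
print(encode_jmp_rel32(0x1000, 0x2000).hex())
```

Since the jump is 5 bytes long, it can overwrite more than one original
instruction, which is part of why this works for many but not all
instructions.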

What does the code in the jumped-to vma do?  Is the instrumentation code
that corresponds to the uprobe handlers encoded in an ad hoc .so?

BTW, when some people say "completely in userspace," they mean something
like ptrace, where the kernel is still heavily involved but the
instrumentation code runs in user space.  The ubp layer is intended to
support that model as well.  In our various implementations of the XOL
vma/address area, however, the XOL area is either created on exec or
created/expanded only by the probed process.

Jim


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 22:15                                               ` Jim Keniston
@ 2010-01-19  8:07                                                 ` Avi Kivity
  2010-01-19 17:47                                                   ` Jim Keniston
  0 siblings, 1 reply; 84+ messages in thread
From: Avi Kivity @ 2010-01-19  8:07 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Pekka Enberg, Srikar Dronamraju, Peter Zijlstra, ananth,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/19/2010 12:15 AM, Jim Keniston wrote:
>
>> I don't like the idea but if the performance benefits are real (are
>> they?),
>>      
> Based on what seems to be the closest thing to an apples-to-apples
> comparison -- counting the number of calls to a specified function --
> uprobes is 6-7 times faster than the ptrace-based equivalent, ltrace -c.
> And of course, uprobes provides much, much more flexibility, appears to
> scale better, and works with multithreaded apps.
>
> Likewise, FWIW, utrace is more than 10x faster than strace -c in
> counting system calls.
>
>    

This is still with a kernel entry, yes?  Do you have plans for a variant 
that's completely in userspace?

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:57                                             ` Pekka Enberg
  2010-01-18 13:06                                               ` Avi Kivity
@ 2010-01-18 22:15                                               ` Jim Keniston
  2010-01-19  8:07                                                 ` Avi Kivity
  1 sibling, 1 reply; 84+ messages in thread
From: Jim Keniston @ 2010-01-18 22:15 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Avi Kivity, Srikar Dronamraju, Peter Zijlstra, ananth,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On Mon, 2010-01-18 at 14:57 +0200, Pekka Enberg wrote:
> On 01/18/2010 02:51 PM, Pekka Enberg wrote:
> >> And how many probes do we expected to be live at the same time in
> >> real-world scenarios? I guess Avi's "one million" is more than enough?
> 
> Avi Kivity wrote:
> > I don't think a user will ever come close to a million, but we can 
> > expect some inflation from inlined functions (I don't know if uprobes 
> > replicates such probes, but if it doesn't, it should).
> 
> Right. I guess we're looking at few megabytes of the address space for 
> normal scenarios which doesn't seem too excessive.
> 
> However, as Peter pointed out, the bigger problem is that now we're 
> opening the door for other features to steal chunks of the address 
> space. And I think it's a legitimate worry that it's going to cause 
> problems for 32-bit in the future.
> 
> I don't like the idea but if the performance benefits are real (are 
> they?),

Based on what seems to be the closest thing to an apples-to-apples
comparison -- counting the number of calls to a specified function --
uprobes is 6-7 times faster than the ptrace-based equivalent, ltrace -c.
And of course, uprobes provides much, much more flexibility, appears to
scale better, and works with multithreaded apps.

Likewise, FWIW, utrace is more than 10x faster than strace -c in
counting system calls.

> maybe it's a worthwhile trade-off. Dunno.
> 
> 			Pekka

Jim



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 13:34                                             ` Mark Wielaard
@ 2010-01-18 19:49                                               ` Jim Keniston
  0 siblings, 0 replies; 84+ messages in thread
From: Jim Keniston @ 2010-01-18 19:49 UTC (permalink / raw)
  To: Mark Wielaard
  Cc: Avi Kivity, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Frederic Weisbecker, LKML, Pekka Enberg, utrace-devel

On Mon, 2010-01-18 at 14:34 +0100, Mark Wielaard wrote:
> On Mon, 2010-01-18 at 14:53 +0200, Avi Kivity wrote:
> > On 01/18/2010 02:51 PM, Pekka Enberg wrote:
> > >
> > > And how many probes do we expected to be live at the same time in
> > > real-world scenarios? I guess Avi's "one million" is more than enough?
> > >    
> > I don't think a user will ever come close to a million, but we can 
> > expect some inflation from inlined functions (I don't know if uprobes 
> > replicates such probes, but if it doesn't, it should).
> 
> SystemTap by default places probes on all instances of an inlined
> function. It is still hard to get to a million probes though.
> $ stap -v -l 'process("/usr/bin/emacs").function("*")'
> [...]
> Pass 2: analyzed script: 4359 probe(s)
> 
> You can try probing all statements (for every function, in every file,
> on every line of source code), but even that only adds up to ten
> thousands of probes:
> $ stap -v -l 'process("/usr/bin/emacs").statement("*@*:*")'
> [...]
> Pass 2: analyzed script: 39603 probe(s)
> 
> So a million is pretty far out, even if you add larger programs and all
> the shared libraries they are using.

Thanks, Mark.  One correction, below.

> 
> As Srikar said the current allocation technique is the simplest you can
> do, one xol slot for each uprobe. But there are other techniques that
> you can use. Theoretically you only need a xol slot for each thread of a
> process that simultaneously hits a uprobe instance. That requires a bit
> more bookkeeping. The variant of uprobes that systemtap uses at the
> moment does that.

Actually, it's per-probepoint, with a fixed number of slots.  If the
probepoint you just hit doesn't have a slot, and none are free, you
steal a slot from another probepoint.  Yeah, it's messy.
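
A toy model of that scheme, for illustration only. The class and method names
are hypothetical, locking is left out entirely, and picking the least
recently used probepoint as the victim is an assumption; the description
above only says that a slot is stolen from another probepoint.

```python
from collections import OrderedDict


class XolArea:
    """Fixed number of slots, owned per probepoint."""

    def __init__(self, nslots):
        self.free = list(range(nslots))
        self.owner = OrderedDict()  # probepoint address -> slot, oldest first

    def slot_for(self, probepoint):
        if probepoint in self.owner:          # already has a slot
            self.owner.move_to_end(probepoint)
            return self.owner[probepoint]
        if self.free:                         # take a free slot
            slot = self.free.pop(0)
        else:                                 # steal from another probepoint
            _victim, slot = self.owner.popitem(last=False)
        self.owner[probepoint] = slot         # instruction copy would go here
        return slot


area = XolArea(nslots=2)
print(area.slot_for(0x1000), area.slot_for(0x2000), area.slot_for(0x3000))
```

The messy part in a real implementation is that the victim's slot may still
be in use by a thread that is single-stepping in it, which this model
ignores.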

We considered allocating slots per-thread, hoping to make it basically
lockless, but that way there's more likely to be constant scribbling on
the XOL area, as a thread with n slots cycles through n+m probepoints.
And of course, it gets dicey as the process clones more threads.
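
The scribbling effect is easy to see in a small simulation. This is a sketch
under assumptions: one thread, least-recently-used replacement, and
probepoints hit in a fixed round-robin order. None of that is taken from an
actual implementation.

```python
from collections import OrderedDict


def count_slot_writes(nslots, nprobepoints, hits):
    """Count how often an instruction copy must be written into a slot
    when one thread cycles through nprobepoints with nslots slots."""
    slots = OrderedDict()  # probepoint -> slot, least recently used first
    writes = 0
    for i in range(hits):
        pp = i % nprobepoints
        if pp in slots:
            slots.move_to_end(pp)
            continue
        if len(slots) < nslots:
            slot = len(slots)
        else:
            _victim, slot = slots.popitem(last=False)
        slots[pp] = slot
        writes += 1
    return writes


print(count_slot_writes(4, 4, 100), count_slot_writes(4, 5, 100))
```

With n slots and n probepoints only the first n hits write to the XOL area;
with even one probepoint more, every hit does under this replacement policy.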

I guess the point is, there are a lot of ways to allocate slots, and we
haven't found the perfect algorithm yet, even if you accept the
existence of (and need for) the XOL area.  Keep the ideas coming.

> But the locking in that case is pretty tricky, so it
> seemed easier to first get the code with the simplest xol allocation
> technique upstream. But if you do that than you can use a very small xol
> area to support millions of uprobes and only have to expand it when
> there are hundreds of threads in a process all hitting the probes
> simultaneously.
> 
> Cheers,
> 
> Mark
> 

Jim



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 16:52                                       ` Avi Kivity
@ 2010-01-18 17:10                                         ` Ananth N Mavinakayanahalli
  0 siblings, 0 replies; 84+ messages in thread
From: Ananth N Mavinakayanahalli @ 2010-01-18 17:10 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Peter Zijlstra, Jim Keniston, Srikar Dronamraju,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On Mon, Jan 18, 2010 at 06:52:32PM +0200, Avi Kivity wrote:
> On 01/18/2010 05:43 PM, Ananth N Mavinakayanahalli wrote:
>>>
>>>> Well, the alternatives are very unappealing.  Emulation and single-stepping
>>>> are going to be very slow compared to a couple of jumps.
>>>>        
>>> So how big chunks of the address space are we talking here for uprobes?
>>>      
>> As Srikar mentioned, the least we start with is 1 page. Though you can
>> have as many probes as you want, there are certain optimizations we can
>> do, depending on the most common usecases.
>>
>> For eg., if you'd consider the start of a routine to be the most
>> commonly traced location, most routines in a binary would generally
>> start with the same instruction (say push %ebp), and we can refcount a
>> slot with that instruction to be used for all probes of the same
>> instruction.
>>    
>
> But then you can't follow the instruction with a jump back to the code...

Right. This will work only for the non-boosted case, where single-stepping
is mandatory. I guess the tradeoff is between vma space and speed.
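
A rough sketch of the refcounted sharing, with hypothetical names and no
locking. Slots are keyed by the instruction bytes, so every probe on, say,
push %ebp (0x55) gets the same slot. It is only meant for the single-step
case: a boosted slot would also need a jump back to one particular
probepoint, so it could not be shared.

```python
class SharedXolSlots:
    """One slot per distinct instruction, shared by refcount."""

    def __init__(self):
        self.by_insn = {}   # instruction bytes -> [slot index, refcount]
        self.free = []      # slot indexes released and available for reuse
        self.next_slot = 0

    def get(self, insn):
        entry = self.by_insn.get(insn)
        if entry is None:
            if self.free:
                slot = self.free.pop()
            else:
                slot = self.next_slot
                self.next_slot += 1
            entry = self.by_insn[insn] = [slot, 0]
        entry[1] += 1
        return entry[0]

    def put(self, insn):
        entry = self.by_insn[insn]
        entry[1] -= 1
        if entry[1] == 0:          # last probe on this instruction is gone
            self.free.append(entry[0])
            del self.by_insn[insn]


slots = SharedXolSlots()
first = slots.get(b"\x55")       # push %ebp
second = slots.get(b"\x55")      # another routine, same first instruction
other = slots.get(b"\x89\xe5")   # mov %esp,%ebp
print(first, second, other)
```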

Ananth


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 15:43                                     ` Ananth N Mavinakayanahalli
@ 2010-01-18 16:52                                       ` Avi Kivity
  2010-01-18 17:10                                         ` Ananth N Mavinakayanahalli
  0 siblings, 1 reply; 84+ messages in thread
From: Avi Kivity @ 2010-01-18 16:52 UTC (permalink / raw)
  To: ananth
  Cc: Pekka Enberg, Peter Zijlstra, Jim Keniston, Srikar Dronamraju,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/18/2010 05:43 PM, Ananth N Mavinakayanahalli wrote:
>>
>>> Well, the alternatives are very unappealing.  Emulation and single-stepping
>>> are going to be very slow compared to a couple of jumps.
>>>        
>> So how big chunks of the address space are we talking here for uprobes?
>>      
> As Srikar mentioned, the least we start with is 1 page. Though you can
> have as many probes as you want, there are certain optimizations we can
> do, depending on the most common usecases.
>
> For eg., if you'd consider the start of a routine to be the most
> commonly traced location, most routines in a binary would generally
> start with the same instruction (say push %ebp), and we can refcount a
> slot with that instruction to be used for all probes of the same
> instruction.
>    

But then you can't follow the instruction with a jump back to the code...

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:13                                   ` Pekka Enberg
  2010-01-18 12:17                                     ` Avi Kivity
@ 2010-01-18 15:43                                     ` Ananth N Mavinakayanahalli
  2010-01-18 16:52                                       ` Avi Kivity
  1 sibling, 1 reply; 84+ messages in thread
From: Ananth N Mavinakayanahalli @ 2010-01-18 15:43 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Avi Kivity, Peter Zijlstra, Jim Keniston, Srikar Dronamraju,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On Mon, Jan 18, 2010 at 02:13:25PM +0200, Pekka Enberg wrote:
> Hi Avi,
> 
> On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote:
> >>> Maybe you place no value on uprobes.  But people who debug userspace
> >>> likely will see a reason.
> 
> On 01/18/2010 02:06 PM, Peter Zijlstra wrote:
> >> I do see value in uprobes, I just don't like it mucking about with the
> >> address space. Nor does it appear required.
> 
> On Mon, Jan 18, 2010 at 2:09 PM, Avi Kivity <avi@redhat.com> wrote:
> > Well, the alternatives are very unappealing.  Emulation and single-stepping
> > are going to be very slow compared to a couple of jumps.
> 
> So how big chunks of the address space are we talking here for uprobes?

As Srikar mentioned, the least we start with is 1 page. Though you can
have as many probes as you want, there are certain optimizations we can
do, depending on the most common use cases.

For example, if you consider the start of a routine to be the most
commonly traced location, most routines in a binary would generally
start with the same instruction (say push %ebp), and we can refcount a
slot with that instruction to be used for all probes of the same
instruction.

Ananth


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 13:15                                       ` Peter Zijlstra
  2010-01-18 13:33                                         ` Avi Kivity
@ 2010-01-18 13:34                                         ` K.Prasad
  2010-01-20 15:57                                         ` Mel Gorman
  2 siblings, 0 replies; 84+ messages in thread
From: K.Prasad @ 2010-01-18 13:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Avi Kivity, Arnaldo Carvalho de Melo, Frederic Weisbecker, LKML,
	Mark Wielaard, utrace-devel

On Mon, Jan 18, 2010 at 02:15:51PM +0100, Peter Zijlstra wrote:
> On Mon, 2010-01-18 at 14:37 +0200, Avi Kivity wrote:
> > On 01/18/2010 02:14 PM, Peter Zijlstra wrote:
> > >
> > >> Well, the alternatives are very unappealing.  Emulation and
> > >> single-stepping are going to be very slow compared to a couple of jumps.
> > >>      
> > > With CPL2 or RPL on user segments the protection issue seems to be
> > > manageable for running the instructions from kernel space.
> > >    
> > 
> > CPL2 gives unrestricted access to the kernel address space; and RPL does 
> > not affect page level protection.  Segment limits don't work on x86-64.  
> > But perhaps I missed something - these things are tricky.
> 
> So setting RPL to 3 on the user segments allows access to kernel pages
> just fine? How useful.. :/
> 
> > It should be possible to translate the instruction into an address space 
> > check, followed by the action, but that's still slower due to privilege 
> > level switches.
> 
> Well, if you manage to do the address validation you don't need the priv
> level switch anymore, right?
> 
> Are the ins encodings sane enough to recognize mem parameters without
> needing to know the actual ins?
> 
> How about using a hw-breakpoint to close the gap for the inline single
> step? You could even re-insert the int3 lazily when you need the
> hw-breakpoint again. It would consume one hw-breakpoint register for
> each task/cpu that has probes though..
>

It is a very scarce resource, though. Sometimes all we might have is
just one hw-breakpoint register in the system (like older PPC64 with
1 IABR). If one process/thread consumes it, then all other contenders
(from both kernel and user-space) are prevented from acquiring it.

Not to mention the existence of processors with no support for
instruction breakpoints.

Thanks,
K.Prasad



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:53                                           ` Avi Kivity
  2010-01-18 12:57                                             ` Pekka Enberg
  2010-01-18 13:05                                             ` Peter Zijlstra
@ 2010-01-18 13:34                                             ` Mark Wielaard
  2010-01-18 19:49                                               ` Jim Keniston
  2 siblings, 1 reply; 84+ messages in thread
From: Mark Wielaard @ 2010-01-18 13:34 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Frederic Weisbecker, LKML, utrace-devel

On Mon, 2010-01-18 at 14:53 +0200, Avi Kivity wrote:
> On 01/18/2010 02:51 PM, Pekka Enberg wrote:
> >
> > And how many probes do we expected to be live at the same time in
> > real-world scenarios? I guess Avi's "one million" is more than enough?
> >    
> I don't think a user will ever come close to a million, but we can 
> expect some inflation from inlined functions (I don't know if uprobes 
> replicates such probes, but if it doesn't, it should).

SystemTap by default places probes on all instances of an inlined
function. It is still hard to get to a million probes though.
$ stap -v -l 'process("/usr/bin/emacs").function("*")'
[...]
Pass 2: analyzed script: 4359 probe(s)

You can try probing all statements (for every function, in every file,
on every line of source code), but even that only adds up to tens of
thousands of probes:
$ stap -v -l 'process("/usr/bin/emacs").statement("*@*:*")'
[...]
Pass 2: analyzed script: 39603 probe(s)

So a million is pretty far out, even if you add larger programs and all
the shared libraries they are using.
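
To put those counts in terms of address space, here is a quick calculation.
It assumes one slot per probe, the 16 bytes per slot quoted for x86 elsewhere
in this thread, and 4096-byte pages (an assumption about the page size).

```python
def xol_pages(nprobes, slot_bytes=16, page_bytes=4096):
    """Pages needed for one XOL slot per probe, rounded up."""
    return (nprobes * slot_bytes + page_bytes - 1) // page_bytes


for n in (4359, 39603, 1_000_000):
    print(n, "probes ->", xol_pages(n), "pages")
```

That is 18 pages for every function in emacs, 155 pages for every statement,
and 3907 pages (about 16 MB) for a full million probes.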

As Srikar said the current allocation technique is the simplest you can
do, one xol slot for each uprobe. But there are other techniques that
you can use. Theoretically you only need a xol slot for each thread of a
process that simultaneously hits a uprobe instance. That requires a bit
more bookkeeping. The variant of uprobes that systemtap uses at the
moment does that. But the locking in that case is pretty tricky, so it
seemed easier to first get the code with the simplest xol allocation
technique upstream. But if you do that then you can use a very small xol
area to support millions of uprobes and only have to expand it when
there are hundreds of threads in a process all hitting the probes
simultaneously.
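
A minimal model of that idea, under assumptions: a slot is held only while a
thread is stepping over a probed instruction, and the pool grows by one slot
only when every slot is busy. The names are hypothetical, and the locking,
which is the tricky part, is left out.

```python
class SlotPool:
    """XOL slots held per thread only while it steps over a probe."""

    def __init__(self, nslots):
        self.nslots = nslots
        self.free = list(range(nslots))
        self.held = {}  # thread id -> slot

    def acquire(self, tid):
        if self.free:
            slot = self.free.pop()
        else:                      # all slots busy: expand the area
            slot = self.nslots
            self.nslots += 1
        self.held[tid] = slot
        return slot

    def release(self, tid):
        self.free.append(self.held.pop(tid))


pool = SlotPool(nslots=2)
for probe in range(100000):        # two threads, many distinct probepoints
    pool.acquire("t1")
    pool.acquire("t2")
    pool.release("t1")
    pool.release("t2")
print(pool.nslots)
```

The pool size tracks the number of threads stepping at the same moment, not
the number of probes, which is the point made above.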

Cheers,

Mark


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 13:15                                       ` Peter Zijlstra
@ 2010-01-18 13:33                                         ` Avi Kivity
  2010-01-18 13:34                                         ` K.Prasad
  2010-01-20 15:57                                         ` Mel Gorman
  2 siblings, 0 replies; 84+ messages in thread
From: Avi Kivity @ 2010-01-18 13:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On 01/18/2010 03:15 PM, Peter Zijlstra wrote:
> On Mon, 2010-01-18 at 14:37 +0200, Avi Kivity wrote:
>    
>> On 01/18/2010 02:14 PM, Peter Zijlstra wrote:
>>      
>>>        
>>>> Well, the alternatives are very unappealing.  Emulation and
>>>> single-stepping are going to be very slow compared to a couple of jumps.
>>>>
>>>>          
>>> With CPL2 or RPL on user segments the protection issue seems to be
>>> manageable for running the instructions from kernel space.
>>>
>>>        
>> CPL2 gives unrestricted access to the kernel address space; and RPL does
>> not affect page level protection.  Segment limits don't work on x86-64.
>> But perhaps I missed something - these things are tricky.
>>      
> So setting RPL to 3 on the user segments allows access to kernel pages
> just fine? How useful.. :/
>    

The further we stay away from segmentation, the better.  Thankfully AMD 
removed hardware task switching from x86-64 so we can't even think about 
that.

>> It should be possible to translate the instruction into an address space
>> check, followed by the action, but that's still slower due to privilege
>> level switches.
>>      
> Well, if you manage to do the address validation you don't need the priv
> level switch anymore, right?
>    

Right.

> Are the ins encodings sane enough to recognize mem parameters without
> needing to know the actual ins?
>    

No.  You need to know whether the instruction accesses memory or not.

Look at the tables at the beginning of arch/x86/kvm/emulate.c.  Opcodes 
marked with ModRM, BitOp, MemAbs, String, Stack are all different styles 
of memory instructions.  You need to know the operand size for the edge 
cases.  And there are probably a few special cases in the code.
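
To illustrate why the opcode has to be known, here is a toy classifier. The
flag names are borrowed from the ones listed above, but the table is a
hand-picked handful of one-byte opcodes written for this example and is not
copied from emulate.c. Prefixes, two-byte opcodes, BitOp and operand sizes
are all ignored.

```python
MODRM, MEMABS, STRING, STACK = 1, 2, 4, 8

# Toy table: a few one-byte x86 opcodes, chosen for illustration only.
TOY_TABLE = {
    0x50: STACK,    # push %eax
    0x89: MODRM,    # mov r32 -> r/m32
    0x90: 0,        # nop
    0xA0: MEMABS,   # mov moffs8 -> %al
    0xA4: STRING,   # movsb
    0xC3: STACK,    # ret
}


def touches_memory(opcode, modrm=None):
    """Return True if the instruction reads or writes memory."""
    flags = TOY_TABLE[opcode]
    if flags & (MEMABS | STRING | STACK):
        return True
    if flags & MODRM:
        if modrm is None:
            raise ValueError("ModRM byte required for this opcode")
        return (modrm >> 6) != 3   # mod == 3 means register operand
    return False


print(touches_memory(0x89, 0xE5))  # mov %esp,%ebp: register to register
print(touches_memory(0x89, 0x45))  # mov %eax,disp8(%ebp): memory operand
```

The same opcode byte (0x89) lands on both sides depending on ModRM, and the
stack and string instructions touch memory with no ModRM byte at all.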

> How about using a hw-breakpoint to close the gap for the inline single
> step? You could even re-insert the int3 lazily when you need the
> hw-breakpoint again. It would consume one hw-breakpoint register for
> each task/cpu that has probes though..
>    

If you have more than four threads, it breaks, no?  And you need an IPI 
each time you hit the breakpoint.

Ultimately I'd like to see the breakpoint avoided as well, use a jump to 
the XOL area and trace in ~20 cycles instead of ~1000.
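
For a feel of what those two figures mean, here is a rough overhead estimate.
The hit rate and the clock speed are made-up inputs; only the ~20 and ~1000
cycle costs come from the estimate above.

```python
def cpu_fraction(hits_per_sec, cycles_per_hit, cpu_hz):
    """Fraction of one CPU spent on probe overhead."""
    return hits_per_sec * cycles_per_hit / cpu_hz


# Assumed: one million probe hits per second on a 3 GHz core.
for cycles in (1000, 20):
    print(cycles, "cycles/hit ->", cpu_fraction(1_000_000, cycles, 3e9))
```

Under those assumptions the breakpoint path costs about a third of a core,
while the jump path costs under one percent.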

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:37                                     ` Avi Kivity
@ 2010-01-18 13:15                                       ` Peter Zijlstra
  2010-01-18 13:33                                         ` Avi Kivity
                                                           ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-18 13:15 UTC (permalink / raw)
  To: Avi Kivity
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Mon, 2010-01-18 at 14:37 +0200, Avi Kivity wrote:
> On 01/18/2010 02:14 PM, Peter Zijlstra wrote:
> >
> >> Well, the alternatives are very unappealing.  Emulation and
> >> single-stepping are going to be very slow compared to a couple of jumps.
> >>      
> > With CPL2 or RPL on user segments the protection issue seems to be
> > manageable for running the instructions from kernel space.
> >    
> 
> CPL2 gives unrestricted access to the kernel address space; and RPL does 
> not affect page level protection.  Segment limits don't work on x86-64.  
> But perhaps I missed something - these things are tricky.

So setting RPL to 3 on the user segments allows access to kernel pages
just fine? How useful.. :/

> It should be possible to translate the instruction into an address space 
> check, followed by the action, but that's still slower due to privilege 
> level switches.

Well, if you manage to do the address validation you don't need the priv
level switch anymore, right?

Are the ins encodings sane enough to recognize mem parameters without
needing to know the actual ins?

How about using a hw-breakpoint to close the gap for the inline single
step? You could even re-insert the int3 lazily when you need the
hw-breakpoint again. It would consume one hw-breakpoint register for
each task/cpu that has probes though..



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:57                                             ` Pekka Enberg
@ 2010-01-18 13:06                                               ` Avi Kivity
  2010-01-18 22:15                                               ` Jim Keniston
  1 sibling, 0 replies; 84+ messages in thread
From: Avi Kivity @ 2010-01-18 13:06 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Srikar Dronamraju, Peter Zijlstra, ananth, Jim Keniston,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/18/2010 02:57 PM, Pekka Enberg wrote:
> On 01/18/2010 02:51 PM, Pekka Enberg wrote:
>>> And how many probes do we expected to be live at the same time in
>>> real-world scenarios? I guess Avi's "one million" is more than enough?
>
> Avi Kivity wrote:
>> I don't think a user will ever come close to a million, but we can 
>> expect some inflation from inlined functions (I don't know if uprobes 
>> replicates such probes, but if it doesn't, it should).
>
> Right. I guess we're looking at few megabytes of the address space for 
> normal scenarios which doesn't seem too excessive.
>
> However, as Peter pointed out, the bigger problem is that now we're 
> opening the door for other features to steal chunks of the address 
> space. And I think it's a legitimate worry that it's going to cause 
> problems for 32-bit in the future.
>
> I don't like the idea but if the performance benefits are real (are 
> they?), maybe it's a worthwhile trade-off. Dunno.

If uprobes can trace to buffer memory in the process address space, I 
think the win can be dramatic.  Incidentally it will require injecting 
even more vmas into a process.

Basically it means very low cost tracing, like the kernel tracers.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:53                                           ` Avi Kivity
  2010-01-18 12:57                                             ` Pekka Enberg
@ 2010-01-18 13:05                                             ` Peter Zijlstra
  2010-01-18 13:34                                             ` Mark Wielaard
  2 siblings, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-18 13:05 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Srikar Dronamraju, ananth, Jim Keniston,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On Mon, 2010-01-18 at 14:53 +0200, Avi Kivity wrote:
> On 01/18/2010 02:51 PM, Pekka Enberg wrote:
> >
> > And how many probes do we expected to be live at the same time in
> > real-world scenarios? I guess Avi's "one million" is more than enough?
> >    
> 
> I don't think a user will ever come close to a million, but we can 
> expect some inflation from inlined functions (I don't know if uprobes 
> replicates such probes, but if it doesn't, it should).

That's up to the userspace creating the probes but yes, agreed.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:53                                           ` Avi Kivity
@ 2010-01-18 12:57                                             ` Pekka Enberg
  2010-01-18 13:06                                               ` Avi Kivity
  2010-01-18 22:15                                               ` Jim Keniston
  2010-01-18 13:05                                             ` Peter Zijlstra
  2010-01-18 13:34                                             ` Mark Wielaard
  2 siblings, 2 replies; 84+ messages in thread
From: Pekka Enberg @ 2010-01-18 12:57 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Srikar Dronamraju, Peter Zijlstra, ananth, Jim Keniston,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/18/2010 02:51 PM, Pekka Enberg wrote:
>> And how many probes do we expected to be live at the same time in
>> real-world scenarios? I guess Avi's "one million" is more than enough?

Avi Kivity wrote:
> I don't think a user will ever come close to a million, but we can 
> expect some inflation from inlined functions (I don't know if uprobes 
> replicates such probes, but if it doesn't, it should).

Right. I guess we're looking at a few megabytes of the address space for
normal scenarios, which doesn't seem too excessive.

However, as Peter pointed out, the bigger problem is that now we're 
opening the door for other features to steal chunks of the address 
space. And I think it's a legitimate worry that it's going to cause 
problems for 32-bit in the future.

I don't like the idea but if the performance benefits are real (are 
they?), maybe it's a worthwhile trade-off. Dunno.

			Pekka


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:51                                         ` Pekka Enberg
@ 2010-01-18 12:53                                           ` Avi Kivity
  2010-01-18 12:57                                             ` Pekka Enberg
                                                               ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Avi Kivity @ 2010-01-18 12:53 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Srikar Dronamraju, Peter Zijlstra, ananth, Jim Keniston,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/18/2010 02:51 PM, Pekka Enberg wrote:
>
> And how many probes do we expected to be live at the same time in
> real-world scenarios? I guess Avi's "one million" is more than enough?
>    

I don't think a user will ever come close to a million, but we can 
expect some inflation from inlined functions (I don't know if uprobes 
replicates such probes, but if it doesn't, it should).

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:44                                       ` Srikar Dronamraju
@ 2010-01-18 12:51                                         ` Pekka Enberg
  2010-01-18 12:53                                           ` Avi Kivity
  0 siblings, 1 reply; 84+ messages in thread
From: Pekka Enberg @ 2010-01-18 12:51 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Avi Kivity, Peter Zijlstra, ananth, Jim Keniston, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Mon, Jan 18, 2010 at 2:44 PM, Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
> * Avi Kivity <avi@redhat.com> [2010-01-18 14:17:10]:
>
>> On 01/18/2010 02:13 PM, Pekka Enberg wrote:
>> >So how big chunks of the address space are we talking here for uprobes?
>>
>> That's for the authors to answer, but at a guess, 32 bytes per probe
>> (largest x86 instruction is 15 bytes), so 32 MB will give you a
>> million probes.  That's a piece of cake for x86-64, probably harder
>> to justify for i386.
>
> On x86, each probe takes 16 bytes.

And how many probes do we expect to be live at the same time in
real-world scenarios? I guess Avi's "one million" is more than enough?


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:17                                     ` Avi Kivity
  2010-01-18 12:24                                       ` Peter Zijlstra
  2010-01-18 12:24                                       ` Pekka Enberg
@ 2010-01-18 12:44                                       ` Srikar Dronamraju
  2010-01-18 12:51                                         ` Pekka Enberg
  2 siblings, 1 reply; 84+ messages in thread
From: Srikar Dronamraju @ 2010-01-18 12:44 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Peter Zijlstra, ananth, Jim Keniston, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

* Avi Kivity <avi@redhat.com> [2010-01-18 14:17:10]:

> On 01/18/2010 02:13 PM, Pekka Enberg wrote:
> >So how big chunks of the address space are we talking here for uprobes?
> 
> That's for the authors to answer, but at a guess, 32 bytes per probe
> (largest x86 instruction is 15 bytes), so 32 MB will give you a
> million probes.  That's a piece of cake for x86-64, probably harder
> to justify for i386.

On x86, each probe takes 16 bytes.
In the current implementation of XOL, the first hit of a breakpoint
requires us to allocate a page. If that page fills up with "active"
breakpoints, we add another page. A bitmap tracks which slots are in
use, so that when a breakpoint is removed its slot can be reused.  By
active breakpoints, I mean those that are inserted and have been
trapped at least once but not yet removed.

Jim did try a few other allocation techniques, but those that involved
slot stealing ended up needing locking. People who looked at that
code advised us to reduce the locking and keep the allocation simple
(at least for the first cut).
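
The page-plus-bitmap scheme described above could look roughly like the
following. This is only a user-space sketch, not the code from the patch
set: the names (xol_area, xol_page, xol_get_slot, xol_put_slot) are made
up for illustration, the 4096-byte page size is an assumption, and there
is no locking at all, which a real implementation would need.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define XOL_PAGE_BYTES 4096u	/* assumed page size */
#define XOL_SLOT_BYTES 16u	/* per-probe slot size quoted for x86 */
#define XOL_SLOTS      (XOL_PAGE_BYTES / XOL_SLOT_BYTES)
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS   ((XOL_SLOTS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical bookkeeping for one page of slots: a bitmap of used slots. */
struct xol_page {
	unsigned long used[BITMAP_WORDS];
	struct xol_page *next;
};

/* Hypothetical per-process area; empty until the first breakpoint hit. */
struct xol_area {
	struct xol_page *pages;
	size_t npages;
};

/*
 * Return the index of a free slot, reusing a released one if possible.
 * When every existing page is full (or none exists yet), append a page.
 * Returns -1 if no memory is available.
 */
long xol_get_slot(struct xol_area *area)
{
	struct xol_page **pp = &area->pages;
	size_t base = 0;

	for (; *pp; pp = &(*pp)->next, base += XOL_SLOTS) {
		for (size_t i = 0; i < XOL_SLOTS; i++) {
			unsigned long *word = &(*pp)->used[i / BITS_PER_WORD];
			unsigned long mask = 1UL << (i % BITS_PER_WORD);

			if (!(*word & mask)) {
				*word |= mask;
				return (long)(base + i);
			}
		}
	}

	*pp = calloc(1, sizeof(**pp));
	if (!*pp)
		return -1;
	area->npages++;
	(*pp)->used[0] |= 1UL;	/* hand out the new page's first slot */
	return (long)base;
}

/* Clear the slot's bit so that a later probe can reuse it. */
void xol_put_slot(struct xol_area *area, long slot)
{
	struct xol_page *p = area->pages;
	size_t idx;

	assert(slot >= 0);
	idx = (size_t)slot;
	for (; p && idx >= XOL_SLOTS; p = p->next)
		idx -= XOL_SLOTS;
	assert(p != NULL);
	p->used[idx / BITS_PER_WORD] &= ~(1UL << (idx % BITS_PER_WORD));
}
```

With these assumed sizes one page holds 256 slots, so a second page only
shows up once 256 breakpoints are active at the same time. The sketch
never frees pages; that is left out to keep it short.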

--
Thanks and Regards
Srikar

> 
> -- 
> error compiling committee.c: too many arguments to function
> 


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:14                                   ` Peter Zijlstra
@ 2010-01-18 12:37                                     ` Avi Kivity
  2010-01-18 13:15                                       ` Peter Zijlstra
  2010-01-20 18:32                                     ` Andi Kleen
  1 sibling, 1 reply; 84+ messages in thread
From: Avi Kivity @ 2010-01-18 12:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On 01/18/2010 02:14 PM, Peter Zijlstra wrote:
>
>> Well, the alternatives are very unappealing.  Emulation and
>> single-stepping are going to be very slow compared to a couple of jumps.
>>      
> With CPL2 or RPL on user segments the protection issue seems to be
> manageable for running the instructions from kernel space.
>    

CPL2 gives unrestricted access to the kernel address space; and RPL does 
not affect page level protection.  Segment limits don't work on x86-64.  
But perhaps I missed something - these things are tricky.

It should be possible to translate the instruction into an address space 
check, followed by the action, but that's still slower due to privilege 
level switches.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:17                                     ` Avi Kivity
  2010-01-18 12:24                                       ` Peter Zijlstra
@ 2010-01-18 12:24                                       ` Pekka Enberg
  2010-01-18 12:44                                       ` Srikar Dronamraju
  2 siblings, 0 replies; 84+ messages in thread
From: Pekka Enberg @ 2010-01-18 12:24 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, ananth, Jim Keniston, Srikar Dronamraju,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/18/2010 02:13 PM, Pekka Enberg wrote:
>> So how big chunks of the address space are we talking here for uprobes?

On Mon, Jan 18, 2010 at 2:17 PM, Avi Kivity <avi@redhat.com> wrote:
> That's for the authors to answer, but at a guess, 32 bytes per probe
> (largest x86 instruction is 15 bytes), so 32 MB will give you a million
> probes.  That's a piece of cake for x86-64, probably harder to justify for
> i386.

Yup, it's 32-bit that I worry about.


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:17                                     ` Avi Kivity
@ 2010-01-18 12:24                                       ` Peter Zijlstra
  2010-01-18 12:24                                       ` Pekka Enberg
  2010-01-18 12:44                                       ` Srikar Dronamraju
  2 siblings, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-18 12:24 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, ananth, Jim Keniston, Srikar Dronamraju,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On Mon, 2010-01-18 at 14:17 +0200, Avi Kivity wrote:
> On 01/18/2010 02:13 PM, Pekka Enberg wrote:
> > So how big chunks of the address space are we talking here for uprobes?
> >    
> 
> That's for the authors to answer, but at a guess, 32 bytes per probe 
> (largest x86 instruction is 15 bytes), so 32 MB will give you a million 
> probes.  That's a piece of cake for x86-64, probably harder to justify 
> for i386.

Yeah, I'm aware of people turning off address space randomization to
gain more virtual space on i386; I'm pretty sure those folks aren't
going to be happy if we shrink it.

Let alone them trying to probe their app.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:13                                   ` Pekka Enberg
@ 2010-01-18 12:17                                     ` Avi Kivity
  2010-01-18 12:24                                       ` Peter Zijlstra
                                                         ` (2 more replies)
  2010-01-18 15:43                                     ` Ananth N Mavinakayanahalli
  1 sibling, 3 replies; 84+ messages in thread
From: Avi Kivity @ 2010-01-18 12:17 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Peter Zijlstra, ananth, Jim Keniston, Srikar Dronamraju,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On 01/18/2010 02:13 PM, Pekka Enberg wrote:
> So how big chunks of the address space are we talking here for uprobes?
>    

That's for the authors to answer, but at a guess, 32 bytes per probe 
(largest x86 instruction is 15 bytes), so 32 MB will give you a million 
probes.  That's a piece of cake for x86-64, probably harder to justify 
for i386.
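
A quick check of that arithmetic, as a sketch. The function names are
made up for illustration; the 32-byte slot is the guess above, and the
4096-byte page used in the note below is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Address space consumed by nprobes slots of slot_bytes each. */
uint64_t xol_area_bytes(uint64_t nprobes, uint64_t slot_bytes)
{
	assert(slot_bytes > 0);
	return nprobes * slot_bytes;
}

/* Number of whole slots that fit in one page of page_bytes. */
uint64_t slots_per_page(uint64_t page_bytes, uint64_t slot_bytes)
{
	assert(slot_bytes > 0);
	return page_bytes / slot_bytes;
}
```

xol_area_bytes(1000000, 32) comes to 32,000,000 bytes, i.e. the "32 MB"
figure, and slots_per_page(4096, 32) is 128 probes per page.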

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:09                                 ` Avi Kivity
  2010-01-18 12:13                                   ` Pekka Enberg
@ 2010-01-18 12:14                                   ` Peter Zijlstra
  2010-01-18 12:37                                     ` Avi Kivity
  2010-01-20 18:32                                     ` Andi Kleen
  1 sibling, 2 replies; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-18 12:14 UTC (permalink / raw)
  To: Avi Kivity
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Mon, 2010-01-18 at 14:09 +0200, Avi Kivity wrote:
> On 01/18/2010 02:06 PM, Peter Zijlstra wrote:
> > On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote:
> >    
> >> Maybe you place no value on uprobes.  But people who debug userspace
> >> likely will see a reason.
> >>      
> > I do see value in uprobes, I just don't like it mucking about with the
> > address space. Nor does it appear required.
> >    
> 
> Well, the alternatives are very unappealing.  Emulation and 
> single-stepping are going to be very slow compared to a couple of jumps.

With CPL2 or RPL on user segments the protection issue seems to be
manageable for running the instructions from kernel space. 



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:09                                 ` Avi Kivity
@ 2010-01-18 12:13                                   ` Pekka Enberg
  2010-01-18 12:17                                     ` Avi Kivity
  2010-01-18 15:43                                     ` Ananth N Mavinakayanahalli
  2010-01-18 12:14                                   ` Peter Zijlstra
  1 sibling, 2 replies; 84+ messages in thread
From: Pekka Enberg @ 2010-01-18 12:13 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, ananth, Jim Keniston, Srikar Dronamraju,
	Ingo Molnar, Arnaldo Carvalho de Melo, utrace-devel,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

Hi Avi,

On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote:
>>> Maybe you place no value on uprobes.  But people who debug userspace
>>> likely will see a reason.

On 01/18/2010 02:06 PM, Peter Zijlstra wrote:
>> I do see value in uprobes, I just don't like it mucking about with the
>> address space. Nor does it appear required.

On Mon, Jan 18, 2010 at 2:09 PM, Avi Kivity <avi@redhat.com> wrote:
> Well, the alternatives are very unappealing.  Emulation and single-stepping
> are going to be very slow compared to a couple of jumps.

So how big chunks of the address space are we talking here for uprobes?


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:06                               ` Peter Zijlstra
@ 2010-01-18 12:09                                 ` Avi Kivity
  2010-01-18 12:13                                   ` Pekka Enberg
  2010-01-18 12:14                                   ` Peter Zijlstra
  0 siblings, 2 replies; 84+ messages in thread
From: Avi Kivity @ 2010-01-18 12:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On 01/18/2010 02:06 PM, Peter Zijlstra wrote:
> On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote:
>    
>> Maybe you place no value on uprobes.  But people who debug userspace
>> likely will see a reason.
>>      
> I do see value in uprobes, I just don't like it mucking about with the
> address space. Nor does it appear required.
>    

Well, the alternatives are very unappealing.  Emulation and 
single-stepping are going to be very slow compared to a couple of jumps.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 12:01                             ` Avi Kivity
@ 2010-01-18 12:06                               ` Peter Zijlstra
  2010-01-18 12:09                                 ` Avi Kivity
  0 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-18 12:06 UTC (permalink / raw)
  To: Avi Kivity
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote:
> 
> Maybe you place no value on uprobes.  But people who debug userspace 
> likely will see a reason.

I do see value in uprobes, I just don't like it mucking about with the
address space. Nor does it appear required. 



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 11:44                           ` Peter Zijlstra
@ 2010-01-18 12:01                             ` Avi Kivity
  2010-01-18 12:06                               ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Avi Kivity @ 2010-01-18 12:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On 01/18/2010 01:44 PM, Peter Zijlstra wrote:
> On Mon, 2010-01-18 at 13:01 +0200, Avi Kivity wrote:
>    
>> You've made it clear that you don't like it, but not why.
>>
>> The kernel already manages the user's address space (except for
>> MAP_FIXED which is unreliable unless you've already reserved the address
>> space).  I don't see why adding a vma for debugging is so horrible.
>>      
> Well, the kernel only does what the user (and loader) tell it through
> mmap().

What I meant was that the kernel chooses the addresses (unless you go 
the MAP_FIXED way).  From the user's point of view, there is no change 
in behaviour: the kernel picks an address.  If the constraints have 
changed (because we reserve a range), that doesn't affect the user.
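
To illustrate the point about the kernel picking the address, here is a
minimal user-space sketch. The helper name map_anywhere is made up; the
mmap() call itself is the standard one, with a NULL hint and no
MAP_FIXED.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory and let the kernel choose where it
 * goes.  Returns NULL on failure.
 */
void *map_anywhere(size_t len)
{
	void *p;

	assert(len > 0);	/* mmap() rejects zero-length mappings */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

The caller never names an address here, so it has no way to tell whether
some range was off limits when the kernel made its choice.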

> Other than that we never (except this VDSO thing) inject vmas,
> and I see no reason to start doing that now.
>    

Maybe you place no value on uprobes.  But people who debug userspace 
likely will see a reason.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 11:01                         ` Avi Kivity
  2010-01-18 11:44                           ` Peter Zijlstra
@ 2010-01-18 11:45                           ` Peter Zijlstra
  1 sibling, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-18 11:45 UTC (permalink / raw)
  To: Avi Kivity
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Mon, 2010-01-18 at 13:01 +0200, Avi Kivity wrote:
> If we reserve some address space, you don't add any heisenbugs (at 
> least, not any additional ones over emulation).  Even if we don't, 
> address space layout randomization means we're not keeping the address 
> space layout constant between runs anyway. 

Well, it still limits the number of probes to the reserved area. If you
want more you need to grow the area.. which then changes the state.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18 11:01                         ` Avi Kivity
@ 2010-01-18 11:44                           ` Peter Zijlstra
  2010-01-18 12:01                             ` Avi Kivity
  2010-01-18 11:45                           ` Peter Zijlstra
  1 sibling, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-18 11:44 UTC (permalink / raw)
  To: Avi Kivity
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Mon, 2010-01-18 at 13:01 +0200, Avi Kivity wrote:
> 
> You've made it clear that you don't like it, but not why.
> 
> The kernel already manages the user's address space (except for 
> MAP_FIXED which is unreliable unless you've already reserved the address 
> space).  I don't see why adding a vma for debugging is so horrible. 

Well, the kernel only does what the user (and loader) tell it through
mmap(). Other than that we never (except this VDSO thing) inject vmas,
and I see no reason to start doing that now.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-18  7:45                       ` Peter Zijlstra
@ 2010-01-18 11:01                         ` Avi Kivity
  2010-01-18 11:44                           ` Peter Zijlstra
  2010-01-18 11:45                           ` Peter Zijlstra
  0 siblings, 2 replies; 84+ messages in thread
From: Avi Kivity @ 2010-01-18 11:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On 01/18/2010 09:45 AM, Peter Zijlstra wrote:
>
>> This is debugging.  We're playing with registers, we're playing with the
>> cpu, we're playing with memory contents.  Why not the address space as well?
>>      
> Because you want things to be as transparent as possible in order to
> avoid heisenbugs. Sure we cannot avoid everything, but we should avoid
> everything we possibly can.
>    

If we reserve some address space, you don't add any heisenbugs (at 
least, not any additional ones over emulation).  Even if we don't, 
address space layout randomization means we're not keeping the address 
space layout constant between runs anyway.

> Also, aside from the VDSO, we simply do not force-map things into address
> spaces (and as I said before, I think the VDSO stinks for doing that),
> and I think we don't want to create (more) precedents in this case.
>    

You've made it clear that you don't like it, but not why.

The kernel already manages the user's address space (except for 
MAP_FIXED which is unreliable unless you've already reserved the address 
space).  I don't see why adding a vma for debugging is so horrible.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-17 19:33                     ` Avi Kivity
@ 2010-01-18  7:45                       ` Peter Zijlstra
  2010-01-18 11:01                         ` Avi Kivity
  0 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-18  7:45 UTC (permalink / raw)
  To: Avi Kivity
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Sun, 2010-01-17 at 21:33 +0200, Avi Kivity wrote:
> On 01/17/2010 05:03 PM, Peter Zijlstra wrote:
> >
> >> btw, an alternative is to require the caller to provide the address
> >> space for this.  If the caller is in another process, we need to allow
> >> it to play with the target's address space (i.e. mmap_process()).  I
> >> don't think uprobes justifies this by itself, but mmap_process() can be
> >> very useful for sandboxing with seccomp.
> >>      
> > mmap_process() sounds utterly gross, one process playing with another
> > process's address space.. yuck!
> >    
> 
> This is debugging.  We're playing with registers, we're playing with the 
> cpu, we're playing with memory contents.  Why not the address space as well?

Because you want things to be as transparent as possible in order to
avoid heisenbugs. Sure we cannot avoid everything, but we should avoid
everything we possibly can.

Also, aside from the VDSO, we simply do not force-map things into address
spaces (and as I said before, I think the VDSO stinks for doing that),
and I think we don't want to create (more) precedents in this case.





* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-17  0:12               ` Bryan Donlan
@ 2010-01-18  7:37                 ` Peter Zijlstra
  0 siblings, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-18  7:37 UTC (permalink / raw)
  To: Bryan Donlan
  Cc: Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ananth N Mavinakayanahalli,
	utrace-devel, Frederic Weisbecker, Masami Hiramatsu,
	Maneesh Soni, Mark Wielaard, LKML

On Sat, 2010-01-16 at 19:12 -0500, Bryan Donlan wrote:
> On Fri, Jan 15, 2010 at 7:58 PM, Jim Keniston <jkenisto@us.ibm.com> wrote:
> 
> > 4. Emulation removes the need for the XOL area, but requires pretty much
> > total knowledge of the instruction set.  It's also a performance win for
> > architectures that can't do #3.  I see kvm implemented on 4
> > architectures (ia64, powerpc, s390, x86).  Coincidentally, those are the
> > architectures to which uprobes (old uprobes, with ubp and xol bundled
> > in) has already been ported (though Intel hasn't been maintaining their
> > ia64 port).  So it sort of comes down to how objectionable the XOL vma
> > (or page) really is.
> 
> On x86 at least, wouldn't one option be to run the instruction to
> be emulated in CPL ('ring') 2, from a XOL page above the user-kernel
> split, not accessible to userspace at CPL 3? Linux hasn't
> traditionally used anything other than CPL 0 and CPL 3 (plus CPL 1 on
> Xen), but it would seem to avoid many of the problems here - it's
> invisible to normal userspace code and so doesn't pollute userspace
> memory maps with kernel-private stuff, but since it's running at a
> higher CPL than the kernel, we can still protect kernel memory and
> protect against privileged instructions.

Another option is to go play games with the RPL of the user data
segments when we load them. But yeah, something like this seems to
nicely deal with the protection issues.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-17 15:03                   ` Peter Zijlstra
@ 2010-01-17 19:33                     ` Avi Kivity
  2010-01-18  7:45                       ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Avi Kivity @ 2010-01-17 19:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On 01/17/2010 05:03 PM, Peter Zijlstra wrote:
>
>> btw, an alternative is to require the caller to provide the address
>> space for this.  If the caller is in another process, we need to allow
>> it to play with the target's address space (i.e. mmap_process()).  I
>> don't think uprobes justifies this by itself, but mmap_process() can be
>> very useful for sandboxing with seccomp.
>>      
> mmap_process() sounds utterly gross, one process playing with another
> process's address space.. yuck!
>    

This is debugging.  We're playing with registers, we're playing with the 
cpu, we're playing with memory contents.  Why not the address space as well?

For seccomp, this really should be generalized.  Run a system call on 
behalf of another process, but don't let that process do anything to 
affect it.  I think Google is doing something clever with one thread in 
seccomp mode and another unconstrained, but that's very hacky - you have 
to stop the constrained thread so it can't interfere with the live one.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-17 14:59                 ` Avi Kivity
@ 2010-01-17 15:03                   ` Peter Zijlstra
  2010-01-17 19:33                     ` Avi Kivity
  0 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-17 15:03 UTC (permalink / raw)
  To: Avi Kivity
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Sun, 2010-01-17 at 16:59 +0200, Avi Kivity wrote:
> On 01/17/2010 04:52 PM, Peter Zijlstra wrote:
> > On Sun, 2010-01-17 at 16:39 +0200, Avi Kivity wrote:
> >    
> >> On 01/15/2010 11:50 AM, Peter Zijlstra wrote:
> >>      
> >>> As previously stated, I think poking at a process's address space is an
> >>> utter no-go.
> >>>
> >>>        
> >> Why not reserve an address space range for this, somewhere near the top
> >> of memory?  It doesn't have to be populated if it isn't used.
> >>      
> > Because I think poking at a process's address space like that is gross.
> > Also, if it's a fixed size you're imposing artificial limits on the number
> > of possible probes.
> >    
> 
> btw, an alternative is to require the caller to provide the address 
> space for this.  If the caller is in another process, we need to allow 
> it to play with the target's address space (i.e. mmap_process()).  I 
> don't think uprobes justifies this by itself, but mmap_process() can be 
> very useful for sandboxing with seccomp.

mmap_process() sounds utterly gross, one process playing with another
process's address space.. yuck!


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-17 14:56                 ` Avi Kivity
@ 2010-01-17 15:01                   ` Peter Zijlstra
  2010-01-20 12:55                     ` Pavel Machek
  0 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-17 15:01 UTC (permalink / raw)
  To: Avi Kivity
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Sun, 2010-01-17 at 16:56 +0200, Avi Kivity wrote:
> On 01/17/2010 04:52 PM, Peter Zijlstra wrote:

> > Also, if it's a fixed size you're imposing artificial limits on the number
> > of possible probes.
> >    
> 
> Obviously we'll need a limit: a uprobe will also take kernel memory, and we 
> can't allow people to exhaust it.

Only if it's unprivileged; kernel and root should be able to place
probes until the machine keels over.


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-17 14:52               ` Peter Zijlstra
  2010-01-17 14:56                 ` Avi Kivity
@ 2010-01-17 14:59                 ` Avi Kivity
  2010-01-17 15:03                   ` Peter Zijlstra
  1 sibling, 1 reply; 84+ messages in thread
From: Avi Kivity @ 2010-01-17 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On 01/17/2010 04:52 PM, Peter Zijlstra wrote:
> On Sun, 2010-01-17 at 16:39 +0200, Avi Kivity wrote:
>    
>> On 01/15/2010 11:50 AM, Peter Zijlstra wrote:
>>      
>>> As previously stated, I think poking at a process's address space is an
>>> utter no-go.
>>>
>>>        
>> Why not reserve an address space range for this, somewhere near the top
>> of memory?  It doesn't have to be populated if it isn't used.
>>      
> Because I think poking at a process's address space like that is gross.
> Also, if it's a fixed size you're imposing artificial limits on the number
> of possible probes.
>    

btw, an alternative is to require the caller to provide the address 
space for this.  If the caller is in another process, we need to allow 
it to play with the target's address space (i.e. mmap_process()).  I 
don't think uprobes justifies this by itself, but mmap_process() can be 
very useful for sandboxing with seccomp.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-17 14:52               ` Peter Zijlstra
@ 2010-01-17 14:56                 ` Avi Kivity
  2010-01-17 15:01                   ` Peter Zijlstra
  2010-01-17 14:59                 ` Avi Kivity
  1 sibling, 1 reply; 84+ messages in thread
From: Avi Kivity @ 2010-01-17 14:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On 01/17/2010 04:52 PM, Peter Zijlstra wrote:
> On Sun, 2010-01-17 at 16:39 +0200, Avi Kivity wrote:
>    
>> On 01/15/2010 11:50 AM, Peter Zijlstra wrote:
>>      
>>> As previously stated, I think poking at a process's address space is an
>>> utter no-go.
>>>
>>>        
>> Why not reserve an address space range for this, somewhere near the top
>> of memory?  It doesn't have to be populated if it isn't used.
>>      
> Because I think poking at a process's address space like that is gross.
>    

If it's reserved, it's no longer the process' address space.

> Also, if it's a fixed size you're imposing artificial limits on the number
> of possible probes.
>    

Obviously we'll need a limit: a uprobe will also take kernel memory, and we 
can't allow people to exhaust it.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-17 14:39             ` Avi Kivity
@ 2010-01-17 14:52               ` Peter Zijlstra
  2010-01-17 14:56                 ` Avi Kivity
  2010-01-17 14:59                 ` Avi Kivity
  0 siblings, 2 replies; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-17 14:52 UTC (permalink / raw)
  To: Avi Kivity
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Sun, 2010-01-17 at 16:39 +0200, Avi Kivity wrote:
> On 01/15/2010 11:50 AM, Peter Zijlstra wrote:
> > As previously stated, I think poking at a process's address space is an
> > utter no-go.
> >    
> 
> Why not reserve an address space range for this, somewhere near the top 
> of memory?  It doesn't have to be populated if it isn't used.

Because I think poking at a process's address space like that is gross.
Also, if it's a fixed size you're imposing artificial limits on the number
of possible probes.




* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15  9:50           ` Peter Zijlstra
  2010-01-15 10:10             ` Ananth N Mavinakayanahalli
  2010-01-15 21:19             ` Jim Keniston
@ 2010-01-17 14:39             ` Avi Kivity
  2010-01-17 14:52               ` Peter Zijlstra
  2 siblings, 1 reply; 84+ messages in thread
From: Avi Kivity @ 2010-01-17 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ananth, Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On 01/15/2010 11:50 AM, Peter Zijlstra wrote:
> As previously stated, I think poking at a process's address space is an
> utter no-go.
>    

Why not reserve an address space range for this, somewhere near the top 
of memory?  It doesn't have to be populated if it isn't used.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-16  0:58             ` Jim Keniston
  2010-01-16 10:33               ` Peter Zijlstra
  2010-01-17  0:12               ` Bryan Donlan
@ 2010-01-17 14:37               ` Avi Kivity
  2 siblings, 0 replies; 84+ messages in thread
From: Avi Kivity @ 2010-01-17 14:37 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Peter Zijlstra, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ananth N Mavinakayanahalli,
	utrace-devel, Frederic Weisbecker, Masami Hiramatsu,
	Maneesh Soni, Mark Wielaard, LKML

On 01/16/2010 02:58 AM, Jim Keniston wrote:
>
> I hear (er, read) you.  Emulation may turn out to be the answer for some
> architectures.  But here are some things to keep in mind about the
> various approaches:
>
> 1. Single-stepping inline is easiest: you need to know very little about
> the instruction set you're probing.  But it's inadequate for
> multithreaded apps.
> 2. Single-stepping out of line solves the multithreading issue (as do #3
> and #4), but requires more knowledge of the instruction set.  (In
> particular, calls, jumps, and returns need special care; as do
> rip-relative instructions in x86_64.)  I count 9 architectures that
> support kprobes.  I think most of these do SSOL.
> 3. "Boosted" probes (where an appended jump instruction removes the need
> for the single-step trap on many instructions) require even more
> knowledge of the instruction set, and like SSOL, require XOL slots.
> Right now, as far as I know, x86 is the only architecture with boosted
> kprobes.
> 4. Emulation removes the need for the XOL area, but requires pretty much
> total knowledge of the instruction set.  It's also a performance win for
> architectures that can't do #3.  I see kvm implemented on 4
> architectures (ia64, powerpc, s390, x86).  Coincidentally, those are the
> architectures to which uprobes (old uprobes, with ubp and xol bundled
> in) has already been ported (though Intel hasn't been maintaining their
> ia64 port).  So it sort of comes down to how objectionable the XOL vma
> (or page) really is.
>    

The kvm emulator emulates only a subset of the x86 instruction set 
(basically mmio instructions and commonly-used page-table manipulation 
instructions, as well as some privileged instructions).  It would take a 
lot of work to expand it to be completely generic; and even then it will 
fail if userspace uses an instruction set extension the kernel is not 
aware of.

To me, boosted probes with a fallback to single-stepping seems to be the 
better option by far.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-16  0:58             ` Jim Keniston
  2010-01-16 10:33               ` Peter Zijlstra
@ 2010-01-17  0:12               ` Bryan Donlan
  2010-01-18  7:37                 ` Peter Zijlstra
  2010-01-17 14:37               ` Avi Kivity
  2 siblings, 1 reply; 84+ messages in thread
From: Bryan Donlan @ 2010-01-17  0:12 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Peter Zijlstra, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ananth N Mavinakayanahalli,
	utrace-devel, Frederic Weisbecker, Masami Hiramatsu,
	Maneesh Soni, Mark Wielaard, LKML

On Fri, Jan 15, 2010 at 7:58 PM, Jim Keniston <jkenisto@us.ibm.com> wrote:

> 4. Emulation removes the need for the XOL area, but requires pretty much
> total knowledge of the instruction set.  It's also a performance win for
> architectures that can't do #3.  I see kvm implemented on 4
> architectures (ia64, powerpc, s390, x86).  Coincidentally, those are the
> architectures to which uprobes (old uprobes, with ubp and xol bundled
> in) has already been ported (though Intel hasn't been maintaining their
> ia64 port).  So it sort of comes down to how objectionable the XOL vma
> (or page) really is.

On x86 at least, wouldn't one option be to run the instruction to
be emulated at CPL ('ring') 2, from an XOL page above the user-kernel
split, not accessible to userspace at CPL 3? Linux hasn't
traditionally used anything other than CPL 0 and CPL 3 (plus CPL 1 on
Xen), but it would seem to avoid many of the problems here - it's
invisible to normal userspace code and so doesn't pollute userspace
memory maps with kernel-private stuff, but since it's running at a
numerically higher (less privileged) CPL than the kernel, we can still
protect kernel memory and protect against privileged instructions.


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-16  0:58             ` Jim Keniston
@ 2010-01-16 10:33               ` Peter Zijlstra
  2010-01-17  0:12               ` Bryan Donlan
  2010-01-17 14:37               ` Avi Kivity
  2 siblings, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-16 10:33 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Srikar Dronamraju, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ananth N Mavinakayanahalli, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Fri, 2010-01-15 at 16:58 -0800, Jim Keniston wrote:
> But here are some things to keep in mind about the
> various approaches:
> 
> 1. Single-stepping inline is easiest: you need to know very little about
> the instruction set you're probing.  But it's inadequate for
> multithreaded apps.
> 2. Single-stepping out of line solves the multithreading issue (as do #3
> and #4), but requires more knowledge of the instruction set.  (In
> particular, calls, jumps, and returns need special care; as do
> rip-relative instructions in x86_64.)  I count 9 architectures that
> support kprobes.  I think most of these do SSOL.
> 3. "Boosted" probes (where an appended jump instruction removes the need
> for the single-step trap on many instructions) require even more
> knowledge of the instruction set, and like SSOL, require XOL slots.
> Right now, as far as I know, x86 is the only architecture with boosted
> kprobes.
> 4. Emulation removes the need for the XOL area, but requires pretty much
> total knowledge of the instruction set.  It's also a performance win for
> architectures that can't do #3.  I see kvm implemented on 4
> architectures (ia64, powerpc, s390, x86).  Coincidentally, those are the
> architectures to which uprobes (old uprobes, with ubp and xol bundled
> in) has already been ported (though Intel hasn't been maintaining their
> ia64 port). 

Right, so I was thinking a combination of 4 and execute from kernel
space would be feasible. I would think most regular instructions are
runnable from kernel space given that we provide the proper pt_regs
environment.

Although I just realized we need to fully emulate the address computation
step for all memory writes; otherwise a wild userspace pointer might end
up writing into your kernel image.
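
A minimal sketch of that check, assuming an x86-style memory operand
(base + index * scale + displacement). The struct, the function names and
the SKETCH_USER_LIMIT value are made up for illustration; a real kernel
would use its own per-architecture limit and access checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user/kernel split; the real limit is per-architecture. */
#define SKETCH_USER_LIMIT 0x0000800000000000ULL

/* Operands of an x86-style memory reference: base + index * scale + disp. */
struct sketch_mem_operand {
	uint64_t base;
	uint64_t index;
	uint8_t  scale;	/* 1, 2, 4 or 8 */
	int32_t  disp;	/* sign-extended displacement */
};

/* Compute the effective address the instruction would touch. */
uint64_t sketch_effective_address(const struct sketch_mem_operand *op)
{
	return op->base + op->index * op->scale + (uint64_t)(int64_t)op->disp;
}

/* True only if [addr, addr + len) lies entirely below the user limit. */
bool sketch_user_write_ok(const struct sketch_mem_operand *op, uint64_t len)
{
	uint64_t addr = sketch_effective_address(op);

	if (len == 0 || len > SKETCH_USER_LIMIT)
		return false;
	return addr <= SKETCH_USER_LIMIT - len;
}
```

The point is only that the check has to be made on the computed address
before the write is executed on the probed task's behalf.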

Also, don't we already need full knowledge of the instruction set in
order to decode the instruction stream and find instruction boundaries?

> So it sort of comes down to how objectionable the XOL vma (or page)
> really is.

Well, I really hate touching the address space, and the fact that it
perturbs the probed application in very obvious ways.

FWIW, I think the VDSO is ugly too and would have objected to it were it
proposed now -- there are much better solutions for that
(/sys/lib/libkernel.so comes to mind).

> Regarding your suggestion about executing the probed instruction in the
> kernel, how widely do you think that can be applied: which
> architectures?  how much of the instruction set?

I only know some of x86, I really couldn't tell for any other arch.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15 21:49           ` Peter Zijlstra
@ 2010-01-16  0:58             ` Jim Keniston
  2010-01-16 10:33               ` Peter Zijlstra
                                 ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Jim Keniston @ 2010-01-16  0:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ananth N Mavinakayanahalli, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Fri, 2010-01-15 at 22:49 +0100, Peter Zijlstra wrote:
> On Fri, 2010-01-15 at 13:07 -0800, Jim Keniston wrote:
> > On Fri, 2010-01-15 at 10:02 +0100, Peter Zijlstra wrote:
> > > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
> > > > 
> > > > +Instruction copies to be single-stepped are stored in a per-process
> > > > +"single-step out of line (XOL) area," which is a little VM area
> > > > +created by Uprobes in each probed process's address space.
> > > 
> > > I think tinkering with the probed process's address space is a no-no.
> > > Have you ran this by the linux mm folks?
> > 
> > Sort of.
> > 
> > Back in 2007 (!), we were getting ready to post uprobes (which was then
> > essentially uprobes+xol+ubp) to LKML, pondering XOL alternatives and
> > waiting for utrace to get pulled back into the -mm tree.  (It turned out
> > to be a long wait.)  I emailed Andrew Morton, inquiring about the
> > prospects for utrace and giving him a preview of utrace-based uprobes.
> > He expressed openness to the idea of allocating a piece of the user
> > address space for the XOL area, a la the vdso page.
> > 
> > With advice and review from Dave Hansen, we implemented an XOL page, set
> > up for every process (probed or not) along the same lines as the vdso
> > page.
> > 
> > About that time, Roland McGrath suggested using do_mmap_pgoff() to
> > create a separate vma on demand.  This was the seed of the current
> > implementation.  It had the advantages of being
> > architecture-independent, affecting only probed processes, and allowing
> > the allocation of more XOL slots.  (Uprobes can make do with a fixed
> > number of XOL slots -- allowing one probepoint to steal another's slot
> > -- but it isn't pretty.)
> > 
> > As I recall, Dave preferred the other idea (1 XOL page for every
> > process, probed or not) -- mostly because he didn't like the idea of a
> > new vma popping into existence when the process gets probed -- but was
> > OK with us going ahead with Roland's idea.
> 
> Well, I think its all very gross, I would really like people to try and
> 'emulate' or plain execute those original instructions from kernel
> space.
> 
> As to the privileged instructions, I think qemu/kvm like projects should
> have pretty much all of that covered.

I hear (er, read) you.  Emulation may turn out to be the answer for some
architectures.  But here are some things to keep in mind about the
various approaches:

1. Single-stepping inline is easiest: you need to know very little about
the instruction set you're probing.  But it's inadequate for
multithreaded apps.
2. Single-stepping out of line solves the multithreading issue (as do #3
and #4), but requires more knowledge of the instruction set.  (In
particular, calls, jumps, and returns need special care; as do
rip-relative instructions in x86_64.)  I count 9 architectures that
support kprobes.  I think most of these do SSOL.
3. "Boosted" probes (where an appended jump instruction removes the need
for the single-step trap on many instructions) require even more
knowledge of the instruction set, and like SSOL, require XOL slots.
Right now, as far as I know, x86 is the only architecture with boosted
kprobes.
4. Emulation removes the need for the XOL area, but requires pretty much
total knowledge of the instruction set.  It's also a performance win for
architectures that can't do #3.  I see kvm implemented on 4
architectures (ia64, powerpc, s390, x86).  Coincidentally, those are the
architectures to which uprobes (old uprobes, with ubp and xol bundled
in) has already been ported (though Intel hasn't been maintaining their
ia64 port).  So it sort of comes down to how objectionable the XOL vma
(or page) really is.
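
For reference, the four approaches above can be tabulated roughly as
follows. This is only a summary sketch; the enum, struct and function names
are invented here, and "no single-step trap" for boosted probes holds for
many instructions, not all.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_strategy {
	SKETCH_SS_INLINE,	/* #1: single-step inline */
	SKETCH_SSOL,		/* #2: single-step out of line */
	SKETCH_BOOSTED,		/* #3: boosted (appended jump) */
	SKETCH_EMULATE,		/* #4: emulate in the kernel */
};

struct sketch_strategy_info {
	const char *name;
	bool multithread_safe;
	bool needs_xol_slot;
	bool needs_single_step_trap;	/* false for #3 means "not on many insns" */
};

const struct sketch_strategy_info sketch_strategies[] = {
	[SKETCH_SS_INLINE] = { "single-step inline",      false, false, true  },
	[SKETCH_SSOL]      = { "single-step out of line", true,  true,  true  },
	[SKETCH_BOOSTED]   = { "boosted",                 true,  true,  false },
	[SKETCH_EMULATE]   = { "emulation",               true,  false, false },
};

/* Does this strategy need a slot in the XOL area? */
bool sketch_needs_xol(enum sketch_strategy s)
{
	return sketch_strategies[s].needs_xol_slot;
}

/* Count strategies that handle multithreaded apps without an XOL area. */
size_t sketch_count_mt_safe_without_xol(void)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(sketch_strategies) / sizeof(sketch_strategies[0]); i++)
		if (sketch_strategies[i].multithread_safe &&
		    !sketch_strategies[i].needs_xol_slot)
			n++;
	return n;
}
```

Only emulation is both multithread-safe and XOL-free, which is why the
question keeps coming back to how objectionable the XOL area is.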

Regarding your suggestion about executing the probed instruction in the
kernel, how widely do you think that can be applied: which
architectures?  how much of the instruction set?

> 
> Nor do I think we need utrace at all to make user space probes useful.
> Even stronger, I think the focus on utrace made you get some
> fundamentals wrong. Its not mainly about task state, but like said, its
> about text mappings, which is something utrace knows nothing about.

I think that's a useful insight.  As mentioned, long ago we offered up a
version of uprobes where probes were per-executable rather than
per-process.  The feedback from LKML was, in no uncertain terms, that
they should be per-process, and use access_process_vm().  Of course --
as we then argued -- sometimes you want to probe a process from the very
start, so the SystemTap folks had to invent the task-finder to allow
that.

> 
> That is not to say you cannot build a useful interface from uprobes and
> utrace, but its not at all required or natural.
> 

Thanks again for your advice and ideas.

Jim


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15 21:07         ` Jim Keniston
@ 2010-01-15 21:49           ` Peter Zijlstra
  2010-01-16  0:58             ` Jim Keniston
  0 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-15 21:49 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Srikar Dronamraju, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ananth N Mavinakayanahalli, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Fri, 2010-01-15 at 13:07 -0800, Jim Keniston wrote:
> On Fri, 2010-01-15 at 10:02 +0100, Peter Zijlstra wrote:
> > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
> > > 
> > > +Instruction copies to be single-stepped are stored in a per-process
> > > +"single-step out of line (XOL) area," which is a little VM area
> > > +created by Uprobes in each probed process's address space.
> > 
> > I think tinkering with the probed process's address space is a no-no.
> > Have you ran this by the linux mm folks?
> 
> Sort of.
> 
> Back in 2007 (!), we were getting ready to post uprobes (which was then
> essentially uprobes+xol+ubp) to LKML, pondering XOL alternatives and
> waiting for utrace to get pulled back into the -mm tree.  (It turned out
> to be a long wait.)  I emailed Andrew Morton, inquiring about the
> prospects for utrace and giving him a preview of utrace-based uprobes.
> He expressed openness to the idea of allocating a piece of the user
> address space for the XOL area, a la the vdso page.
> 
> With advice and review from Dave Hansen, we implemented an XOL page, set
> up for every process (probed or not) along the same lines as the vdso
> page.
> 
> About that time, Roland McGrath suggested using do_mmap_pgoff() to
> create a separate vma on demand.  This was the seed of the current
> implementation.  It had the advantages of being
> architecture-independent, affecting only probed processes, and allowing
> the allocation of more XOL slots.  (Uprobes can make do with a fixed
> number of XOL slots -- allowing one probepoint to steal another's slot
> -- but it isn't pretty.)
> 
> As I recall, Dave preferred the other idea (1 XOL page for every
> process, probed or not) -- mostly because he didn't like the idea of a
> new vma popping into existence when the process gets probed -- but was
> OK with us going ahead with Roland's idea.

Well, I think it's all very gross; I would really like people to try and
'emulate' or plain execute those original instructions from kernel
space.

As to the privileged instructions, I think qemu/kvm-like projects should
have pretty much all of that covered.

Nor do I think we need utrace at all to make user space probes useful.
Even stronger, I think the focus on utrace made you get some
fundamentals wrong. It's not mainly about task state but, as I said, it's
about text mappings, which is something utrace knows nothing about.

That is not to say you cannot build a useful interface from uprobes and
utrace, but it's not at all required or natural.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15  9:50           ` Peter Zijlstra
  2010-01-15 10:10             ` Ananth N Mavinakayanahalli
@ 2010-01-15 21:19             ` Jim Keniston
  2010-01-17 14:39             ` Avi Kivity
  2 siblings, 0 replies; 84+ messages in thread
From: Jim Keniston @ 2010-01-15 21:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ananth, Srikar Dronamraju, Ingo Molnar, Arnaldo Carvalho de Melo,
	utrace-devel, Frederic Weisbecker, Masami Hiramatsu,
	Maneesh Soni, Mark Wielaard, LKML


On Fri, 2010-01-15 at 10:50 +0100, Peter Zijlstra wrote:
> On Fri, 2010-01-15 at 15:08 +0530, Ananth N Mavinakayanahalli wrote:
> > On Fri, Jan 15, 2010 at 10:03:48AM +0100, Peter Zijlstra wrote:
> > > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
> > > > 
> > > >  discussed elsewhere. 
> > > 
> > > Thanks for the pointer...
> > 
> > :-)
> > 
> > Peter,
> > I think Jim was referring to
> > http://sources.redhat.com/ml/systemtap/2007-q1/msg00571.html
> 
> That's a 2007 email from some obscure list... that's hardly something
> that can be referenced to without link.

Actually, I was referring to this
http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01120.html
from earlier (Monday) in this same discussion.  But I'll be sure to
include pointers in the future.

For more thoughts on this approach, see
http://sourceware.org/bugzilla/show_bug.cgi?id=5509
(And no, I don't expect you to have seen that before. :-))  Most of the
troublesome issues mentioned in that enhancement request have since been
resolved.

Jim



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15  9:02       ` Peter Zijlstra
@ 2010-01-15 21:07         ` Jim Keniston
  2010-01-15 21:49           ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Jim Keniston @ 2010-01-15 21:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ananth N Mavinakayanahalli, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Fri, 2010-01-15 at 10:02 +0100, Peter Zijlstra wrote:
> On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
> > 
> > +Instruction copies to be single-stepped are stored in a per-process
> > +"single-step out of line (XOL) area," which is a little VM area
> > +created by Uprobes in each probed process's address space.
> 
> I think tinkering with the probed process's address space is a no-no.
> Have you ran this by the linux mm folks?

Sort of.

Back in 2007 (!), we were getting ready to post uprobes (which was then
essentially uprobes+xol+ubp) to LKML, pondering XOL alternatives and
waiting for utrace to get pulled back into the -mm tree.  (It turned out
to be a long wait.)  I emailed Andrew Morton, inquiring about the
prospects for utrace and giving him a preview of utrace-based uprobes.
He expressed openness to the idea of allocating a piece of the user
address space for the XOL area, a la the vdso page.

With advice and review from Dave Hansen, we implemented an XOL page, set
up for every process (probed or not) along the same lines as the vdso
page.

About that time, Roland McGrath suggested using do_mmap_pgoff() to
create a separate vma on demand.  This was the seed of the current
implementation.  It had the advantages of being
architecture-independent, affecting only probed processes, and allowing
the allocation of more XOL slots.  (Uprobes can make do with a fixed
number of XOL slots -- allowing one probepoint to steal another's slot
-- but it isn't pretty.)
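
To illustrate the fixed-slot case, here is a rough sketch of a slot table
where a new probepoint steals a slot once all are taken. It is not the
uprobes code; every name is hypothetical, address 0 is assumed to mean
"free", and it ignores the hard part (making sure no thread is still
stepping in the slot being stolen).

```c
#include <stddef.h>

#define SKETCH_NSLOTS   4
#define SKETCH_NO_OWNER 0UL	/* assumption: address 0 is never probed */

struct sketch_xol_area {
	unsigned long owner[SKETCH_NSLOTS];	/* probed address per slot */
	size_t next_victim;			/* round-robin steal cursor */
};

/*
 * Return the slot index for the probepoint at vaddr.  Reuse its slot if
 * it has one, else take a free slot, else steal one round-robin and set
 * *stolen so the caller knows the instruction copy must be rewritten.
 */
int sketch_xol_get_slot(struct sketch_xol_area *area, unsigned long vaddr,
			int *stolen)
{
	size_t i;

	*stolen = 0;
	for (i = 0; i < SKETCH_NSLOTS; i++)
		if (area->owner[i] == vaddr)
			return (int)i;

	for (i = 0; i < SKETCH_NSLOTS; i++)
		if (area->owner[i] == SKETCH_NO_OWNER) {
			area->owner[i] = vaddr;
			return (int)i;
		}

	i = area->next_victim;
	area->next_victim = (i + 1) % SKETCH_NSLOTS;
	area->owner[i] = vaddr;
	*stolen = 1;
	return (int)i;
}
```

With an on-demand vma the table can simply grow instead, which is the
"more XOL slots" advantage mentioned above.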

As I recall, Dave preferred the other idea (1 XOL page for every
process, probed or not) -- mostly because he didn't like the idea of a
new vma popping into existence when the process gets probed -- but was
OK with us going ahead with Roland's idea.

(I'm not a VM guy; pardon any imprecision in my language.)

Jim

> I'd be inclined to NAK this
> straight out.
> 


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15 10:56                   ` Peter Zijlstra
@ 2010-01-15 11:02                     ` Peter Zijlstra
  0 siblings, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-15 11:02 UTC (permalink / raw)
  To: ananth
  Cc: Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Fri, 2010-01-15 at 11:56 +0100, Peter Zijlstra wrote:
> On Fri, 2010-01-15 at 15:52 +0530, Ananth N Mavinakayanahalli wrote:
> > On Fri, Jan 15, 2010 at 11:13:32AM +0100, Peter Zijlstra wrote:
> > > On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote:
> > > 
> > > > Ideas?
> > > 
> > > emulate the one instruction?
> > 
> > In kernel? Generically? Don't think its that easy for userspace --
> > you have the full gamut of instructions to emulate (fp, vector, etc);
> > further,
> 
> Can't you jit a piece of code that wraps the one instruction, save the
> full cpu state, set the userspace segments, have it load pt_regs (except
> for the IP) execute the one ins, save the results, restore the full
> state?

Hmm, normally the problem with FP/Vector state is that we don't
save/restore it going in and out of the kernel, so kernel-space can't use
it because it would change the userspace state, but in this case we can
simply execute that one instruction and have it change user state,
because that's exactly what we want to do.

So we don't need to save/restore the full cpu state around that JIT'ed
piece of code, but just the regular regs.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15 10:22                 ` Ananth N Mavinakayanahalli
@ 2010-01-15 10:56                   ` Peter Zijlstra
  2010-01-15 11:02                     ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-15 10:56 UTC (permalink / raw)
  To: ananth
  Cc: Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Fri, 2010-01-15 at 15:52 +0530, Ananth N Mavinakayanahalli wrote:
> On Fri, Jan 15, 2010 at 11:13:32AM +0100, Peter Zijlstra wrote:
> > On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote:
> > 
> > > Ideas?
> > 
> > emulate the one instruction?
> 
> In kernel? Generically? Don't think its that easy for userspace --
> you have the full gamut of instructions to emulate (fp, vector, etc);
> further,

Can't you jit a piece of code that wraps the one instruction, save the
full cpu state, set the userspace segments, have it load pt_regs (except
for the IP) execute the one ins, save the results, restore the full
state?

Then replace pt_regs with the saved result and advance the stored IP by
the length of that one instruction and return to userspace?
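
Roughly, the last step could look like the sketch below. The register
struct and function name are invented for illustration (this is not
pt_regs), and it only covers instructions that fall through to the next
one; calls, jumps and returns would need their own handling.

```c
#include <stdint.h>

/* Hypothetical stand-in for the saved user register state. */
struct sketch_regs {
	uint64_t ip;
	uint64_t gpr[16];
};

/*
 * After the wrapped instruction has run: take over the resulting
 * general-purpose registers, but compute the new IP ourselves as
 * "probed address + instruction length".
 */
void sketch_finish_step(struct sketch_regs *saved,
			const struct sketch_regs *result, uint8_t insn_len)
{
	uint64_t next_ip = saved->ip + insn_len;

	*saved = *result;
	saved->ip = next_ip;
}
```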

All you need to take care of are the priv insns, but doesn't something
like kvm already have code to deal with that?

>  the instruction could itself cause a page fault and the like.

Faults aren't a problem, we take faults from kernel space all the time.


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15 10:13               ` Peter Zijlstra
@ 2010-01-15 10:22                 ` Ananth N Mavinakayanahalli
  2010-01-15 10:56                   ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Ananth N Mavinakayanahalli @ 2010-01-15 10:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Fri, Jan 15, 2010 at 11:13:32AM +0100, Peter Zijlstra wrote:
> On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote:
> 
> > Ideas?
> 
> emulate the one instruction?

In kernel? Generically? Don't think it's that easy for userspace --
you have the full gamut of instructions to emulate (fp, vector, etc);
further, the instruction could itself cause a page fault and the like.


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15 10:10             ` Ananth N Mavinakayanahalli
@ 2010-01-15 10:13               ` Peter Zijlstra
  2010-01-15 10:22                 ` Ananth N Mavinakayanahalli
  0 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-15 10:13 UTC (permalink / raw)
  To: ananth
  Cc: Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote:

> Ideas?

emulate the one instruction?



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15  9:50           ` Peter Zijlstra
@ 2010-01-15 10:10             ` Ananth N Mavinakayanahalli
  2010-01-15 10:13               ` Peter Zijlstra
  2010-01-15 21:19             ` Jim Keniston
  2010-01-17 14:39             ` Avi Kivity
  2 siblings, 1 reply; 84+ messages in thread
From: Ananth N Mavinakayanahalli @ 2010-01-15 10:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Fri, Jan 15, 2010 at 10:50:14AM +0100, Peter Zijlstra wrote:
> On Fri, 2010-01-15 at 15:08 +0530, Ananth N Mavinakayanahalli wrote:
> > On Fri, Jan 15, 2010 at 10:03:48AM +0100, Peter Zijlstra wrote:
> > > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
> > > > 
> > > >  discussed elsewhere. 
> > > 
> > > Thanks for the pointer...
> > 
> > :-)
> > 
> > Peter,
> > I think Jim was referring to
> > http://sources.redhat.com/ml/systemtap/2007-q1/msg00571.html
> 
> That's a 2007 email from some obscure list... that's hardly something
> that can be referenced to without link.
> 
> As previously stated, I think poking at a process's address space is an
> utter no-go.

In which case we'll need to find a different solution to it. The gdb
style of 'breakpoint hit' -> 'put original instruction back in place' ->
single-step -> 'put back the breakpoint' would be a big limiter,
especially for multithreaded cases.
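
A toy model of why that sequence hurts multithreaded apps, assuming the
x86 int3 opcode (0xCC) as the breakpoint. The struct and function names
are hypothetical. Between "put original instruction back" and "put back
the breakpoint", any other thread running through the same address sees
the original byte and misses the probe.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BREAKPOINT 0xCC	/* x86 int3 opcode */

struct sketch_probepoint {
	uint8_t *text;	/* first byte of the probed instruction */
	uint8_t  orig;	/* original byte, saved when the probe is armed */
};

/* Insert the breakpoint (assumes the probe is not already armed). */
void sketch_arm(struct sketch_probepoint *p)
{
	p->orig = *p->text;
	*p->text = SKETCH_BREAKPOINT;
}

/* 'put original instruction back in place', before single-stepping. */
void sketch_begin_inline_step(struct sketch_probepoint *p)
{
	*p->text = p->orig;
}

/* 'put back the breakpoint', after the single-step completes. */
void sketch_end_inline_step(struct sketch_probepoint *p)
{
	*p->text = SKETCH_BREAKPOINT;
}

/* Would a thread arriving at this address right now hit the probe? */
bool sketch_thread_would_trap(const struct sketch_probepoint *p)
{
	return *p->text == SKETCH_BREAKPOINT;
}
```

Stepping on a copy of the instruction keeps the breakpoint in place the
whole time, so the window never opens.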

The design here is to have a small vma, placed high enough in memory
(a la the vDSO) that most apps won't reach it, though there is still no
ironclad guarantee.

Ideally, we will need to single-step on a copy of the instruction, in the
user address space of the traced process.

Ideas?

Ananth


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15  9:38         ` Ananth N Mavinakayanahalli
@ 2010-01-15  9:50           ` Peter Zijlstra
  2010-01-15 10:10             ` Ananth N Mavinakayanahalli
                               ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-15  9:50 UTC (permalink / raw)
  To: ananth
  Cc: Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Fri, 2010-01-15 at 15:08 +0530, Ananth N Mavinakayanahalli wrote:
> On Fri, Jan 15, 2010 at 10:03:48AM +0100, Peter Zijlstra wrote:
> > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
> > > 
> > >  discussed elsewhere. 
> > 
> > Thanks for the pointer...
> 
> :-)
> 
> Peter,
> I think Jim was referring to
> http://sources.redhat.com/ml/systemtap/2007-q1/msg00571.html

That's a 2007 email from some obscure list... that's hardly something
that can be referenced without a link.

As previously stated, I think poking at a process's address space is an
utter no-go.



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-15  9:03       ` Peter Zijlstra
@ 2010-01-15  9:38         ` Ananth N Mavinakayanahalli
  2010-01-15  9:50           ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Ananth N Mavinakayanahalli @ 2010-01-15  9:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jim Keniston, Srikar Dronamraju, Ingo Molnar,
	Arnaldo Carvalho de Melo, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Fri, Jan 15, 2010 at 10:03:48AM +0100, Peter Zijlstra wrote:
> On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
> > 
> >  discussed elsewhere. 
> 
> Thanks for the pointer...

:-)

Peter,
I think Jim was referring to
http://sources.redhat.com/ml/systemtap/2007-q1/msg00571.html

Ananth


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-14 19:46     ` Jim Keniston
  2010-01-15  9:02       ` Peter Zijlstra
@ 2010-01-15  9:03       ` Peter Zijlstra
  2010-01-15  9:38         ` Ananth N Mavinakayanahalli
  1 sibling, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-15  9:03 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Srikar Dronamraju, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ananth N Mavinakayanahalli, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
> 
>  discussed elsewhere. 

Thanks for the pointer...



* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-14 19:46     ` Jim Keniston
@ 2010-01-15  9:02       ` Peter Zijlstra
  2010-01-15 21:07         ` Jim Keniston
  2010-01-15  9:03       ` Peter Zijlstra
  1 sibling, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-15  9:02 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Srikar Dronamraju, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ananth N Mavinakayanahalli, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML

On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
> 
> +Instruction copies to be single-stepped are stored in a per-process
> +"single-step out of line (XOL) area," which is a little VM area
> +created by Uprobes in each probed process's address space.

I think tinkering with the probed process's address space is a no-no.
Have you run this by the linux mm folks? I'd be inclined to NAK this
straight out.


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-14 11:08   ` Peter Zijlstra
@ 2010-01-14 19:46     ` Jim Keniston
  2010-01-15  9:02       ` Peter Zijlstra
  2010-01-15  9:03       ` Peter Zijlstra
  0 siblings, 2 replies; 84+ messages in thread
From: Jim Keniston @ 2010-01-14 19:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ananth N Mavinakayanahalli, utrace-devel, Frederic Weisbecker,
	Masami Hiramatsu, Maneesh Soni, Mark Wielaard, LKML


On Thu, 2010-01-14 at 12:08 +0100, Peter Zijlstra wrote:
> On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote:
> > User Space Breakpoint Assistance Layer (UBP)
> > 
> > User space breakpointing Infrastructure provides kernel subsystems
> > with architecture independent interface to establish breakpoints in
> > user applications. This patch provides core implementation of ubp and
> > also wrappers for architecture dependent methods.
> 
> So if this is the basic infrastructure to set userspace breakpoints,
> then why not call this uprobe?

Ubp is for setting and removing breakpoints, and for supporting the two
schemes (inline, out of line) for executing the probed instruction after
you hit the breakpoint.
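
As a rough illustration of the two schemes, here is a user-space toy (not
the kernel code in this patch) that models what the pre/post single-step
hooks do to the probed text in each case.  The toy_* names, the byte array
standing in for the process's text, the int3 opcode, and the 1-byte
instruction are all assumptions made for the example; in the patch the
client populates the XOL slot and arch->post_xol() does the real fixups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HNT_INLINE 0x1   /* mirrors UBP_HNT_INLINE in the patch */
#define BKPT_INSN  0xCC  /* x86 int3, used here only as an example */

struct toy_bkpt {
	unsigned long vaddr;      /* probepoint offset into toy_text[] */
	unsigned long xol_vaddr;  /* offset of the XOL slot, if any */
	uint8_t opcode;           /* original byte at vaddr */
	uint16_t strategy;
};

static uint8_t toy_text[64];   /* stands in for the probed process's text */
static unsigned long toy_ip;   /* stands in for the saved instruction pointer */

/* Save the original opcode and arm the probepoint. */
static void toy_insert_bkpt(struct toy_bkpt *b)
{
	b->opcode = toy_text[b->vaddr];
	toy_text[b->vaddr] = BKPT_INSN;
}

static void toy_pre_sstep(struct toy_bkpt *b)
{
	if (!(b->strategy & HNT_INLINE)) {
		/* XOL: step a copy in the slot; the breakpoint stays armed. */
		assert(b->xol_vaddr != 0);
		toy_text[b->xol_vaddr] = b->opcode;
		toy_ip = b->xol_vaddr;
		return;
	}
	/* Inline: put the original opcode back and step it in place. */
	toy_text[b->vaddr] = b->opcode;
	toy_ip = b->vaddr;
}

static void toy_post_sstep(struct toy_bkpt *b)
{
	if (b->strategy & HNT_INLINE)
		toy_text[b->vaddr] = BKPT_INSN;  /* re-arm the probepoint */
	toy_ip = b->vaddr + 1;  /* toy fixup: resume after a 1-byte insn */
}

/* True if another thread fetching at b->vaddr right now would trap. */
static bool toy_probepoint_armed(const struct toy_bkpt *b)
{
	return toy_text[b->vaddr] == BKPT_INSN;
}
```

Between toy_pre_sstep() and toy_post_sstep(), toy_probepoint_armed() is
false only for the inline case, which is the window the multithreading
concern is about.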

Uprobes provides a higher-level API and deals with synchronization
issues, process-vs-thread issues, execution of the client's (potentially
buggy) probe handler, multiple probe clients, multiple probes at the
same location, thread- and process-lifetime events, etc.

> 
> > UBP currently supports both single stepping inline and execution out
> > of line strategies. Two different probepoints in the same process can
> > have two different strategies.
> 
> maybe explain wth these are?
> 

Here's a partial explanation from patch #6, section 1.1:

+When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
+user-mode registers are saved, and a SIGTRAP signal is generated.
+Uprobes intercepts the SIGTRAP and finds the associated uprobe.
+It then executes the handler associated with the uprobe, passing the
+handler the addresses of the uprobe struct and the saved registers.
+...
+
+Next, Uprobes single-steps its copy of the probed instruction and
+resumes execution of the probed process at the instruction following
+the probepoint.  (It would be simpler to single-step the actual
+instruction in place, but then Uprobes would have to temporarily
+remove the breakpoint instruction.  This would create problems in a
+multithreaded application.  For example, it would open a time window
+when another thread could sail right past the probepoint.)
+
+Instruction copies to be single-stepped are stored in a per-process
+"single-step out of line (XOL) area," which is a little VM area
+created by Uprobes in each probed process's address space.

This (single-stepping out of line = SSOL) is essentially what kprobes
does on most architectures.  XOL (execution out of line) is actually a
broader category that could include other schemes, discussed elsewhere.
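
To make the XOL area a bit more concrete, here is a minimal user-space
sketch of a slot allocator with a fallback to inline stepping when the
area is full (the "out of XOL slots" case mentioned in the ubp.h comments).
This is not the allocator from the XOL patch in this series: the toy_*
names, the slot size, the slot count, and the bitmap layout are all
hypothetical, and the sketch assumes the area is never mapped at address 0
so that 0 can mean "no slot".

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SLOT_BYTES 16   /* assumed; the patch uses UBP_XOL_SLOT_BYTES */
#define TOY_NSLOTS     8    /* assumed; a real area would be sized differently */
#define TOY_HNT_INLINE 0x1  /* mirrors UBP_HNT_INLINE */

struct toy_xol_area {
	unsigned long base;  /* user address where the area is mapped (non-zero) */
	uint32_t used;       /* bit i set => slot i is taken */
};

/* Return the address of a free slot, or 0 if the area is full. */
static unsigned long toy_xol_get_slot(struct toy_xol_area *area)
{
	for (int i = 0; i < TOY_NSLOTS; i++) {
		if (!(area->used & (1u << i))) {
			area->used |= 1u << i;
			return area->base + (unsigned long)i * TOY_SLOT_BYTES;
		}
	}
	return 0;
}

/* Give a slot back to the area. */
static void toy_xol_put_slot(struct toy_xol_area *area, unsigned long addr)
{
	unsigned long i = (addr - area->base) / TOY_SLOT_BYTES;

	assert(addr >= area->base && i < TOY_NSLOTS);
	area->used &= ~(1u << i);
}

/* Pick a strategy for a new probepoint: XOL if a slot is free, else inline. */
static uint16_t toy_pick_strategy(struct toy_xol_area *area,
				  unsigned long *xol_vaddr)
{
	*xol_vaddr = toy_xol_get_slot(area);
	return *xol_vaddr ? 0 : TOY_HNT_INLINE;
}
```

Because the choice is made per probepoint, two probepoints in the same
process can end up with different strategies, as the patch description
says.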

Jim


* Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-11 12:25 ` [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) Srikar Dronamraju
@ 2010-01-14 11:08   ` Peter Zijlstra
  2010-01-14 19:46     ` Jim Keniston
  0 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2010-01-14 11:08 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo,
	Ananth N Mavinakayanahalli, utrace-devel, Jim Keniston,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote:
> User Space Breakpoint Assistance Layer (UBP)
> 
> User space breakpointing Infrastructure provides kernel subsystems
> with architecture independent interface to establish breakpoints in
> user applications. This patch provides core implementation of ubp and
> also wrappers for architecture dependent methods.

So if this is the basic infrastructure to set userspace breakpoints,
then why not call this uprobe?

> UBP currently supports both single stepping inline and execution out
> of line strategies. Two different probepoints in the same process can
> have two different strategies.

maybe explain wth these are?


* [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
  2010-01-11 12:25 [RFC] [PATCH 0/7] UBP, XOL and Uprobes Srikar Dronamraju
@ 2010-01-11 12:25 ` Srikar Dronamraju
  2010-01-14 11:08   ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Srikar Dronamraju @ 2010-01-11 12:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Srikar Dronamraju, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Ananth N Mavinakayanahalli, utrace-devel, Jim Keniston,
	Frederic Weisbecker, Masami Hiramatsu, Maneesh Soni,
	Mark Wielaard, LKML

User Space Breakpoint Assistance Layer (UBP)

User space breakpointing Infrastructure provides kernel subsystems
with architecture independent interface to establish breakpoints in
user applications. This patch provides core implementation of ubp and
also wrappers for architecture dependent methods.

UBP currently supports both single stepping inline and execution out
of line strategies. Two different probepoints in the same process can
have two different strategies.

You need to follow this up with the UBP patch for your architecture.

Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/Kconfig        |   12 +
 include/linux/ubp.h |  282 ++++++++++++++++++++++++++++++
 kernel/Makefile     |    1 
 kernel/ubp_core.c   |  479 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 774 insertions(+)

Index: new_uprobes.git/arch/Kconfig
===================================================================
--- new_uprobes.git.orig/arch/Kconfig
+++ new_uprobes.git/arch/Kconfig
@@ -57,6 +57,15 @@ config KPROBES
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
 
+config UBP
+	bool "User-space breakpoint assistance (EXPERIMENTAL)"
+	depends on MODULES
+	depends on HAVE_UBP
+	help
+	  Ubp enables kernel subsystems to establish breakpoints
+	  in user applications. This service is used by components
+	  such as uprobes. If in doubt, say "N".
+
 config HAVE_EFFICIENT_UNALIGNED_ACCESS
 	bool
 	help
@@ -90,6 +99,9 @@ config USER_RETURN_NOTIFIER
 	  Provide a kernel-internal notification when a cpu is about to
 	  switch to user mode.
 
+config HAVE_UBP
+	def_bool n
+
 config HAVE_IOREMAP_PROT
 	bool
 
Index: new_uprobes.git/include/linux/ubp.h
===================================================================
--- /dev/null
+++ new_uprobes.git/include/linux/ubp.h
@@ -0,0 +1,282 @@
+#ifndef _LINUX_UBP_H
+#define _LINUX_UBP_H
+/*
+ * User-space BreakPoint support (ubp)
+ * include/linux/ubp.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2008, 2009
+ */
+
+#include <asm/ubp.h>
+struct task_struct;
+struct pt_regs;
+
+/**
+ * Strategy hints:
+ *
+ * %UBP_HNT_INLINE: Specifies that the instruction must
+ * be single-stepped inline.  Can be set by the caller of
+ * @arch->analyze_insn() -- e.g., if caller is out of XOL slots --
+ * or by @arch->analyze_insn() if there's no viable XOL strategy
+ * for that instruction.  Set in arch->strategies if the architecture
+ * doesn't implement XOL.
+ *
+ * %UBP_HNT_PERMSL: Specifies that the instruction slot whose
+ * address is @ubp->xol_vaddr is assigned to @ubp for the life of
+ * the process.  Can be used by @arch->analyze_insn() to simplify
+ * XOL in some cases.  Ignored in @arch->strategies.
+ *
+ * %UBP_HNT_TSKINFO: Set in @arch->strategies if the architecture's
+ * XOL handling requires the preservation of special
+ * task-specific info between the calls to @arch->pre_xol()
+ * and @arch->post_xol().  (E.g., XOL of x86_64 rip-relative
+ * instructions uses a scratch register, whose value is saved
+ * by pre_xol() and restored by post_xol().)  The caller
+ * of @arch->analyze_insn() should set %UBP_HNT_TSKINFO in
+ * @ubp->strategy if it's set in @arch->strategies and the caller
+ * can maintain a @ubp_task_arch_info object for each probed task.
+ * @arch->analyze_insn() should leave this flag set in @ubp->strategy
+ * if it needs to use the per-task @ubp_task_arch_info object.
+ */
+#define UBP_HNT_INLINE	0x1  /* Single-step this insn inline. */
+#define UBP_HNT_TSKINFO 0x2  /* XOL requires ubp_task_arch_info */
+#define UBP_HNT_PERMSL	0x4  /* XOL slot assignment is permanent */
+
+#define UBP_HNT_MASK	0x7
+
+/**
+ * struct ubp_bkpt - user-space breakpoint/probepoint
+ *
+ * @vaddr:	virtual address of probepoint
+ * @xol_vaddr:	virtual address of XOL slot assigned to this probepoint
+ * @opcode:	copy of opcode at @vaddr
+ * @insn:	typically a copy of the instruction at @vaddr.  More
+ *	precisely, this is the instruction (stream) that will be
+ *	executed in place of the original instruction.
+ * @strategy:	hints about how this instruction will be executed
+ * @fixups:	set of fixups to be executed by @arch->post_xol()
+ * @arch_info:	architecture-specific info about this probepoint
+ */
+struct ubp_bkpt {
+	unsigned long vaddr;
+	unsigned long xol_vaddr;
+	ubp_opcode_t opcode;
+	u8 insn[UBP_XOL_SLOT_BYTES];
+	u16 strategy;
+	u16 fixups;
+	struct ubp_bkpt_arch_info arch_info;
+};
+
+/* Post-execution fixups.  Some architectures may define others. */
+#define UBP_FIX_NONE	0x0  /* No fixup needed */
+#define UBP_FIX_IP	0x1  /* Adjust IP back to vicinity of actual insn */
+#define UBP_FIX_CALL	0x2  /* Adjust the return address of a call insn */
+
+#ifndef UBP_FIX_DEFAULT
+#define UBP_FIX_DEFAULT UBP_FIX_IP
+#endif
+
+#if defined(CONFIG_UBP)
+extern int ubp_init(u16 *strategies);
+extern int ubp_insert_bkpt(struct task_struct *tsk, struct ubp_bkpt *ubp);
+extern unsigned long ubp_get_bkpt_addr(struct pt_regs *regs);
+extern int ubp_pre_sstep(struct task_struct *tsk, struct ubp_bkpt *ubp,
+		struct ubp_task_arch_info *tskinfo, struct pt_regs *regs);
+extern int ubp_post_sstep(struct task_struct *tsk, struct ubp_bkpt *ubp,
+		struct ubp_task_arch_info *tskinfo, struct pt_regs *regs);
+extern int ubp_cancel_xol(struct task_struct *tsk, struct ubp_bkpt *ubp);
+extern int ubp_remove_bkpt(struct task_struct *tsk, struct ubp_bkpt *ubp);
+extern int ubp_validate_insn_addr(struct task_struct *tsk,
+						unsigned long vaddr);
+extern void ubp_set_ip(struct pt_regs *regs, unsigned long vaddr);
+#else	/* CONFIG_UBP */
+static inline int ubp_init(u16 *strategies)
+{
+	return -ENOSYS;
+}
+static inline int ubp_insert_bkpt(struct task_struct *tsk,
+						struct ubp_bkpt *ubp)
+{
+	return -ENOSYS;
+}
+static inline unsigned long ubp_get_bkpt_addr(struct pt_regs *regs)
+{
+	return -ENOSYS;
+}
+static inline int ubp_pre_sstep(struct task_struct *tsk,
+	struct ubp_bkpt *ubp, struct ubp_task_arch_info *tskinfo,
+	struct pt_regs *regs)
+{
+	return -ENOSYS;
+}
+static inline int ubp_post_sstep(struct task_struct *tsk,
+	struct ubp_bkpt *ubp, struct ubp_task_arch_info *tskinfo,
+	struct pt_regs *regs)
+{
+	return -ENOSYS;
+}
+static inline int ubp_cancel_xol(struct task_struct *tsk,
+	struct ubp_bkpt *ubp)
+{
+	return -ENOSYS;
+}
+static inline int ubp_remove_bkpt(struct task_struct *tsk,
+	struct ubp_bkpt *ubp)
+{
+	return -ENOSYS;
+}
+static inline int ubp_validate_insn_addr(struct task_struct *tsk,
+	unsigned long vaddr)
+{
+	return -ENOSYS;
+}
+static inline void ubp_set_ip(struct pt_regs *regs, unsigned long vaddr)
+{
+}
+#endif	/* CONFIG_UBP */
+
+#ifdef UBP_IMPLEMENTATION
+/**
+ * struct ubp_arch_info - architecture-specific parameters and functions
+ *
+ * Most architectures can use the default versions of @read_opcode(),
+ * @set_bkpt(), @set_orig_insn(), and @is_bkpt_insn(); ia64 is an
+ * exception.  All functions (including @validate_address()) can assume
+ * that the caller has verified that the probepoint's virtual address
+ * resides in an executable VM area.
+ *
+ * @bkpt_insn:
+ *	The architecture's breakpoint instruction.  This is used by
+ *	the default versions of @set_bkpt(), @set_orig_insn(), and
+ *	@is_bkpt_insn().
+ * @ip_advancement_by_bkpt_insn:
+ * 	The number of bytes the instruction pointer is advanced by
+ * 	this architecture's breakpoint instruction.  For example, after
+ * 	the powerpc trap instruction executes, the ip still points to the
+ * 	breakpoint instruction (ip_advancement_by_bkpt_insn = 0); but the
+ * 	x86 int3 instruction (1 byte) advances the ip past the int3
+ * 	(ip_advancement_by_bkpt_insn = 1).
+ * @max_insn_bytes:
+ *	The maximum length, in bytes, of an instruction in this
+ *	architecture.  This must be <= UBP_XOL_SLOT_BYTES.
+ * @strategies:
+ *	Bit-map of %UBP_HNT_* values recognized by this architecture.
+ *	Include %UBP_HNT_INLINE iff this architecture doesn't support
+ *	execution out of line.  Include %UBP_HNT_TSKINFO if
+ *	XOL of at least some instructions requires communication of
+ *	per-task state between @pre_xol() and @post_xol().
+ * @set_ip:
+ *	Set the instruction pointer in @regs to @vaddr.
+ * @validate_address:
+ *	Return 0 if @vaddr is a valid instruction address, or a negative
+ *	errno (typically -%EINVAL) otherwise.  If you don't provide
+ *	@validate_address(), any address will be accepted.  Caller
+ *	guarantees that @vaddr is in an executable VM area.  This
+ *	function typically just enforces arch-specific instruction
+ *	alignment.
+ * @read_opcode:
+ *	For task @tsk, read the opcode at @vaddr and store it in
+ *	@opcode.  Return 0 (success) or a negative errno.  Defaults to
+ *	@ubp_read_opcode().
+ * @set_bkpt:
+ *	For task @tsk, store @bkpt_insn at @ubp->vaddr.  Return 0
+ *	(success) or a negative errno. Defaults to @ubp_set_bkpt().
+ * @set_orig_insn:
+ *	For task @tsk, restore the original opcode (@ubp->opcode) at
+ *	@ubp->vaddr.  If @check is true, first verify that there's
+ *	actually a breakpoint instruction there.  Return 0 (success) or
+ *	a negative errno.  Defaults to @ubp_set_orig_insn().
+ * @is_bkpt_insn:
+ *	Return %true if @ubp->opcode is @bkpt_insn.  Defaults to
+ *	@ubp_is_bkpt_insn(), which just tests (ubp->opcode ==
+ *	arch->bkpt_insn).
+ * @analyze_insn:
+ *	Analyze @ubp->insn.  Return 0 if @ubp->insn is an instruction
+ *	you can probe, or a negative errno (typically -%EPERM)
+ *	otherwise.  The caller sets @ubp->strategy to %UBP_HNT_INLINE
+ *	to suppress XOL for this instruction (e.g., because we're
+ *	out of XOL slots).  If the instruction can be probed but
+ *	can't be executed out of line, set @ubp->strategy to
+ *	%UBP_HNT_INLINE.  Otherwise, determine what sort of XOL-related
+ *	fixups @post_xol() (and possibly @pre_xol()) will need
+ *	to do for this instruction, and annotate @ubp accordingly.
+ *	You may modify @ubp->insn (e.g., the x86_64 port does this
+ *	for rip-relative instructions), but if you do so, you should
+ *	retain a copy in @ubp->arch_info in case you have to revert
+ *	to single-stepping inline (see @cancel_xol()).
+ * @pre_xol:
+ *	Called just before executing the instruction associated
+ *	with @ubp out of line.  @ubp->xol_vaddr is the address in
+ *	@tsk's virtual address space where @ubp->insn has been copied.
+ *	@pre_xol() should at least set the instruction pointer in
+ *	@regs to @ubp->xol_vaddr -- which is what the default,
+ *	@ubp_pre_xol(), does.  If @ubp->strategy includes the
+ *	%UBP_HNT_TSKINFO flag, then @tskinfo points to a per-task
+ *	copy of struct ubp_task_arch_info.
+ * @post_xol:
+ *	Called after executing the instruction associated with
+ *	@ubp out of line.  @post_xol() should perform the fixups
+ *	specified in @ubp->fixups, which includes ensuring that the
+ *	instruction pointer in @regs points at the next instruction in
+ *	the probed instruction stream.  @tskinfo is as for @pre_xol().
+ *	You must provide this function.
+ * @cancel_xol:
+ *	The instruction associated with @ubp cannot be executed
+ *	out of line after all.  (This can happen when XOL slots
+ *	are lazily assigned, and we run out of slots before we
+ *	hit this breakpoint.  This function should never be called
+ *	if @analyze_insn() was previously called for @ubp with a
+ *	non-zero value of @ubp->xol_vaddr and with %UBP_HNT_PERMSL
+ *	set in @ubp->strategy.)  Adjust @ubp as needed so it can be
+ *	single-stepped inline.  Omit this function if you don't need it.
+ */
+
+struct ubp_arch_info {
+	ubp_opcode_t bkpt_insn;
+	u8 ip_advancement_by_bkpt_insn;
+	u8 max_insn_bytes;
+	u16 strategies;
+	void (*set_ip)(struct pt_regs *regs, unsigned long vaddr);
+	int (*validate_address)(struct task_struct *tsk, unsigned long vaddr);
+	int (*read_opcode)(struct task_struct *tsk, unsigned long vaddr,
+						ubp_opcode_t *opcode);
+	int (*set_bkpt)(struct task_struct *tsk, struct ubp_bkpt *ubp);
+	int (*set_orig_insn)(struct task_struct *tsk,
+				struct ubp_bkpt *ubp, bool check);
+	bool (*is_bkpt_insn)(struct ubp_bkpt *ubp);
+	int (*analyze_insn)(struct task_struct *tsk, struct ubp_bkpt *ubp);
+	int (*pre_xol)(struct task_struct *tsk, struct ubp_bkpt *ubp,
+				struct ubp_task_arch_info *tskinfo,
+				struct pt_regs *regs);
+	int (*post_xol)(struct task_struct *tsk, struct ubp_bkpt *ubp,
+				struct ubp_task_arch_info *tskinfo,
+				struct pt_regs *regs);
+	void (*cancel_xol)(struct task_struct *tsk, struct ubp_bkpt *ubp);
+};
+
+/* Unexported functions & macros for use by arch-specific code */
+#define ubp_opcode_sz ((unsigned int)(sizeof(ubp_opcode_t)))
+extern int ubp_read_vm(struct task_struct *tsk, unsigned long vaddr,
+						void *kbuf, int nbytes);
+extern int ubp_write_data(struct task_struct *tsk, unsigned long vaddr,
+					const void *kbuf, int nbytes);
+
+extern struct ubp_arch_info ubp_arch_info;
+
+#endif	/* UBP_IMPLEMENTATION */
+
+#endif	/* _LINUX_UBP_H */
Index: new_uprobes.git/kernel/Makefile
===================================================================
--- new_uprobes.git.orig/kernel/Makefile
+++ new_uprobes.git/kernel/Makefile
@@ -102,6 +102,7 @@ obj-$(CONFIG_SLOW_WORK_DEBUG) += slow-wo
 obj-$(CONFIG_PERF_EVENTS) += perf_event.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
 obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
+obj-$(CONFIG_UBP) += ubp_core.o
 
 ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: new_uprobes.git/kernel/ubp_core.c
===================================================================
--- /dev/null
+++ new_uprobes.git/kernel/ubp_core.c
@@ -0,0 +1,479 @@
+/*
+ * User-space BreakPoint support (ubp)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2008, 2009
+ */
+
+#define UBP_IMPLEMENTATION 1
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/ptrace.h>
+#include <linux/mm.h>
+#include <linux/ubp.h>
+#include <linux/uaccess.h>
+
+/*
+ * TODO: Resolve verbosity.  ubp_insert_bkpt() is the only function
+ * that reports failures via printk.
+ */
+
+static struct ubp_arch_info *arch = &ubp_arch_info;
+
+static bool ubp_uses_xol(u16 strategy)
+{
+	return !(strategy & UBP_HNT_INLINE);
+}
+
+static bool validate_strategy(u16 strategy, u16 valid_bits)
+{
+	return ((strategy & (~valid_bits)) == 0);
+}
+
+/**
+ * ubp_init - initialize the ubp data structures
+ * @strategies indicates which breakpoint-related strategies are
+ * supported by the client:
+ *   %UBP_HNT_INLINE: Client supports only single-stepping inline.
+ *	Otherwise client must provide an instruction slot
+ *	(UBP_XOL_SLOT_BYTES bytes) in the probed process's address
+ *	space for each instruction to be executed out of line.
+ *   %UBP_HNT_TSKINFO: Client can provide and maintain one
+ *	@ubp_task_arch_info object for each probed task.  (Failure to
+ *	support this will prevent XOL of rip-relative instructions on
+ *	x86_64, at least.)
+ * Upon return, @strategies is updated to reflect those strategies
+ * required by this particular architecture's implementation of ubp:
+ *   %UBP_HNT_INLINE: Architecture or client supports only
+ *	single-stepping inline.
+ *   %UBP_HNT_TSKINFO: Architecture uses @ubp_task_arch_info, and will
+ *	expect it to be passed to @ubp_pre_sstep() and @ubp_post_sstep()
+ *	as needed (see @ubp_insert_bkpt()).
+ * Possible errors:
+ * -%ENOSYS: ubp not supported for this architecture.
+ * -%EINVAL: unrecognized flags in @strategies
+ */
+int ubp_init(u16 *strategies)
+{
+	u16 inline_bit, tskinfo_bit;
+	u16 client_strategies = *strategies;
+
+	if (!validate_strategy(client_strategies,
+				UBP_HNT_INLINE | UBP_HNT_TSKINFO))
+		return -EINVAL;
+
+	inline_bit = (client_strategies | arch->strategies) & UBP_HNT_INLINE;
+	tskinfo_bit = (client_strategies & arch->strategies) & UBP_HNT_TSKINFO;
+	*strategies = (inline_bit | tskinfo_bit);
+	return 0;
+}
+
+/*
+ * Read @nbytes at @vaddr from @tsk into @kbuf.  Return number of bytes read.
+ * Not exported, but available for use by arch-specific ubp code.
+ */
+int ubp_read_vm(struct task_struct *tsk, unsigned long vaddr,
+						void *kbuf, int nbytes)
+{
+	if (tsk == current) {
+		int nleft = copy_from_user(kbuf, (void __user *) vaddr,
+				nbytes);
+		return nbytes - nleft;
+	} else
+		return access_process_vm(tsk, vaddr, kbuf, nbytes, 0);
+}
+
+/*
+ * Write @nbytes from @kbuf at @vaddr in @tsk.  Return number of bytes written.
+ * Can be used to write to stack or data VM areas, but not instructions.
+ * Not exported, but available for use by arch-specific ubp code.
+ */
+int ubp_write_data(struct task_struct *tsk, unsigned long vaddr,
+						const void *kbuf, int nbytes)
+{
+	int nleft;
+
+	if (tsk == current) {
+		nleft = copy_to_user((void __user *) vaddr, kbuf, nbytes);
+		return nbytes - nleft;
+	} else
+		return access_process_vm(tsk, vaddr, (void *) kbuf,
+				nbytes, 1);
+}
+
+static int ubp_write_opcode(struct task_struct *tsk, unsigned long vaddr,
+							ubp_opcode_t opcode)
+{
+	int result;
+
+	result = access_process_vm(tsk, vaddr, &opcode, ubp_opcode_sz, 1);
+	return (result == ubp_opcode_sz ? 0 : -EFAULT);
+}
+
+/* Default implementation of arch->read_opcode */
+static int ubp_read_opcode(struct task_struct *tsk, unsigned long vaddr,
+							ubp_opcode_t *opcode)
+{
+	int bytes_read;
+
+	bytes_read = ubp_read_vm(tsk, vaddr, opcode, ubp_opcode_sz);
+	return (bytes_read == ubp_opcode_sz ? 0 : -EFAULT);
+}
+
+/* Default implementation of arch->set_bkpt */
+static int ubp_set_bkpt(struct task_struct *tsk, struct ubp_bkpt *ubp)
+{
+	return ubp_write_opcode(tsk, ubp->vaddr, arch->bkpt_insn);
+}
+
+/* Default implementation of arch->set_orig_insn */
+static int ubp_set_orig_insn(struct task_struct *tsk, struct ubp_bkpt *ubp,
+								bool check)
+{
+	if (check) {
+		ubp_opcode_t opcode;
+		int result = arch->read_opcode(tsk, ubp->vaddr, &opcode);
+		if (result)
+			return result;
+		if (opcode != arch->bkpt_insn)
+			return -EINVAL;
+	}
+	return ubp_write_opcode(tsk, ubp->vaddr, ubp->opcode);
+}
+
+/* Return 0 if vaddr is in an executable VM area, or -EINVAL otherwise. */
+static inline int ubp_check_vma(struct task_struct *tsk, unsigned long vaddr)
+{
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	int ret = -EINVAL;
+
+	mm = get_task_mm(tsk);
+	if (!mm)
+		return -EINVAL;
+	down_read(&mm->mmap_sem);
+	vma = find_vma(mm, vaddr);
+	if (vma && vaddr >= vma->vm_start && (vma->vm_flags & VM_EXEC))
+		ret = 0;
+	up_read(&mm->mmap_sem);
+	mmput(mm);
+	return ret;
+}
+
+/**
+ * ubp_validate_insn_addr - Validate that the instruction address lies
+ * in an executable vma.
+ * Returns 0 if the vaddr is a valid instruction address.
+ * @tsk: the probed task
+ * @vaddr: virtual address of the instruction to be verified.
+ *
+ * Possible errors:
+ * -%EINVAL: @vaddr is not a valid instruction address.
+ */
+int ubp_validate_insn_addr(struct task_struct *tsk, unsigned long vaddr)
+{
+	int result;
+
+	result = ubp_check_vma(tsk, vaddr);
+	if (result != 0)
+		return result;
+	if (arch->validate_address)
+		result = arch->validate_address(tsk, vaddr);
+	return result;
+}
+
+static void ubp_bkpt_insertion_failed(struct task_struct *tsk,
+				struct ubp_bkpt *ubp, const char *why)
+{
+	printk(KERN_ERR "Can't place breakpoint at pid %d vaddr %#lx: %s\n",
+						tsk->pid, ubp->vaddr, why);
+}
+
+/**
+ * ubp_insert_bkpt - insert breakpoint
+ * Insert a breakpoint into the process that includes @tsk, at the
+ * virtual address @ubp->vaddr.
+ *
+ * @ubp->strategy affects how this breakpoint will be handled:
+ *   %UBP_HNT_INLINE: Probed instruction will be single-stepped inline.
+ *   %UBP_HNT_TSKINFO: As above.
+ *   %UBP_HNT_PERMSL: An XOL instruction slot in the probed process's
+ *	address space has been allocated to this probepoint, and will
+ *	remain so allocated as long as it's needed.  @ubp->xol_vaddr is
+ *	its address.  (This slot can be reallocated if
+ *	@ubp_insert_bkpt() fails.)  The client is NOT required to
+ *	allocate an instruction slot before calling @ubp_insert_bkpt().
+ * @ubp_insert_bkpt() updates @ubp->strategy as needed:
+ *   %UBP_HNT_INLINE: Architecture or client cannot do XOL for this
+ *	probepoint.
+ *   %UBP_HNT_TSKINFO: @ubp_task_arch_info will be used for this
+ *	probepoint.
+ *
+ * All threads of the probed process must be stopped while
+ * @ubp_insert_bkpt() runs.
+ *
+ * Possible errors:
+ * -%ENOSYS: ubp not supported for this architecture
+ * -%EINVAL: unrecognized/invalid strategy flags
+ * -%EINVAL: invalid instruction address
+ * -%EEXIST: breakpoint instruction already exists at that address
+ * -%EPERM: cannot probe this instruction
+ * -%EFAULT: failed to insert breakpoint instruction
+ * [TBD: Validate xol_vaddr?]
+ */
+int ubp_insert_bkpt(struct task_struct *tsk, struct ubp_bkpt *ubp)
+{
+	int result, len;
+
+	BUG_ON(!tsk || !ubp);
+	if (!validate_strategy(ubp->strategy, UBP_HNT_MASK))
+		return -EINVAL;
+
+	result = ubp_validate_insn_addr(tsk, ubp->vaddr);
+	if (result != 0)
+		return result;
+
+	/*
+	 * If ubp_read_vm() transfers fewer bytes than the maximum
+	 * instruction size, assume that the probed instruction is smaller
+	 * than the max and near the end of the last page of instructions.
+	 * But there must be room at least for a breakpoint-size instruction.
+	 */
+	len = ubp_read_vm(tsk, ubp->vaddr, ubp->insn, arch->max_insn_bytes);
+	if (len < ubp_opcode_sz) {
+		ubp_bkpt_insertion_failed(tsk, ubp,
+					"error reading original instruction");
+		return -EFAULT;
+	}
+	memcpy(&ubp->opcode, ubp->insn, ubp_opcode_sz);
+	if (arch->is_bkpt_insn(ubp)) {
+		ubp_bkpt_insertion_failed(tsk, ubp,
+					"bkpt already exists at that addr");
+		return -EEXIST;
+	}
+
+	result = arch->analyze_insn(tsk, ubp);
+	if (result < 0) {
+		ubp_bkpt_insertion_failed(tsk, ubp,
+					"instruction type cannot be probed");
+		return result;
+	}
+
+	result = arch->set_bkpt(tsk, ubp);
+	if (result < 0) {
+		ubp_bkpt_insertion_failed(tsk, ubp,
+					"failed to insert bkpt instruction");
+		return result;
+	}
+	return 0;
+}
+
+/**
+ * ubp_pre_sstep - prepare to single-step the probed instruction
+ * @tsk: the probed task
+ * @ubp: the probepoint information, as returned by @ubp_insert_bkpt().
+ *	Unless the %UBP_HNT_INLINE flag is set in @ubp->strategy,
+ *	@ubp->xol_vaddr must be the address of an XOL instruction slot
+ *	that is allocated to this probepoint at least until after the
+ *	completion of @ubp_post_sstep(), and populated with the contents
+ *	of @ubp->insn.  [Need to be more precise here to account for
+ *	untimely exit or UBP_HNT_BOOSTED.]
+ * @tskinfo: points to a @ubp_task_arch_info object for @tsk, if
+ *	the %UBP_HNT_TSKINFO flag is set in @ubp->strategy.
+ * @regs: reflects the saved user state of @tsk.  @ubp_pre_sstep()
+ *	adjusts this.  In particular, the instruction pointer is set
+ *	to the instruction to be single-stepped.
+ * Possible errors:
+ * -%EFAULT: Failed to read or write @tsk's address space as needed.
+ *
+ * The client must ensure that the contents of @ubp are not
+ * changed during the single-step operation -- i.e., between when
+ * @ubp_pre_sstep() is called and when @ubp_post_sstep() returns.
+ * Additionally, if single-stepping inline is used for this probepoint,
+ * the client must serialize the single-step operation (so multiple
+ * threads don't step on each other while the opcode replacement is
+ * taking place).
+ */
+int ubp_pre_sstep(struct task_struct *tsk, struct ubp_bkpt *ubp,
+		struct ubp_task_arch_info *tskinfo, struct pt_regs *regs)
+{
+	int result;
+
+	BUG_ON(!tsk || !ubp || !regs);
+	if (ubp_uses_xol(ubp->strategy)) {
+		BUG_ON(!ubp->xol_vaddr);
+		return arch->pre_xol(tsk, ubp, tskinfo, regs);
+	}
+
+	/*
+	 * Single-step this instruction inline.  Replace the breakpoint
+	 * with the original opcode.
+	 */
+	result = arch->set_orig_insn(tsk, ubp, false);
+	if (result == 0)
+		arch->set_ip(regs, ubp->vaddr);
+	return result;
+}
+
+/**
+ * ubp_post_sstep - prepare to resume execution after single-step
+ * @tsk: the probed task
+ * @ubp: the probepoint information, as with @ubp_pre_sstep()
+ * @tskinfo: the @ubp_task_arch_info object, if any, passed to
+ *	@ubp_pre_sstep()
+ * @regs: reflects the saved state of @tsk after the single-step
+ *	operation.  @ubp_post_sstep() adjusts @tsk's state as needed,
+ *	including pointing the instruction pointer at the instruction
+ *	following the probed instruction.
+ * Possible errors:
+ * -%EFAULT: Failed to read or write @tsk's address space as needed.
+ */
+int ubp_post_sstep(struct task_struct *tsk, struct ubp_bkpt *ubp,
+		struct ubp_task_arch_info *tskinfo, struct pt_regs *regs)
+{
+	BUG_ON(!tsk || !ubp || !regs);
+	if (ubp_uses_xol(ubp->strategy))
+		return arch->post_xol(tsk, ubp, tskinfo, regs);
+
+	/*
+	 * Single-stepped this instruction inline.  Put the breakpoint
+	 * instruction back.
+	 */
+	return arch->set_bkpt(tsk, ubp);
+}
+
+/**
+ * ubp_cancel_xol - cancel XOL for this probepoint
+ * @tsk: a task in the probed process
+ * @ubp: the probepoint information
+ * Switch @ubp's single-stepping strategy from out-of-line to inline.
+ * If the client employs lazy XOL-slot allocation, it can call
+ * this function if it determines that it can't provide an XOL
+ * slot for @ubp.  ubp_cancel_xol() adjusts @ubp appropriately.
+ *
+ * ubp_cancel_xol()'s behavior is undefined if ubp_pre_sstep() has
+ * already been called for @ubp.
+ *
+ * Possible errors:
+ * None at present; this function currently always returns 0.
+ */
+int ubp_cancel_xol(struct task_struct *tsk, struct ubp_bkpt *ubp)
+{
+	if (arch->cancel_xol)
+		arch->cancel_xol(tsk, ubp);
+	ubp->strategy |= UBP_HNT_INLINE;
+	return 0;
+}
+
+/**
+ * ubp_get_bkpt_addr - compute address of bkpt given post-bkpt regs
+ * @regs: reflects the saved state of the task after it has hit a breakpoint
+ *	instruction
+ *
+ * Returns the address of the breakpoint instruction.
+ */
+unsigned long ubp_get_bkpt_addr(struct pt_regs *regs)
+{
+	return instruction_pointer(regs) - arch->ip_advancement_by_bkpt_insn;
+}
+
+/**
+ * ubp_remove_bkpt - remove breakpoint
+ * @tsk: a task in the probed process
+ * @ubp: the probepoint information
+ *
+ * For the process that includes @tsk, remove the breakpoint specified
+ * by @ubp, restoring the original opcode.
+ *
+ * Possible errors:
+ * -%EINVAL: @ubp->vaddr is not a valid instruction address.
+ * -%ENOENT: There is no breakpoint instruction at @ubp->vaddr.
+ * -%EFAULT: Failed to read/write @tsk's address space as needed.
+ */
+int ubp_remove_bkpt(struct task_struct *tsk, struct ubp_bkpt *ubp)
+{
+	if (ubp_validate_insn_addr(tsk, ubp->vaddr) != 0)
+		return -EINVAL;
+	return arch->set_orig_insn(tsk, ubp, true);
+}
+
+/* Point the instruction pointer in @regs at @vaddr, using the arch hook. */
+void ubp_set_ip(struct pt_regs *regs, unsigned long vaddr)
+{
+	arch->set_ip(regs, vaddr);
+}
+
+/* Default implementation of arch->is_bkpt_insn */
+static bool ubp_is_bkpt_insn(struct ubp_bkpt *ubp)
+{
+	return (ubp->opcode == arch->bkpt_insn);
+}
+
+/* Default implementation of arch->pre_xol */
+static int ubp_pre_xol(struct task_struct *tsk, struct ubp_bkpt *ubp,
+		struct ubp_task_arch_info *tskinfo, struct pt_regs *regs)
+{
+	arch->set_ip(regs, ubp->xol_vaddr);
+	return 0;
+}
+
+/* Validate arch-specific info during ubp initialization. */
+
+static int ubp_bad_arch_param(const char *param_name, int value)
+{
+	printk(KERN_ERR "ubp: bad value %d/%#x for parameter %s"
+		" in ubp_arch_info\n", value, value, param_name);
+	return -ENOSYS;
+}
+
+static int ubp_missing_arch_func(const char *func_name)
+{
+	printk(KERN_ERR "ubp: ubp_arch_info lacks required function: %s\n",
+								func_name);
+	return -ENOSYS;
+}
+
+static int __init init_ubp(void)
+{
+	int result = 0;
+
+	/* Accept any value of bkpt_insn. */
+	if (arch->max_insn_bytes < 1)
+		result = ubp_bad_arch_param("max_insn_bytes",
+						arch->max_insn_bytes);
+	if (arch->ip_advancement_by_bkpt_insn > arch->max_insn_bytes)
+		result = ubp_bad_arch_param("ip_advancement_by_bkpt_insn",
+					arch->ip_advancement_by_bkpt_insn);
+	/* Accept any value of strategies. */
+	if (!arch->set_ip)
+		result = ubp_missing_arch_func("set_ip");
+	/* Null validate_address() is OK. */
+	if (!arch->read_opcode)
+		arch->read_opcode = ubp_read_opcode;
+	if (!arch->set_bkpt)
+		arch->set_bkpt = ubp_set_bkpt;
+	if (!arch->set_orig_insn)
+		arch->set_orig_insn = ubp_set_orig_insn;
+	if (!arch->is_bkpt_insn)
+		arch->is_bkpt_insn = ubp_is_bkpt_insn;
+	if (!arch->analyze_insn)
+		result = ubp_missing_arch_func("analyze_insn");
+	if (!arch->pre_xol)
+		arch->pre_xol = ubp_pre_xol;
+	if (ubp_uses_xol(arch->strategies) && !arch->post_xol)
+		result = ubp_missing_arch_func("post_xol");
+	/* Null cancel_xol() is OK. */
+	return result;
+}
+
+module_init(init_ubp);
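
The validation pattern in init_ubp() above (reject bad parameters and missing required hooks, fill in defaults for optional hooks, and report the last error seen) can be sketched in user space as follows. This is only an illustration under stated assumptions: struct demo_arch_info and every demo_* name are hypothetical stand-ins, not part of the patch, and assert() plays the role of BUG_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-in for the patch's ubp_arch_info. */
struct demo_arch_info {
	int max_insn_bytes;
	int ip_advancement_by_bkpt_insn;
	/* Required hook: init fails if this is NULL. */
	void (*set_ip)(unsigned long *ip, unsigned long vaddr);
	/* Optional hook: init installs a default if this is NULL. */
	int (*pre_xol)(unsigned long *ip, unsigned long xol_vaddr);
};

static void demo_set_ip(unsigned long *ip, unsigned long vaddr)
{
	*ip = vaddr;
}

/* Default pre_xol: just point the instruction pointer at the XOL slot. */
static int demo_default_pre_xol(unsigned long *ip, unsigned long xol_vaddr)
{
	*ip = xol_vaddr;
	return 0;
}

static int demo_bad(const char *what)
{
	fprintf(stderr, "demo: bad or missing arch field: %s\n", what);
	return -ENOSYS;
}

/*
 * Check every field rather than stopping at the first problem, so that
 * all problems are logged; the return value reflects the last failure.
 */
static int demo_init(struct demo_arch_info *arch)
{
	int result = 0;

	assert(arch != NULL);
	if (arch->max_insn_bytes < 1)
		result = demo_bad("max_insn_bytes");
	if (arch->ip_advancement_by_bkpt_insn > arch->max_insn_bytes)
		result = demo_bad("ip_advancement_by_bkpt_insn");
	if (!arch->set_ip)
		result = demo_bad("set_ip");
	if (!arch->pre_xol)
		arch->pre_xol = demo_default_pre_xol;
	return result;
}
```

Because defaults are installed at init time, later callers can invoke arch->pre_xol() unconditionally, which is what ubp_pre_sstep() does in the patch.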

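For readers following the inline single-step path in ubp_pre_sstep(), ubp_post_sstep() and ubp_get_bkpt_addr(), here is a rough user-space model of the cycle. It is a sketch, not the patch's code: a byte array stands in for the probed process's text, the demo_* names are hypothetical, and the breakpoint byte 0xcc with an advancement of 1 is the x86 int3 case, chosen only as an example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BKPT_INSN		0xcc	/* x86 int3, example only */
#define DEMO_IP_ADVANCEMENT	1	/* int3 leaves ip one byte past itself */

/* Hypothetical stand-in for the parts of struct ubp_bkpt used here. */
struct demo_bkpt {
	size_t vaddr;		/* offset of the probed instruction in text[] */
	unsigned char opcode;	/* original byte saved when the bkpt was set */
};

/* Save the original byte and plant the breakpoint. */
static void demo_insert_bkpt(unsigned char *text, struct demo_bkpt *bp)
{
	assert(text && bp);
	bp->opcode = text[bp->vaddr];
	text[bp->vaddr] = DEMO_BKPT_INSN;
}

/* Like ubp_get_bkpt_addr(): back up over the breakpoint instruction. */
static size_t demo_get_bkpt_addr(size_t ip)
{
	return ip - DEMO_IP_ADVANCEMENT;
}

/* Inline pre-step: restore the original opcode and point ip at it. */
static void demo_pre_sstep_inline(unsigned char *text,
		const struct demo_bkpt *bp, size_t *ip)
{
	assert(text && bp && ip);
	text[bp->vaddr] = bp->opcode;
	*ip = bp->vaddr;
}

/* Inline post-step: put the breakpoint instruction back. */
static void demo_post_sstep_inline(unsigned char *text,
		const struct demo_bkpt *bp)
{
	assert(text && bp);
	text[bp->vaddr] = DEMO_BKPT_INSN;
}
```

Between the pre-step and post-step calls the breakpoint is absent from the text, so another thread running through that address would miss the probe. That window is why single-stepping inline is described earlier in this message as inadequate for multithreaded apps.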
