* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
       [not found] <20070207025008.1B11118005D@magilla.sf.frob.com>
@ 2007-02-07 19:22 ` Alan Stern
  2007-02-07 22:08   ` Bob Copeland
  2007-02-09 10:21   ` Roland McGrath
  0 siblings, 2 replies; 70+ messages in thread
From: Alan Stern @ 2007-02-07 19:22 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Tue, 6 Feb 2007, Roland McGrath wrote:

> > So for the sake of argument, let's assume that debug registers can be 
> > assigned with priority values ranging from 0 to 7 (overkill, but who 
> > cares?).  By fiat, ptrace assignments use priority 4.  Then kwatch callers 
> > can request whatever priority they like.  The well-behaved cases you've 
> > been discussing will use priority 0, and the invasive cases can use 
> > priority 7.  (With appropriate symbolic names instead of raw numeric 
> > values, naturally.)
> 
> Sure.  Or make it signed with lower value wins: have ptrace use -1, the
> average bear use 0, something especially unobtrusive use >0, and
> something very obtrusive use -many.

I wonder where this convention of using lower numbers to indicate higher 
priorities comes from...  It seems to be quite prevalent even though it's 
obviously backwards.
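
For concreteness, the signed scheme described above might look something
like this (names and values purely illustrative, nothing like them exists
in any patch):

	/* Illustration only: lower value wins, as suggested above. */
	#define DEBUGREG_PRIO_PTRACE	(-1)	/* ptrace itself */
	#define DEBUGREG_PRIO_NORMAL	0	/* the average bear */
	#define DEBUGREG_PRIO_POLITE	1	/* readily gets out of the way */
	/* something "very obtrusive" would use a large negative value */

	/* Does a new request displace the current owner of a slot? */
	static inline int debugreg_prio_wins(int new_prio, int old_prio)
	{
		return new_prio < old_prio;	/* lower value wins */
	}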

>  Unless you are really going to pack it
> into a few bits somewhere, I'd make it an arbitrary int rather than a
> special small range; it's just for sort order comparison.  Bottom line, I
> don't really care about the numerology.  Just so "break ptrace", "don't
> break ptrace", and "readily get out of the way on demand" can be expressed.
> We can always fine-tune it later as there are more concrete users.

Okay; I'm not fixated on any particular size.

> > Or maybe that's too complicated.  Perhaps all userspace assignments should 
> > always use the same priority level.  
> 
> No, I want priorities among user-mode watchpoint users too.  ptrace is
> rigid, but newer facilities can coexist with ptrace on the same thread and
> with kwatch, and do fancy new things to fall back when there is debugreg
> allocation pressure.  Future user facilities might be able to do VM tricks
> that are harder to make workable for kernel mode, for example.  

All right.  However this means thread_struct will have to grow in order to
hold each task's debug-register allocations and priorities.  Would that be
acceptable?  (This might be a good reason to keep the number of bits
down.)

Another question: How can a program using the ptrace API ever give up a
debug-register allocation?  Disabling the register isn't sufficient; a
user program should be able to store a watchpoint address in dr1, enable
it in dr7, disable it in dr7, and then re-enable it in the expectation
that the address stored in dr1 hasn't been overwritten.  As far as I can
see, ptrace-type allocations have to be permanent (until the task exits or
execs, or uses some other to-be-determined API to do the de-allocation.)

Alan Stern



* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-07 19:22 ` [PATCH] Kwatch: kernel watchpoints using CPU debug registers Alan Stern
@ 2007-02-07 22:08   ` Bob Copeland
  2007-02-09 10:21   ` Roland McGrath
  1 sibling, 0 replies; 70+ messages in thread
From: Bob Copeland @ 2007-02-07 22:08 UTC (permalink / raw)
  To: Alan Stern
  Cc: Roland McGrath, Prasanna S Panchamukhi, Kernel development list

On 2/7/07, Alan Stern <stern@rowland.harvard.edu> wrote:
> I wonder where this convention of using lower numbers to indicate higher
> priorities comes from...  It seems to be quite prevalent even though it's
> obviously backwards.

I agree but at least in the case of 'nice' it works in the sense that
the value increases with increasing niceness.  Done the other way,
they would have to call it 'mean,' then someone would wonder why 'mean
10 20' prints 'No such file or directory' instead of '15'...

Bob


* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-07 19:22 ` [PATCH] Kwatch: kernel watchpoints using CPU debug registers Alan Stern
  2007-02-07 22:08   ` Bob Copeland
@ 2007-02-09 10:21   ` Roland McGrath
  2007-02-09 15:54     ` Alan Stern
  1 sibling, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-02-09 10:21 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> All right.  However this means thread_struct will have to grow in order to
> hold each task's debug-register allocations and priorities.  Would that be
> acceptable?  (This might be a good reason to keep the number of bits
> down.)

Well, no one seems to mind the wasted debugreg[5..6] words. ;-)

I'm inclined to make thread_struct smaller than it is now by making it
indirect (debugreg[8] -> one pointer).  It feels like this would be pretty
safe now that we have TIF_DEBUG anyway.  Already nothing should be looking
at the field when TIF_DEBUG isn't set.

> Another question: How can a program using the ptrace API ever give up a
> debug-register allocation?  Disabling the register isn't sufficient; a
> user program should be able to store a watchpoint address in dr1, enable
> it in dr7, disable it in dr7, and then re-enable it in the expectation
> that the address stored in dr1 hasn't been overwritten.  As far as I can
> see, ptrace-type allocations have to be permanent (until the task exits or
> execs, or uses some other to-be-determined API to do the de-allocation.)

I hadn't really thought about this before, but it's pretty straightforward.
Each allocation is for one of %dr[0-3] and for its associated bits in %dr7.
%dr0 and bits 0,1,16-19; %dr1 and bits 2,3,20-23; etc.
%dr7 & (0x000f0003 << N) for %drN, I guess it is.
((((1 << DR_CONTROL_SIZE) - 1) << DR_CONTROL_SHIFT) |
 ((1 << DR_ENABLE_SIZE) - 1)) << N, I should say.

Each allocation owns those 38 bits (70 bits on x86_64).  When all those
bits are zero, the allocation is not active and might as well not be there
(except for whatever semantics you might want to have about an allocation's
lifetime as distinct from the event of resetting its contents).  

gdb already works this way: when it removes a watchpoint, it clears drN to
zero when it zeros all the corresponding bits in dr7.  (In fact it's in a
separate call after changing dr7, because of the ptrace interface.)

Your question made me think about the %dr6 issue, too, which I also hadn't
thought about before.  It looks like your patch handles this correctly, but
it's a subtle point that I think warrants some comments in the code.  When
userland inserts a watchpoint and it's hit, it gets a SIGTRAP via do_debug
and eventually looks at dr6 (via ptrace) to see what happened.  Kernel
watchpoints that come along after the user signal is generated can clobber
the CPU %dr6 with new hits, but userland should not perceive this.  This
works out because what userland sees is thread.debugreg[6], the only place
that sets it is do_debug, and a kwatch hit bails out before changing it.
Any other new users of the debugreg sharing interface need to be cognizant
of the %dr6 issue to avoid stepping on old ptrace uses.


Thanks,
Roland


* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-09 10:21   ` Roland McGrath
@ 2007-02-09 15:54     ` Alan Stern
  2007-02-09 23:31       ` Roland McGrath
  0 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-02-09 15:54 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Fri, 9 Feb 2007, Roland McGrath wrote:

> > All right.  However this means thread_struct will have to grow in order to
> > hold each task's debug-register allocations and priorities.  Would that be
> > acceptable?  (This might be a good reason to keep the number of bits
> > down.)
> 
> Well, no one seems to mind the wasted debugreg[5..6] words. ;-)
> 
> I'm inclined to make thread_struct smaller than it is now by making it
> indirect (debugreg[8] -> one pointer).  It feels like this would be pretty
> safe now that we have TIF_DEBUG anyway.  Already nothing should be looking
> at the field when TIF_DEBUG isn't set.

I had the same thought.

> > Another question: How can a program using the ptrace API ever give up a
> > debug-register allocation?  Disabling the register isn't sufficient; a
> > user program should be able to store a watchpoint address in dr1, enable
> > it in dr7, disable it in dr7, and then re-enable it in the expectation
> > that the address stored in dr1 hasn't been overwritten.  As far as I can
> > see, ptrace-type allocations have to be permanent (until the task exits or
> > execs, or uses some other to-be-determined API to do the de-allocation.)
> 
> I hadn't really thought about this before, but it's pretty straightforward.
> Each allocation is for one of %dr[0-3] and for its associated bits in %dr7.
> %dr0 and bits 0,1,16-19; %dr1 and bits 2,3,20-23; etc.
> %dr7 & (0x000f0003 << N) for %drN, I guess it is.
> ((((1 << DR_CONTROL_SIZE) - 1) << DR_CONTROL_SHIFT) |
>  ((1 << DR_ENABLE_SIZE) - 1)) << N, I should say.
> 
> Each allocation owns those 38 bits (70 bits on x86_64).  When all those
> bits are zero, the allocation is not active and might as well not be there
> (except for whatever semantics you might want to have about an allocation's
> lifetime as distinct from the event of resetting its contents).  

Okay, that makes sense.

> Your question made me think about the %dr6 issue, too, which I also hadn't
> thought about before.  It looks like your patch handles this correctly, but
> it's a subtle point that I think warrants some comments in the code.  When
> userland inserts a watchpoint and it's hit, it gets a SIGTRAP via do_debug
> and eventually looks at dr6 (via ptrace) to see what happened.  Kernel
> watchpoints that come along after the user signal is generated can clobber
> the CPU %dr6 with new hits, but userland should not perceive this.  This
> works out because what userland sees is thread.debugreg[6], the only place
> that sets it is do_debug, and a kwatch hit bails out before changing it.
> Any other new users of the debugreg sharing interface need to be cognizant
> of the %dr6 issue to avoid stepping on old ptrace uses.

Yes.  In fact, the current existing code does not handle dr6 correctly.  
It never clears the register, which means you're likely to get into 
trouble when multiple breakpoints (or watchpoints) are enabled.


Here's another complication -- and this is one I can't figure out any easy
solutions for.  The type of API we've been discussing will work well
enough on UP systems, but what about SMP?  I don't see any value in having
a kernel watchpoint enabled on some CPUs and not others.  But then what
should happen when a debug register is in use by kwatch and a ptrace
request on one CPU usurps it (leaving no other register in which to put
it)?  Not to mention the difficulties of keeping track of everything when
the same kwatch watchpoint is stored in different debug registers on
different CPUs.

It's really quite a tricky matter.  Should a register be allocated to
kwatch only when no user process needs it?  Should we really go about
checking the requirements of every single process whenever a kwatch
allocation request comes in?  What if the processes which need a
particular register aren't running -- should the register then be given to
kwatch?  What if one of those processes then does start running on one
CPU?

Alan Stern



* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-09 15:54     ` Alan Stern
@ 2007-02-09 23:31       ` Roland McGrath
  2007-02-10  4:32         ` Alan Stern
  2007-02-21 20:35         ` Alan Stern
  0 siblings, 2 replies; 70+ messages in thread
From: Roland McGrath @ 2007-02-09 23:31 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> Yes.  In fact, the current existing code does not handle dr6 correctly.  
> It never clears the register, which means you're likely to get into 
> trouble when multiple breakpoints (or watchpoints) are enabled.

This is a subtle change from the existing ABI, in which userland has to
clear %dr6 via ptrace itself.  But gdb never does that AFAICT.  So it's in
fact subject to confusion when two watchpoints are set and the second hits
after the first.  So gdb ought to be fixed to clear dr6 via ptrace, to work
with existing and older kernels.

I don't think I really object to the ABI change of clearing %dr6 after an
exception so that it does not accumulate multiple results.  But first I'll
have to convince myself that we never actually do want to accumulate
multiple results.  Hmm, I think we can, so maybe I do object.  If you set
two watchpoints inside a user buffer and then do a system call that touches
both those addresses (e.g. read), then you will go through do_debug (to
send_sigtrap) twice before returning to user mode.  When the syscall is
done, you'll have a pending SIGTRAP for the debugger to handle.  By looking
at your %dr6 the debugger can see that both watchpoints hit.  (gdb does not
handle this case, but it should.)  Am I wrong?

So this gets to the more complicated view of %dr6 handling that I had first
had in mind yesterday.  Each allocation "owns" one of the low 4 bits in
%dr6 too.  Only the dr6 bits owned by the userland "raw" allocation
(i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6].
So when kwatch swallows a debug exception, it should mask off its bit from
%dr6 in the CPU, but not clear %dr6 completely.  That way you can have a
sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one
system call (including interrupt handlers), and when it gets to the
userland debugger examining dr6 it sees the low 2 bits both set.
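
Concretely, the kwatch side of that might amount to something like this in
its exception path (a sketch only; "kwatch_slot" and the surrounding handler
are hypothetical):

	/*
	 * Consume only the hit bit belonging to kwatch's own %drN, leaving
	 * any user-owned trap bits in %dr6 for do_debug to pick up later.
	 */
	unsigned long dr6;

	get_debugreg(dr6, 6);
	if (dr6 & (1UL << kwatch_slot)) {
		dr6 &= ~(1UL << kwatch_slot);	/* mask off our own bit... */
		set_debugreg(dr6, 6);		/* ...but keep the rest */
		/* ... handle the kernel watchpoint hit ... */
	}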

> It's really quite a tricky matter.  Should a register be allocated to
> kwatch only when no user process needs it?  Should we really go about
> checking the requirements of every single process whenever a kwatch
> allocation request comes in?  What if the processes which need a
> particular register aren't running -- should the register then be given to
> kwatch?  What if one of those processes then does start running on one
> CPU?

To "go about checking the requirements of every single process" is not so
hard as it sounds when they're recorded as a single global use count per
slot, as your original code does.  When you mentioned a "your allocation is
available" callback, I was thinking it might come to that being called
inside context switch.  It's all rather tricky, indeed.  

The obvious answer is to start simple.  If any user process anywhere uses
drN, kwatch has to give it up for all CPUs (watchpoints with less than
"break ptrace" priority do).  If anyone really cares about more flexibility
than that, we can change or extend it.  Some copious comments in the
interface descriptions can lead them in the right direction if the
situation comes up.  Probably with systemtap support in a while, we'll get
a lot more concrete uses of watchpoints and people finding out what really 
matters to them.
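
In code, the simple rule need not be much more than one global use count per
slot, along these lines (names invented, locking omitted):

	#include <linux/errno.h>
	#include <asm/atomic.h>

	static atomic_t dr_user_count[4];	/* user claims per %drN */

	/* kwatch may only take a slot no user thread anywhere is using */
	static int kwatch_pick_slot(void)
	{
		int n;

		for (n = 0; n < 4; n++)
			if (atomic_read(&dr_user_count[n]) == 0)
				return n;
		return -EBUSY;
	}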


Thanks,
Roland



* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-09 23:31       ` Roland McGrath
@ 2007-02-10  4:32         ` Alan Stern
  2007-02-18  3:03           ` Roland McGrath
  2007-02-21 20:35         ` Alan Stern
  1 sibling, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-02-10  4:32 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Fri, 9 Feb 2007, Roland McGrath wrote:

> I don't think I really object to the ABI change of clearing %dr6 after an
> exception so that it does not accumulate multiple results.  But first I'll
> have to convince myself that we never actually do want to accumulate
> multiple results.  Hmm, I think we can, so maybe I do object.  If you set
> two watchpoints inside a user buffer and then do a system call that touches
> both those addresses (e.g. read), then you will go through do_debug (to
> send_sigtrap) twice before returning to user mode.  When the syscall is
> done, you'll have a pending SIGTRAP for the debugger to handle.  By looking
> at your %dr6 the debugger can see that both watchpoints hit.  (gdb does not
> handle this case, but it should.)  Am I wrong?

I think you're right.

> So this gets to the more complicated view of %dr6 handling that I had first
> had in mind yesterday.  Each allocation "owns" one of the low 4 bits in
> %dr6 too.  Only the dr6 bits owned by the userland "raw" allocation
> (i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6].
> So when kwatch swallows a debug exception, it should mask off its bit from
> %dr6 in the CPU, but not clear %dr6 completely.  That way you can have a
> sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one
> system call (including interrupt handlers), and when it gets to the
> userland debugger examining dr6 it sees the low 2 bits both set.

Okay; I'll fix this too.  Come to think of it, kwatch needs to handle 
multiple hits as well -- there might be two watchpoints set to the same 
address.

> > It's really quite a tricky matter.  Should a register be allocated to
> > kwatch only when no user process needs it?  Should we really go about
> > checking the requirements of every single process whenever a kwatch
> > allocation request comes in?  What if the processes which need a
> > particular register aren't running -- should the register then be given to
> > kwatch?  What if one of those processes then does start running on one
> > CPU?
> 
> To "go about checking the requirements of every single process" is not so
> hard as it sounds when they're recorded as a single global use count per
> slot, as your original code does.  When you mentioned a "your allocation is
> available" callback, I was thinking it might come to that being called
> inside context switch.  It's all rather tricky, indeed.  
> 
> The obvious answer is to start simple.  If any user process anywhere uses
> drN, kwatch has to give it up for all CPUs (watchpoints with less than
> "break ptrace" priority do).  If anyone really cares about more flexibility
> than that, we can change or extend it.  Some copious comments in the
> interface descriptions can lead them in the right direction if the
> situation comes up.  Probably with systemtap support in a while, we'll get
> a lot more concrete uses of watchpoints and people finding out what really 
> matters to them.

It's still more complicated than you might think.  Let's say two user
processes each have dr1 allocated, one with low priority and the other
with high priority.  The kernel has to be aware of the high-priority
allocation, so that it can refuse intermediate-priority kwatch allocation
attempts.  Now suppose the second process exits.  dr1 is still allocated
to the first user process but only with low priority, so now
intermediate-priority kwatch allocation attempts should succeed.

In order for this to work, when the second process gives up its allocation 
I would have to either scan through all tasks to see the first process, or 
else keep several global use counts for each slot -- in fact, one use 
count for each priority level.  That's doable if there are only a few 
levels, but not if there are many.

How do you suggest this be handled?  Maybe we should just keep track of a
maximum user priority level for each slot, allowing it to go up but not
down until all user processes have given up the slot.  (I.e., in the
example above the later kwatch requests would still fail because we would
continue to remember the high user priority level so long as the first
process maintained its allocation.)  That would be overly pessimistic, but
it would at least be safe.
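
In other words, something like this per-slot "ratchet" (a sketch with
invented names; it would of course need locking):

	struct dr_slot_users {
		unsigned int	count;		/* user allocations in this slot */
		int		max_prio;	/* highest user priority seen, 0 = none */
	};

	static struct dr_slot_users dr_slot[4];

	static void slot_user_add(int n, int prio)
	{
		dr_slot[n].count++;
		if (prio > dr_slot[n].max_prio)
			dr_slot[n].max_prio = prio;	/* may only go up */
	}

	static void slot_user_remove(int n)
	{
		if (--dr_slot[n].count == 0)
			dr_slot[n].max_prio = 0;	/* forget only when empty */
	}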

Alan Stern



* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-10  4:32         ` Alan Stern
@ 2007-02-18  3:03           ` Roland McGrath
  0 siblings, 0 replies; 70+ messages in thread
From: Roland McGrath @ 2007-02-18  3:03 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> How do you suggest this be handled?  Maybe we should just keep track of a
> maximum user priority level for each slot, allowing it to go up but not
> down until all user processes have given up the slot.  (I.e., in the
> example above the later kwatch requests would still fail because we would
> continue to remember the high user priority level so long as the first
> process maintained its allocation.)  That would be overly pessimistic, but
> it would at least be safe.

I think that is certainly fine to start with.  Like I said before, we can
start conservative and then worry about more complexity as the concrete
needs arise.  I don't think it will really be any trouble to change some of
these low-level interfaces later to accommodate more sophisticated callers
and implementations.  

When the issue does arise, scanning all the necessary tasks may not really
need to be so costly.  That is, rather than scanning all tasks in the
system, it could be a list of debugreg allocations.  The callers doing fancy
allocation can be responsible for passing in storage for a struct
containing the list structure.  That would naturally embed in struct
kwatch.  Having the debugreg allocation routines pass in such a structure
would also be useful for another kind of flexibility I'd like to have
eventually.  That is, "local" allocations that are local to a group of
tasks rather than just one.  For example, a debugger most often actually
wants to share watchpoints among all the threads sharing an address space.
If identical settings are in fact shared, the storage requirements for
using watchpoints in many-threaded processes scale that much better.
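
Roughly, each allocation would carry its own list linkage, supplied by the
caller, so the scan walks allocations rather than tasks.  A sketch with
invented field names, which struct kwatch could simply embed:

	#include <linux/list.h>

	struct debugreg_alloc {
		struct list_head	node;		/* on a global or per-mm list */
		int			slot;		/* which %drN it occupies */
		int			priority;
	};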

I think we have a while before we have to actually figure any of that out
in detail.  The simple starting point covers all our immediate concrete
concerns.


Thanks,
Roland



* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-09 23:31       ` Roland McGrath
  2007-02-10  4:32         ` Alan Stern
@ 2007-02-21 20:35         ` Alan Stern
  2007-02-22 11:43           ` S. P. Prasanna
  2007-02-23  2:19           ` Roland McGrath
  1 sibling, 2 replies; 70+ messages in thread
From: Alan Stern @ 2007-02-21 20:35 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

Going back to something you mentioned earlier...

On Fri, 9 Feb 2007, Roland McGrath wrote:

> I don't think I really object to the ABI change of clearing %dr6 after an
> exception so that it does not accumulate multiple results.  But first I'll
> have to convince myself that we never actually do want to accumulate
> multiple results.  Hmm, I think we can, so maybe I do object.  If you set
> two watchpoints inside a user buffer and then do a system call that touches
> both those addresses (e.g. read), then you will go through do_debug (to
> send_sigtrap) twice before returning to user mode.  When the syscall is
> done, you'll have a pending SIGTRAP for the debugger to handle.  By looking
> at your %dr6 the debugger can see that both watchpoints hit.  (gdb does not
> handle this case, but it should.)  Am I wrong?

Yes, you are wrong -- although perhaps you shouldn't be.

The current version of do_debug() clears dr7 when a debug interrupt is 
fielded.  As a result, if a system call touches two watchpoint addresses 
in userspace only the first access will be noticed.

This is probably a bug in do_debug().  It would be better to disable each
individual userspace watchpoint as it is triggered (or even not to disable
it at all).  dr7 would be restored when the SIGTRAP is delivered.  (But
what if the user is blocking or ignoring SIGTRAP?)


Moving on...

I've worked out a plan for implementing combined user/kernel mode 
breakpoints and watchpoints (call them "debugpoints" as a catch-all 
term).  It should work transparently with ptrace and should accommodate 
whatever scheme utrace decides to adopt.  There won't need to be a 
separate kwatch facility on top of it; the new combined facility will 
handle debugpoints in both userspace and kernelspace.

The idea is that callers can register and unregister a struct debugpoint, 
which contains fields for the type, length, address, and priority as well 
as three callback pointers (for installed, uninstalled, and triggered).  
The debug core will keep these structures sorted by priority and will 
allocate the available debug registers based on the priorities of the 
userspace and kernelspace requests.  Each CPU will have its own array of 
pointers to these structures, indicating which debugpoints are currently 
enabled.

To work with ptrace, the new scheme will completely virtualize the debug
registers.  So for example, a ptrace call might request a debugpoint in
dr0, but it could end up that the physical register used is really dr2
instead.  The various bits in dr6 and dr7 will be mapped in such a way
that the entire procedure is transparent to the user.  Debugpoints 
registered in kernelspace or by utrace won't care, of course.
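
Roughly, the per-thread bookkeeping for that could look like the following
(a sketch with invented field names):

	struct virt_debugregs {
		unsigned long	vdr[4];		/* addresses as the user wrote them */
		int		phys_slot[4];	/* virtual slot -> physical %drN, or -1 */
		unsigned long	vdr6, vdr7;	/* DR6/DR7 as the user sees them */
	};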

There are two things I am uncertain about: vm86 mode and kprobes.  I don't
know anything about how either of them works.  Judging from the current
code, nothing much should be needed -- debug traps in vm86 mode are
handled by calling handle_vm86_trap(), and kprobes puts itself at the
start of the notify_die() chain so it can handle single-step traps.  
Eventually it will be necessary to check with someone who really 
understands the issues.

Alan Stern



* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-21 20:35         ` Alan Stern
@ 2007-02-22 11:43           ` S. P. Prasanna
  2007-02-23  2:19           ` Roland McGrath
  1 sibling, 0 replies; 70+ messages in thread
From: S. P. Prasanna @ 2007-02-22 11:43 UTC (permalink / raw)
  To: Alan Stern; +Cc: Roland McGrath, Kernel development list

On Wed, Feb 21, 2007 at 03:35:13PM -0500, Alan Stern wrote:
> Going back to something you mentioned earlier...
> 
[...]
> On Fri, 9 Feb 2007, Roland McGrath wrote:
> There are two things I am uncertain about: vm86 mode and kprobes.  I don't
> know anything about how either of them works.  Judging from the current
> code, nothing much should be needed -- debug traps in vm86 mode are
> handled by calling handle_vm86_trap(), and kprobes puts itself at the
> start of the notify_die() chain so it can handle single-step traps.  
> Eventually it will be necessary to check with someone who really 
> understands the issues.

Yes, Kprobes needs to get notified first to handle single-step traps, so
kwatch getting notified second should be fine.

Thanks
Prasanna
-- 
Prasanna S.P.
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-41776329


* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-21 20:35         ` Alan Stern
  2007-02-22 11:43           ` S. P. Prasanna
@ 2007-02-23  2:19           ` Roland McGrath
  2007-02-23 16:55             ` Alan Stern
  1 sibling, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-02-23  2:19 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> Yes, you are wrong -- although perhaps you shouldn't be.
> 
> The current version of do_debug() clears dr7 when a debug interrupt is 
> fielded.  As a result, if a system call touches two watchpoint addresses 
> in userspace only the first access will be noticed.

Ah, I see.  I think it would indeed be nice to fix this.

> This is probably a bug in do_debug().  It would be better to disable each
> individual userspace watchpoint as it is triggered (or even not to disable
> it at all).  dr7 would be restored when the SIGTRAP is delivered.  (But
> what if the user is blocking or ignoring SIGTRAP?)

The user blocking or ignoring it doesn't come up, because it's a
force_sig_info call.  However, a debugger will indeed swallow the signal
through ptrace/utrace means.  In ptrace, the dr7 is always going to get
reset because there will always be a context switch out and back in that
does it.  But with utrace it's now possible to swallow the signal and keep
going without a context switch (e.g. a breakpoint that is just doing
logging but not stopping).  So perhaps we should have a TIF_RESTORE_DR7
that goes into _TIF_WORK_MASK and gets handled in do_notify_resume
(or maybe it's TIF_HWBKPT).
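
If it went that way, the resume path could be as simple as the following
(entirely hypothetical, since no such flag exists today):

	/* in do_notify_resume(): restore %dr7 on the way back to user mode */
	if (test_and_clear_thread_flag(TIF_RESTORE_DR7))
		set_debugreg(current->thread.debugreg[7], 7);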

You should not actually need to disable user watchpoints, because in data
watchpoints the exception comes after the instruction completes.  Only for
instruction watchpoints does the exception come before the instruction
executes, and no user watchpoints can be in the address range containing
kernel code.  

SIGTRAP both doesn't queue, and doesn't give %dr6 values in its siginfo_t.
All user watchpoints will be handled via the signal; this is the only way
ptrace can report them, and is also the utrace way of doing things.
do_debug can happen inside kernel code, and tracing of user-level tasks can
only safely do anything at the point just before returning to user mode,
where signals are handled.  So, getting to send_sigtrap in do_debug is
enough to say "one or more user debug exceptions happened".  The %dr6 value
that collects in the thread state to be seen by ptrace, or by utrace-based
things using your new facility, needs to collect all the %dr6 bits that
were set by the hardware and weren't consumed by kernel-level tracing.  An
eventual utrace-based thing might in fact have some other way to tie in so
that the event details could just be in some call made by do_debug and not
recorded in the thread's virtual %dr6 value.  But at least for ptrace, they
should collect there if it becomes possible for more than one exception to
happen while in kernel mode or in a single user instruction.  (A single
instruction can cause multiple exceptions at the hardware level.)
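
Roughly: in do_debug, after any kernel-owned bits have been masked out of
the saved %dr6 value, accumulate rather than overwrite (a sketch):

	tsk->thread.debugreg[6] |= condition &
			(DR_TRAP0 | DR_TRAP1 | DR_TRAP2 | DR_TRAP3);

so that two watchpoint hits inside one system call both show up to the
debugger.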

> I've worked out a plan for implementing combined user/kernel mode
> breakpoints and watchpoints (call them "debugpoints" as a catch-all
> term).  It should work transparently with ptrace and should accommodate
> whatever scheme utrace decides to adopt.  There won't need to be a
> separate kwatch facility on top of it; the new combined facility will
> handle debugpoints in both userspace and kernelspace.

That sounds great.  I'm not thrilled with the name "debugpoint", I have to
tell you.  The hardware documentation calls all these things "breakpoints",
and I think "data breakpoint" and "instruction breakpoint" are pretty good
terms.  How about "hwbkpt" for the facility API?

> To work with ptrace, the new scheme will completely virtualize the debug
> registers.  So for example, a ptrace call might request a debugpoint in
> dr0, but it could end up that the physical register used is really dr2
> instead.  The various bits in dr6 and dr7 will be mapped in such a way
> that the entire procedure is transparent to the user.  Debugpoints 
> registered in kernelspace or by utrace won't care, of course.

I think that's a fine idea.  

The one caveat I have here is that I don't want ptrace (via utrace) to have
to supply the usual structure.  I probably only think this because it would
be a pain for the ptrace/utrace implementation to find a place to stick it.
But I have a rationalization.  The old ptrace interface, and the
utrace_regset for debugregs, is not really a "debugpoint user" in the sense
you're defining it.  It's an access to the "raw" debugregs as part of the
thread's virtual CPU context.  You can use ptrace to set a watchpoint, then
detach ptrace, and the thread will get a SIGTRAP later though there is no
remnant at that point of the debugger interface that made it come about.
For the degenerate case of medium-high priority with no handler callbacks
(that should instead be an error at registration time if no slot is free),
you shouldn't really need any per-caller storage (there can only be one
such caller per slot).  

> There are two things I am uncertain about: vm86 mode and kprobes.  I don't
> know anything about how either of them works.  

I know about kprobes.  I don't know about vm86, but I can read the code.


Thanks,
Roland


* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-23  2:19           ` Roland McGrath
@ 2007-02-23 16:55             ` Alan Stern
  2007-02-24  0:08               ` Roland McGrath
  0 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-02-23 16:55 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Thu, 22 Feb 2007, Roland McGrath wrote:

> > Yes, you are wrong -- although perhaps you shouldn't be.
> > 
> > The current version of do_debug() clears dr7 when a debug interrupt is 
> > fielded.  As a result, if a system call touches two watchpoint addresses 
> > in userspace only the first access will be noticed.
> 
> Ah, I see.  I think it would indeed be nice to fix this.
> 
> > This is probably a bug in do_debug().  It would be better to disable each
> > individual userspace watchpoint as it is triggered (or even not to disable
> > it at all).  dr7 would be restored when the SIGTRAP is delivered.  (But
> > what if the user is blocking or ignoring SIGTRAP?)
> 
> The user blocking or ignoring it doesn't come up, because it's a
> force_sig_info call.  However, a debugger will indeed swallow the signal
> through ptrace/utrace means.  In ptrace, the dr7 is always going to get
> reset because there will always be a context switch out and back in that
> does it.  But with utrace it's now possible to swallow the signal and keep
> going without a context switch (e.g. a breakpoint that is just doing
> logging but not stopping).  So perhaps we should have a TIF_RESTORE_DR7
> that goes into _TIF_WORK_MASK and gets handled in do_notify_resume
> (or maybe it's TIF_HWBKPT).

I think the best approach will be not to reset dr7 at all.  Then there 
won't be any need to worry about restoring it.  Leaving a userspace 
watchpoint enabled while running in the kernel ought not to matter; a 
system call shouldn't touch any address in userspace more than once or 
twice.

> > I've worked out a plan for implementing combined user/kernel mode
> > breakpoints and watchpoints (call them "debugpoints" as a catch-all
> > term).  It should work transparently with ptrace and should accommodate
> > whatever scheme utrace decides to adopt.  There won't need to be a
> > separate kwatch facility on top of it; the new combined facility will
> > handle debugpoints in both userspace and kernelspace.
> 
> That sounds great.  I'm not thrilled with the name "debugpoint", I have to
> tell you.  The hardware documentation calls all these things "breakpoints",
> and I think "data breakpoint" and "instruction breakpoint" are pretty good
> terms.  How about "hwbkpt" for the facility API?

Okay, I'll change the name.

> The one caveat I have here is that I don't want ptrace (via utrace) to have
> to supply the usual structure.  I probably only think this because it would
> be a pain for the ptrace/utrace implementation to find a place to stick it.
> But I have a rationalization.  The old ptrace interface, and the
> utrace_regset for debugregs, is not really a "debugpoint user" in the sense
> you're defining it.  It's an access to the "raw" debugregs as part of the
> thread's virtual CPU context.  You can use ptrace to set a watchpoint, then
> detach ptrace, and the thread will get a SIGTRAP later though there is no
> remnant at that point of the debugger interface that made it come about.
> For the degenerate case of medium-high priority with no handler callbacks
> (that should instead be an error at registration time if no slot is free),
> you shouldn't really need any per-caller storage (there can only be one
> such caller per slot).  

My idea was to put 4 hwbkpt structures in thread_struct, so they would
always be available for use by ptrace.  However it turned out not to be
feasible to replace the debugreg array with something more sophisticated,
because of conflicting declarations and problems with the ordering of
#includes.  So instead I have been forced to replace debugreg[] with a
pointer to a structure which can be allocated as needed.

This raises the possibility that a PTRACE syscall might fail because the 
allocation fails.  Hopefully that won't be an issue?

Alan Stern



* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-23 16:55             ` Alan Stern
@ 2007-02-24  0:08               ` Roland McGrath
  2007-03-02 17:19                 ` [RFC] hwbkpt: Hardware breakpoints (was Kwatch) Alan Stern
  0 siblings, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-02-24  0:08 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> I think the best approach will be not to reset dr7 at all.  Then there 
> won't be any need to worry about restoring it.  Leaving a userspace 
> watchpoint enabled while running in the kernel ought not to matter; a 
> system call shouldn't touch any address in userspace more than once or 
> twice.

Hmm.  That sounds reasonable.  But I wonder why the old code clears %dr7.
It's been that way for a long time (since 2.4 at least).

> My idea was to put 4 hwbkpt structures in thread_struct, so they would
> always be available for use by ptrace.  However it turned out not to be
> feasible to replace the debugreg array with something more sophisticated,
> because of conflicting declarations and problems with the ordering of
> #includes.  So instead I have been forced to replace debugreg[] with a
> pointer to a structure which can be allocated as needed.

I think that's preferable anyway.  Most tasks most of the time will never
need that storage, so why not make thread_struct a little smaller?
(There is also the potential for sharing, which I mentioned earlier.)

> This raises the possibility that a PTRACE syscall might fail because the 
> allocation fails.  Hopefully that won't be an issue?

It's not a new issue, anyway, after utrace.  The utrace-based ptrace can
fail for PTRACE_ATTACH because of OOM too, which wasn't possible before.
I think it's survivable.


Thanks,
Roland


* [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-02-24  0:08               ` Roland McGrath
@ 2007-03-02 17:19                 ` Alan Stern
  2007-03-05  7:01                   ` Roland McGrath
  0 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-03-02 17:19 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

Roland and Prasanna:

Here's my first attempt, lightly tested, at an hwbkpt implementation.  It
includes copious comments, so it shouldn't be too hard to figure out (if
you read the files in the right order).  The patch below is meant for
2.6.21-rc2; porting it to -mm shouldn't be very hard.

There are still several loose ends and unanswered questions.

	I pretty much copied the existing code for handling vm86 mode
	and single-step exceptions, without fully understanding it.

	The code doesn't virtualize the BS (single-step) flag in DR6
	for userspace.  It could be added, but I wonder whether it is
	really needed.

	Unlike the existing code, DR7 is re-enabled upon returning from
	a debug interrupt.  That means it doesn't have to be enabled
	when delivering a SIGTRAP.

	Setting user breakpoints on I/O ports should require permissions
	checking.  I haven't tried to figure out how that works or
	how to implement it yet.

	It seems likely that some of the new routines should be marked
	"__kprobes", but I don't know which, or even what that annotation
	is supposed to mean.

	When CPUs go on- or off-line, their debug registers need to be
	initialized or cleared.  I did a little bit of that, but more is
	needed.  In particular, CPU hotplugging and kexec have to take
	this into account.

	The parts relating to kernel breakpoints could be made conditional
	on a Kconfig option.  The amount of code space saved would be
	relatively small; I'm not sure that it would be worthwhile.

Probably there are some more issues I haven't thought of.  Anyway, let me 
know what you think.

Alan Stern



Index: 2.6.21-rc2/include/asm-i386/hwbkpt.h
===================================================================
--- /dev/null
+++ 2.6.21-rc2/include/asm-i386/hwbkpt.h
@@ -0,0 +1,185 @@
+#ifndef	_I386_HWBKPT_H
+#define	_I386_HWBKPT_H
+
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ * struct hwbkpt - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @triggered: callback invoked when the breakpoint is hit
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @data: private data for use by the breakpoint owner
+ * @address: location (virtual address) of the breakpoint
+ * @len: extent of the breakpoint address (1, 2, or 4 bytes)
+ * @type: breakpoint type (write-only, read/write, execute, or I/O)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hwbkpt structures are the kernel's way of representing hardware
+ * breakpoints.  These can be either execution breakpoints (triggered
+ * on instruction execution) or data breakpoints (also known as
+ * "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The @address, @len, and @type fields are standard, indicating the
+ * location of the breakpoint, its extent in bytes, and the type of
+ * access that will trigger the breakpoint.  Possible values for @len
+ * are 1, 2, and 4.  Possible values for @type are %HWBKPT_WRITE
+ * (triggered on write access), %HWBKPT_RW (triggered on read or
+ * write access), %HWBKPT_IO (triggered on I/O-space access), and
+ * %HWBKPT_EXECUTE (triggered on instruction execution).  Certain
+ * restrictions apply: %HWBKPT_EXECUTE requires that @len be 1, and
+ * %HWBKPT_IO is available only on processors with Debugging Extensions.
+ *
+ * In register_user_hwbkpt() and modify_user_hwbkpt(), @address must
+ * refer to a location in user space (unless @type is %HWBKPT_IO).
+ * The breakpoint will be active only while the requested task is
+ * running.  Conversely, in register_kernel_hwbkpt() @address must
+ * refer to a location in kernel space, and the breakpoint will be
+ * active on all CPUs regardless of the task being run.
+ *
+ * When a breakpoint gets hit, the @triggered callback is invoked
+ * in_interrupt with a pointer to the %hwbkpt structure and the
+ * processor registers.  %HWBKPT_EXECUTE traps occur before the
+ * breakpointed instruction executes; all other types of trap occur
+ * after the memory or I/O access has taken place.  All breakpoints
+ * are disabled while @triggered runs, to avoid recursive traps and
+ * allow unhindered access to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource.  Requests to register a
+ * breakpoint will always succeed (provided the member entries are
+ * valid), but the breakpoint may not be installed in a debug register
+ * right away.  Physical debug registers are allocated based on the
+ * priority level stored in @priority (higher values indicate higher
+ * priority).  User-space breakpoints within a single thread compete
+ * with one another and all user-space breakpoints compete with all
+ * kernel-space breakpoints, however user-space breakpoints in different
+ * threads do not compete.  %HWBKPT_PRIO_PTRACE is the level used for
+ * ptrace requests; an unobtrusive kernel-space breakpoint will use
+ * %HWBKPT_PRIO_NORMAL to avoid disturbing user programs.  A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HWBKPT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered.  The
+ * @installed and @uninstalled callbacks are invoked in_atomic when
+ * these events occur.  It is legal for @installed or @uninstalled to
+ * be %NULL, however @triggered must not be.  Note that it is not
+ * possible to register or unregister a breakpoint from within a callback
+ * routine, since doing so requires a process context.  Note also that
+ * for user breakpoints, @installed and @uninstalled may be called during
+ * the middle of a context switch, at a time when it is not safe to call
+ * printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled.  This way the breakpoint owner
+ * knows that during the time interval from @installed to @uninstalled,
+ * all events are faithfully reported.  (It is not possible to do any
+ * better than this in general, because on SMP systems there is no way to
+ * set a debug register simultaneously on all CPUs.)  The same isn't
+ * always true with user-space breakpoints, but the differences should
+ * not be visible to a user process.
+ *
+ * The @address, @len, and @type fields in a user-space breakpoint can be
+ * changed by calling modify_user_hwbkpt().  Kernel-space breakpoints
+ * cannot be modified, nor can the @priority value in user-space
+ * breakpoints, after the breakpoint has been registered.  And of course
+ * all the fields in a %hwbkpt structure other than @data should be
+ * treated as read-only while the breakpoint is registered.
+ *
+ * @node and @status are intended for internal use; however @status may
+ * be read to determine whether or not the breakpoint is currently
+ * installed.
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hwbkpt.h>
+ *
+ * static void triggered(struct hwbkpt *bp, struct pt_regs *regs)
+ * {
+ * 	printk(KERN_DEBUG "Breakpoint triggered\n");
+ * 	dump_stack();
+ *  	.......<more debugging output>........
+ * }
+ *
+ * static struct hwbkpt my_bp;
+ *
+ * static int init_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	my_bp.address = &pid_max;
+ * 	my_bp.type = HWBKPT_WRITE;
+ * 	my_bp.len = 4;
+ * 	my_bp.triggered = triggered;
+ * 	my_bp.priority = HWBKPT_PRIO_NORMAL;
+ * 	rc = register_kernel_hwbkpt(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	unregister_kernel_hwbkpt(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hwbkpt {
+	struct list_head	node;
+	void		(*triggered)(struct hwbkpt *, struct pt_regs *);
+	void		(*installed)(struct hwbkpt *);
+	void		(*uninstalled)(struct hwbkpt *);
+	void		*data;
+	void		*address;
+	u8		len;
+	u8		type;
+	u8		priority;
+	u8		status;
+};
+
+/* HW breakpoint types */
+#define HWBKPT_EXECUTE		0x0	/* trigger on instruction execute */
+#define HWBKPT_WRITE		0x1	/* trigger on memory write */
+#define HWBKPT_IO		0x2	/* trigger on I/O space access */
+#define HWBKPT_RW		0x3	/* trigger on memory read or write */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HWBKPT_PRIO_NORMAL	25
+#define HWBKPT_PRIO_PTRACE	50
+#define HWBKPT_PRIO_HIGH	75
+
+/* HW breakpoint status values */
+#define HWBKPT_REGISTERED	1
+#define HWBKPT_INSTALLED	2
+
+/*
+ * The tsk argument in the following three routines will usually be a
+ * process being PTRACEd by the current task, normally a debugger.
+ * It is also legal for tsk to be the current task.  In either case we
+ * can guarantee that tsk will not start running on another CPU while
+ * its breakpoints are being modified.  If that happened it could cause
+ * a crash.
+ */
+int register_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp);
+void unregister_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp);
+int modify_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp,
+		void *address, u8 len, u8 type);
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hwbkpt(struct hwbkpt *bp);
+void unregister_kernel_hwbkpt(struct hwbkpt *bp);
+
+#endif	/* _I386_HWBKPT_H */
Index: 2.6.21-rc2/arch/i386/kernel/process.c
===================================================================
--- 2.6.21-rc2.orig/arch/i386/kernel/process.c
+++ 2.6.21-rc2/arch/i386/kernel/process.c
@@ -58,6 +58,7 @@
 #include <asm/tlbflush.h>
 #include <asm/cpu.h>
 #include <asm/pda.h>
+#include <asm/debugreg.h>
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
@@ -359,9 +360,10 @@ EXPORT_SYMBOL(kernel_thread);
  */
 void exit_thread(void)
 {
+	struct task_struct *tsk = current;
+
 	/* The process may have allocated an io port bitmap... nuke it. */
 	if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
-		struct task_struct *tsk = current;
 		struct thread_struct *t = &tsk->thread;
 		int cpu = get_cpu();
 		struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -379,15 +381,15 @@ void exit_thread(void)
 		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
 		put_cpu();
 	}
+	flush_thread_hwbkpt(tsk);
 }
 
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
 
-	memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
-	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));	
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+	flush_thread_hwbkpt(tsk);
 	/*
 	 * Forget coprocessor state..
 	 */
@@ -430,14 +432,21 @@ int copy_thread(int nr, unsigned long cl
 
 	savesegment(gs,p->thread.gs);
 
+	p->thread.hwbkpt_info = NULL;
+	p->thread.io_bitmap_ptr = NULL;
+
 	tsk = current;
+	err = -ENOMEM;
+	if (unlikely(tsk->thread.hwbkpt_info)) {
+		if (copy_thread_hwbkpt(tsk, p, clone_flags))
+			goto out;
+	}
+
 	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
 						IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
+		if (!p->thread.io_bitmap_ptr)
+			goto out;
 		set_tsk_thread_flag(p, TIF_IO_BITMAP);
 	}
 
@@ -467,7 +476,8 @@ int copy_thread(int nr, unsigned long cl
 
 	err = 0;
  out:
-	if (err && p->thread.io_bitmap_ptr) {
+	if (err) {
+		flush_thread_hwbkpt(p);
 		kfree(p->thread.io_bitmap_ptr);
 		p->thread.io_bitmap_max = 0;
 	}
@@ -479,18 +489,18 @@ int copy_thread(int nr, unsigned long cl
  */
 void dump_thread(struct pt_regs * regs, struct user * dump)
 {
-	int i;
+	struct task_struct *tsk = current;
 
 /* changed the size calculations - should hopefully work better. lbt */
 	dump->magic = CMAGIC;
 	dump->start_code = 0;
 	dump->start_stack = regs->esp & ~(PAGE_SIZE - 1);
-	dump->u_tsize = ((unsigned long) current->mm->end_code) >> PAGE_SHIFT;
-	dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
+	dump->u_tsize = ((unsigned long) tsk->mm->end_code) >> PAGE_SHIFT;
+	dump->u_dsize = ((unsigned long) (tsk->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
 	dump->u_dsize -= dump->u_tsize;
 	dump->u_ssize = 0;
-	for (i = 0; i < 8; i++)
-		dump->u_debugreg[i] = current->thread.debugreg[i];  
+
+	dump_thread_hwbkpt(tsk, dump->u_debugreg);
 
 	if (dump->start_stack < TASK_SIZE)
 		dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -540,16 +550,6 @@ static noinline void __switch_to_xtra(st
 
 	next = &next_p->thread;
 
-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		set_debugreg(next->debugreg[0], 0);
-		set_debugreg(next->debugreg[1], 1);
-		set_debugreg(next->debugreg[2], 2);
-		set_debugreg(next->debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(next->debugreg[6], 6);
-		set_debugreg(next->debugreg[7], 7);
-	}
-
 	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
 		/*
 		 * Disable the bitmap via an invalid offset. We still cache
@@ -682,7 +682,7 @@ struct task_struct fastcall * __switch_t
 		set_iopl_mask(next->iopl);
 
 	/*
-	 * Now maybe handle debug registers and/or IO bitmaps
+	 * Now maybe handle IO bitmaps
 	 */
 	if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
 	    || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
@@ -714,6 +714,13 @@ struct task_struct fastcall * __switch_t
 
 	write_pda(pcurrent, next_p);
 
+	/*
+	 * Handle debug registers.  This must be done _after_ current
+	 * is updated.
+	 */
+	if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+		switch_to_thread_hwbkpt(next_p);
+
 	return prev_p;
 }
 
Index: 2.6.21-rc2/arch/i386/kernel/signal.c
===================================================================
--- 2.6.21-rc2.orig/arch/i386/kernel/signal.c
+++ 2.6.21-rc2/arch/i386/kernel/signal.c
@@ -592,13 +592,6 @@ static void fastcall do_signal(struct pt
 
 	signr = get_signal_to_deliver(&info, &ka, regs, NULL);
 	if (signr > 0) {
-		/* Reenable any watchpoints before delivering the
-		 * signal to user space. The processor register will
-		 * have been cleared if the watchpoint triggered
-		 * inside the kernel.
-		 */
-		if (unlikely(current->thread.debugreg[7]))
-			set_debugreg(current->thread.debugreg[7], 7);
 
 		/* Whee!  Actually deliver the signal.  */
 		if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: 2.6.21-rc2/arch/i386/kernel/traps.c
===================================================================
--- 2.6.21-rc2.orig/arch/i386/kernel/traps.c
+++ 2.6.21-rc2/arch/i386/kernel/traps.c
@@ -810,58 +810,21 @@ fastcall void __kprobes do_debug(struct 
 	struct task_struct *tsk = current;
 
 	get_debugreg(condition, 6);
+	set_debugreg(0, 6);	/* DR6 is never cleared by the CPU */
 
 	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
 					SIGTRAP) == NOTIFY_STOP)
 		return;
+
 	/* It's safe to allow irq's after DR6 has been saved */
 	if (regs->eflags & X86_EFLAGS_IF)
 		local_irq_enable();
 
-	/* Mask out spurious debug traps due to lazy DR7 setting */
-	if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
-		if (!tsk->thread.debugreg[7])
-			goto clear_dr7;
-	}
-
 	if (regs->eflags & VM_MASK)
-		goto debug_vm86;
-
-	/* Save debug status register where ptrace can see it */
-	tsk->thread.debugreg[6] = condition;
-
-	/*
-	 * Single-stepping through TF: make sure we ignore any events in
-	 * kernel space (but re-enable TF when returning to user mode).
-	 */
-	if (condition & DR_STEP) {
-		/*
-		 * We already checked v86 mode above, so we can
-		 * check for kernel mode by just checking the CPL
-		 * of CS.
-		 */
-		if (!user_mode(regs))
-			goto clear_TF_reenable;
-	}
-
-	/* Ok, finally something we can handle */
-	send_sigtrap(tsk, regs, error_code);
-
-	/* Disable additional traps. They'll be re-enabled when
-	 * the signal is delivered.
-	 */
-clear_dr7:
-	set_debugreg(0, 7);
-	return;
-
-debug_vm86:
-	handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
-	return;
-
-clear_TF_reenable:
-	set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
-	regs->eflags &= ~TF_MASK;
-	return;
+		handle_vm86_trap((struct kernel_vm86_regs *) regs,
+				error_code, 1);
+	else
+		send_sigtrap(tsk, regs, error_code);
 }
 
 /*
Index: 2.6.21-rc2/include/asm-i386/debugreg.h
===================================================================
--- 2.6.21-rc2.orig/include/asm-i386/debugreg.h
+++ 2.6.21-rc2/include/asm-i386/debugreg.h
@@ -48,6 +48,8 @@
 
 #define DR_LOCAL_ENABLE_SHIFT 0    /* Extra shift to the local enable bit */
 #define DR_GLOBAL_ENABLE_SHIFT 1   /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1)      /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2)     /* Global enable for reg 0 */
 #define DR_ENABLE_SIZE 2           /* 2 enable bits per register */
 
 #define DR_LOCAL_ENABLE_MASK (0x55)  /* Set  local bits for all 4 regs */
@@ -58,7 +60,37 @@
    gdt or the ldt if we want to.  I am not sure why this is an advantage */
 
 #define DR_CONTROL_RESERVED (0xFC00) /* Reserved by Intel */
-#define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
-#define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
+#define DR_LOCAL_EXACT (0x100)       /* Local slow the pipeline */
+#define DR_GLOBAL_EXACT (0x200)      /* Global slow the pipeline */
+
+
+/*
+ * HW breakpoint additions
+ */
+
+#include <asm/hwbkpt.h>
+
+#define HB_NUM		4	/* Number of hardware breakpoints */
+
+struct thread_hwbkpt {		/* HW breakpoint info for a thread */
+
+	/* utrace support */
+	struct list_head	node;		/* Entry in thread list */
+	struct list_head	thread_bps;	/* Thread's breakpoints */
+	unsigned long		tdr7;		/* Thread's DR7 value */
+
+	/* ptrace support */
+	struct hwbkpt		ptrace_bps[HB_NUM];
+	unsigned long		vdr6;		/* Virtualized values */
+	unsigned long		vdr7;		/*  for DR6 and DR7   */
+};
+
+struct thread_hwbkpt *alloc_thread_hwbkpt(struct task_struct *tsk);
+void flush_thread_hwbkpt(struct task_struct *tsk);
+int copy_thread_hwbkpt(struct task_struct *tsk, struct task_struct *child,
+		unsigned long clone_flags);
+void dump_thread_hwbkpt(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hwbkpt(struct task_struct *tsk);
+void load_debug_registers(void);
 
 #endif
Index: 2.6.21-rc2/include/asm-i386/processor.h
===================================================================
--- 2.6.21-rc2.orig/include/asm-i386/processor.h
+++ 2.6.21-rc2/include/asm-i386/processor.h
@@ -402,8 +402,8 @@ struct thread_struct {
 	unsigned long	esp;
 	unsigned long	fs;
 	unsigned long	gs;
-/* Hardware debugging registers */
-	unsigned long	debugreg[8];  /* %%db0-7 debug registers */
+/* Hardware breakpoint info */
+	struct thread_hwbkpt	*hwbkpt_info;
 /* fault info */
 	unsigned long	cr2, trap_no, error_code;
 /* floating point info */
Index: 2.6.21-rc2/arch/i386/kernel/hw-breakpoint.c
===================================================================
--- /dev/null
+++ 2.6.21-rc2/arch/i386/kernel/hw-breakpoint.c
@@ -0,0 +1,947 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW-breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+/* QUESTIONS
+
+	Copy bps for fork()?  CLONE_PTRACE?
+
+	Permissions for I/O user bp?
+
+	Virtualize dr6 Single-Step flag?
+
+	Handling of single-step exceptions in kernel space?
+	Handling of RF flag bit for execution faults?
+
+	CPU hotplug, kexec, etc?
+
+	__kprobes annotations?
+
+*/
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm-generic/percpu.h>
+
+#include <asm/debugreg.h>
+#include <asm/kdebug.h>
+#include <asm/processor.h>
+
+
+/* Per-CPU debug register info */
+struct cpu_hwbkpt {
+	struct hwbkpt		*bps[HB_NUM];	/* Loaded breakpoints */
+	int			num_kbps;	/* Number of kernel bps */
+	struct task_struct	*bp_task;	/* The thread whose bps
+			are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hwbkpt, cpu_info);
+
+/* Kernel-space breakpoint data */
+static LIST_HEAD(kernel_bps);			/* Kernel breakpoint list */
+static unsigned long		kdr7;		/* Kernel DR7 value */
+static int			num_kbps;	/* Number of kernel bps */
+
+static u8			tprio[HB_NUM];	/* Thread bp max priorities */
+static LIST_HEAD(thread_list);			/* thread_hwbkpt list */
+static DEFINE_MUTEX(hwbkpt_mutex);		/* Protects everything */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps.  Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1). */
+static const unsigned long	kdr7_masks[HB_NUM + 1] = {
+	0x00000000,
+	0x000f0203,	/* LEN0, R/W0, GE, G0, L0 */
+	0x00ff020f,	/* Same for 0,1 */
+	0x0fff023f,	/* Same for 0,1,2 */
+	0xffff02ff	/* Same for 0,1,2,3 */
+};
+
+
+/*
+ * Install a single breakpoint in its debug register.
+ */
+static void install_breakpoint(struct cpu_hwbkpt *chbi, int i,
+		struct hwbkpt *bp)
+{
+	unsigned long temp;
+
+	chbi->bps[i] = bp;
+	temp = (unsigned long) bp->address;
+	switch (i) {
+		case 0:	set_debugreg(temp, 0);	break;
+		case 1:	set_debugreg(temp, 1);	break;
+		case 2:	set_debugreg(temp, 2);	break;
+		case 3:	set_debugreg(temp, 3);	break;
+	}
+}
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hwbkpt(struct task_struct *tsk)
+{
+	unsigned long dr7;
+	struct cpu_hwbkpt *chbi;
+	int i = HB_NUM;
+	unsigned long flags;
+
+	/* Other CPUs might be making updates to the list of kernel
+	 * breakpoints at this same time, so we can't use the global
+	 * value stored in num_kbps.  Instead we'll use the per-cpu
+	 * value stored in cpu_info. */
+
+	/* Block kernel breakpoint updates from other CPUs */
+	local_irq_save(flags);
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = tsk;
+
+	/* Keep the DR7 bits that refer to kernel breakpoints */
+	get_debugreg(dr7, 7);
+	dr7 &= kdr7_masks[chbi->num_kbps];
+
+	/* Kernel breakpoints are stored starting in DR0 and going up,
+	 * and there are num_kbps of them.  Thread breakpoints are stored
+	 * starting in DR3 and going down, as many as we have room for. */
+	if (tsk && test_tsk_thread_flag(tsk, TIF_DEBUG)) {
+		struct thread_hwbkpt *thbi = tsk->thread.hwbkpt_info;
+		struct hwbkpt *bp;
+
+		set_debugreg(dr7, 7);	/* Disable user bps while switching */
+
+		/* Store this thread's breakpoint addresses and update
+		 * the statuses. */
+		list_for_each_entry(bp, &thbi->thread_bps, node) {
+
+			/* If this register is allocated for kernel bps,
+			 * don't install.  Otherwise do. */
+			if (--i < chbi->num_kbps) {
+				if (bp->status == HWBKPT_INSTALLED) {
+					if (bp->uninstalled)
+						(bp->uninstalled)(bp);
+					bp->status = HWBKPT_REGISTERED;
+				}
+			} else {
+				if (bp->status != HWBKPT_INSTALLED) {
+					bp->status = HWBKPT_INSTALLED;
+					if (bp->installed)
+						(bp->installed)(bp);
+				}
+				install_breakpoint(chbi, i, bp);
+			}
+		}
+
+		/* Mask in the parts of DR7 that refer to the new thread */
+		dr7 |= (~kdr7_masks[chbi->num_kbps] & thbi->tdr7);
+	}
+
+	/* Clear any remaining stale bp pointers */
+	while (--i >= chbi->num_kbps)
+		chbi->bps[i] = NULL;
+	set_debugreg(dr7, 7);
+
+	put_cpu_no_resched();
+	local_irq_restore(flags);
+}
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void switch_kernel_hwbkpt(struct cpu_hwbkpt *chbi)
+{
+	struct hwbkpt *bp;
+	int i;
+	unsigned long dr7;
+
+	/* Don't allow debug exceptions while we update the registers */
+	set_debugreg(0, 7);
+
+	/* Kernel breakpoints are stored starting in DR0 and going up */
+	i = 0;
+	list_for_each_entry(bp, &kernel_bps, node) {
+		if (i >= chbi->num_kbps)
+			break;
+		install_breakpoint(chbi, i, bp);
+		++i;
+	}
+
+	dr7 = kdr7 & kdr7_masks[chbi->num_kbps];
+	set_debugreg(dr7, 7);
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+	struct cpu_hwbkpt *chbi;
+
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->num_kbps = num_kbps;
+
+	/* Install both the kernel and the user breakpoints */
+	switch_kernel_hwbkpt(chbi);
+	switch_to_thread_hwbkpt(chbi->bp_task);
+
+	put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hwbkpt_mutex.
+ */
+static void update_all_cpus(void)
+{
+	on_each_cpu(update_this_cpu, NULL, 0, 0);
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+	update_this_cpu(NULL);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio.
+ */
+static void accum_thread_tprio(struct thread_hwbkpt *thbi)
+{
+	int i;
+	struct hwbkpt *bp;
+
+	i = 0;
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		tprio[i] = max(tprio[i], bp->priority);
+		if (++i >= HB_NUM)
+			break;
+	}
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority.  We'll set tprio[0] to the maximum priority of
+ * the first entries in all the lists, tprio[1] to the maximum priority
+ * of the second entries in all the lists, etc.  In the end, we'll know
+ * that no thread requires breakpoints with priorities higher than the
+ * values in tprio.
+ *
+ * The caller must hold hwbkpt_mutex.
+ */
+static void recalc_tprio(void)
+{
+	struct thread_hwbkpt *thbi;
+
+	memset(tprio, 0, sizeof tprio);
+
+	/* Loop through all threads having registered breakpoints
+	 * and accumulate the maximum priority levels in tprio. */
+	list_for_each_entry(thbi, &thread_list, node)
+		accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[].  The 4 highest
+ * winners overall get to be installed in a debug register; num_kpbs
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU.  If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hwbkpt_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+	int i;
+	struct hwbkpt *bp;
+	int new_num_kbps;
+	int changed = 0;
+
+	/* Determine how many debug registers are available for kernel
+	 * breakpoints as opposed to user breakpoints, based on the
+	 * priorities.  Ties are resolved in favor of user bps. */
+	new_num_kbps = i = 0;
+	bp = list_entry(kernel_bps.next, struct hwbkpt, node);
+	while (i + new_num_kbps < HB_NUM) {
+		if (&bp->node == &kernel_bps || tprio[i] >= bp->priority)
+			++i;		/* User bps win a slot */
+		else {
+			++new_num_kbps;	/* Kernel bp wins a slot */
+			if (bp->status != HWBKPT_INSTALLED)
+				changed = 1;
+			bp = list_entry(bp->node.next, struct hwbkpt, node);
+		}
+	}
+	if (new_num_kbps != num_kbps) {
+		changed = 1;
+		num_kbps = new_num_kbps;
+	}
+
+	/* Notify the remaining kernel breakpoints that they are about
+	 * to be uninstalled. */
+	list_for_each_entry_from(bp, &kernel_bps, node) {
+		if (bp->status == HWBKPT_INSTALLED) {
+			if (bp->uninstalled)
+				(bp->uninstalled)(bp);
+			bp->status = HWBKPT_REGISTERED;
+			changed = 1;
+		}
+	}
+
+	if (changed) {
+
+		/* Tell all the CPUs to update their debug registers */
+		update_all_cpus();
+
+		/* Notify the breakpoints that just got installed */
+		i = 0;
+		list_for_each_entry(bp, &kernel_bps, node) {
+			if (i++ >= num_kbps)
+				break;
+			if (bp->status != HWBKPT_INSTALLED) {
+				bp->status = HWBKPT_INSTALLED;
+				if (bp->installed)
+					(bp->installed)(bp);
+			}
+		}
+	}
+}
+
+/*
+ * Return the pointer to a thread's hw-breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ */
+struct thread_hwbkpt *alloc_thread_hwbkpt(struct task_struct *tsk)
+{
+	if (!tsk->thread.hwbkpt_info) {
+		struct thread_hwbkpt *thbi;
+
+		thbi = kzalloc(sizeof(struct thread_hwbkpt), GFP_KERNEL);
+		if (thbi) {
+			INIT_LIST_HEAD(&thbi->node);
+			INIT_LIST_HEAD(&thbi->thread_bps);
+			tsk->thread.hwbkpt_info = thbi;
+		}
+	}
+	return tsk->thread.hwbkpt_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ */
+void flush_thread_hwbkpt(struct task_struct *tsk)
+{
+	struct thread_hwbkpt *thbi = tsk->thread.hwbkpt_info;
+	struct hwbkpt *bp;
+
+	if (!thbi)
+		return;
+	mutex_lock(&hwbkpt_mutex);
+
+	/* Let the breakpoints know they are being uninstalled */
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		if (bp->status == HWBKPT_INSTALLED && bp->uninstalled)
+			(bp->uninstalled)(bp);
+		bp->status = 0;
+	}
+
+	/* At this point, it's a BUG if there are any registered breakpoints
+	 * other than the ones dedicated to PTRACE. */
+
+	/* Remove tsk from the list of all threads with registered bps */
+	list_del(&thbi->node);
+
+	/* The thread no longer has any breakpoints associated with it */
+	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	tsk->thread.hwbkpt_info = NULL;
+	kfree(thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities */
+	recalc_tprio();
+	balance_kernel_vs_user();
+
+	/* Actually uninstall the breakpoints if necessary, and don't keep
+	 * a pointer to a task which may be about to exit. */
+	if (tsk == current)
+		switch_to_thread_hwbkpt(NULL);
+	mutex_unlock(&hwbkpt_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hwbkpt(struct task_struct *tsk, struct task_struct *child,
+		unsigned long clone_flags)
+{
+	/* We will assume that breakpoint settings are not inherited
+	 * and the child starts out with no debug registers set.
+	 * But what about CLONE_PTRACE? */
+
+	child->thread.hwbkpt_info = NULL;
+	clear_tsk_thread_flag(child, TIF_DEBUG);
+	return 0;
+}
+
+/*
+ * Copy out the debug register information for a core dump.
+ */
+void dump_thread_hwbkpt(struct task_struct *tsk, int u_debugreg[8])
+{
+	struct thread_hwbkpt *thbi = tsk->thread.hwbkpt_info;
+	int i;
+
+	memset(u_debugreg, 0, 8 * sizeof(u_debugreg[0]));
+	if (thbi) {
+		for (i = 0; i < HB_NUM; ++i)
+			u_debugreg[i] = (unsigned long)
+					thbi->ptrace_bps[i].address;
+		u_debugreg[6] = thbi->vdr6;
+		u_debugreg[7] = thbi->vdr7;
+	}
+}
+
+/*
+ * Validate the settings in a hwbkpt structure.
+ */
+static int validate_settings(struct hwbkpt *bp, struct task_struct *tsk)
+{
+	int rc = -EINVAL;
+
+	switch (bp->len) {
+	case 1:  case 2:  case 4:
+		break;
+	default:
+		return rc;
+	}
+
+	switch (bp->type) {
+	case HWBKPT_WRITE:
+	case HWBKPT_RW:
+		break;
+	case HWBKPT_IO:
+		if (!cpu_has_de)
+			return rc;
+		break;
+	case HWBKPT_EXECUTE:
+		if (bp->len == 1)
+			break;
+		/* FALL THROUGH */
+	default:
+		return rc;
+	}
+
+	/* Check that the address is in the proper range.  Note that tsk
+	 * is NULL for kernel bps and non-NULL for user bps. */
+	if (bp->type == HWBKPT_IO) {
+
+		/* Check whether the task has permission to access the
+		 * I/O port at bp->address. */
+		if (tsk) {
+			/* WRITEME */
+			return -EPERM;
+		}
+	} else {
+
+		/* Check whether bp->address points to user space */
+		if ((tsk != NULL) != ((unsigned long) bp->address < TASK_SIZE))
+			return rc;
+	}
+
+	if (bp->triggered)
+		rc = 0;
+	return rc;
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static inline unsigned long encode_dr7(int drnum, u8 len, u8 type, int local)
+{
+	unsigned long temp;
+
+	temp = ((len - 1) << 2) | type;
+	temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+	if (local)
+		temp |= (DR_LOCAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_LOCAL_EXACT;
+	else
+		temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_GLOBAL_EXACT;
+	return temp;
+}
+
+/*
+ * Calculate the DR7 value for the list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct list_head *bp_list, int is_user)
+{
+	struct hwbkpt *bp;
+	int i;
+	int drnum;
+	unsigned long dr7;
+
+	/* Kernel bps are assigned from DR0 on up, and user bps are assigned
+	 * from DR3 on down.  Accumulate all 4 bps; the kernel DR7 mask will
+	 * select the appropriate bits later. */
+	dr7 = 0;
+	i = 0;
+	list_for_each_entry(bp, bp_list, node) {
+
+		/* Get the debug register number and accumulate the bits */
+		drnum = (is_user ? HB_NUM - 1 - i : i);
+		dr7 |= encode_dr7(drnum, bp->len, bp->type, is_user);
+		if (++i >= HB_NUM)
+			break;
+	}
+	return dr7;
+}
+
+/*
+ * Update the DR7 value for a user thread.
+ */
+static void update_user_dr7(struct thread_hwbkpt *thbi)
+{
+	thbi->tdr7 = calculate_dr7(&thbi->thread_bps, 1);
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set implies tsk->thread.hwbkpt_info
+ *		is not NULL.
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ *		iff thbi->node is on thread_list.
+ */
+static int insert_bp_in_list(struct hwbkpt *bp, struct thread_hwbkpt *thbi,
+		struct task_struct *tsk)
+{
+	struct list_head *head;
+	int pos;
+	struct hwbkpt *temp_bp;
+
+	/* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+	if (tsk) {
+		head = &thbi->thread_bps;
+
+		/* Is this the thread's first registered breakpoint? */
+		if (list_empty(head)) {
+			set_tsk_thread_flag(tsk, TIF_DEBUG);
+			list_add(&thbi->node, &thread_list);
+		}
+	} else
+		head = &kernel_bps;
+
+	/* Equal-priority breakpoints get listed first-come-first-served */
+	pos = 0;
+	list_for_each_entry(temp_bp, head, node) {
+		if (bp->priority > temp_bp->priority)
+			break;
+		++pos;
+	}
+	list_add_tail(&bp->node, &temp_bp->node);
+
+	bp->status = HWBKPT_REGISTERED;
+	return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ */
+static void remove_bp_from_list(struct hwbkpt *bp, struct thread_hwbkpt *thbi,
+		struct task_struct *tsk)
+{
+	/* Remove bp from the thread's/kernel's list.  If the list is now
+	 * empty we must clear the TIF_DEBUG flag.  But keep the thread_hwbkpt
+	 * structure, so that the virtualized debug register values will
+	 * remain valid. */
+	list_del(&bp->node);
+	if (tsk) {
+		if (list_empty(&thbi->thread_bps)) {
+			list_del_init(&thbi->node);
+			clear_tsk_thread_flag(tsk, TIF_DEBUG);
+		}
+	}
+
+	/* Tell the breakpoint it is being uninstalled */
+	if (bp->status == HWBKPT_INSTALLED && bp->uninstalled)
+		(bp->uninstalled)(bp);
+	bp->status = 0;
+}
+
+/**
+ * register_user_hwbkpt - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running.  It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being PTRACEd by the current process,
+ * but it may also be the current process.
+ *
+ * The fields in @bp are checked for validity.  In addition to the normal
+ * restrictions, I/O breakpoints are allowed only if @tsk may access the
+ * I/O port at @bp->address.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp)
+{
+	int rc;
+	struct thread_hwbkpt *thbi;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, tsk);
+	if (rc)
+		return rc;
+	thbi = alloc_thread_hwbkpt(tsk);
+	if (!thbi)
+		return -ENOMEM;
+
+	mutex_lock(&hwbkpt_mutex);
+
+	/* Insert bp in the thread's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, thbi, tsk);
+	update_user_dr7(thbi);
+
+	/* Update and rebalance the priorities.  We don't need to go through
+	 * the list of all threads; adding a breakpoint can only cause the
+	 * priorities for this thread to increase. */
+	accum_thread_tprio(thbi);
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register. */
+	if (pos < HB_NUM - num_kbps) {
+		rc = 1;
+
+		/* Does it need to be installed right now? */
+		if (tsk == current)
+			switch_to_thread_hwbkpt(tsk);
+		/* Otherwise it will get installed the next time tsk runs */
+	}
+
+	mutex_unlock(&hwbkpt_mutex);
+	return rc;
+}
+
+/**
+ * unregister_user_hwbkpt - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp)
+{
+	struct thread_hwbkpt *thbi = tsk->thread.hwbkpt_info;
+
+	if (!bp->status)
+		return;		/* Not registered */
+	mutex_lock(&hwbkpt_mutex);
+
+	/* Remove bp from the thread's list and update the DR7 value */
+	remove_bp_from_list(bp, thbi, tsk);
+	update_user_dr7(thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities,
+	 * and actually uninstall bp if necessary. */
+	recalc_tprio();
+	balance_kernel_vs_user();
+	if (tsk == current)
+		switch_to_thread_hwbkpt(tsk);
+
+	mutex_unlock(&hwbkpt_mutex);
+}
+
+/**
+ * modify_user_hwbkpt - modify a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to modify
+ * @address: the new value for @bp->address
+ * @len: the new value for @bp->len
+ * @type: the new value for @bp->type
+ *
+ * @bp need not currently be registered.  If it isn't, the new values
+ * are simply stored in it and @tsk is ignored.  Otherwise the new values
+ * are validated first and then stored.  If @tsk is the current process
+ * and @bp is installed in a debug register, the register is updated.
+ *
+ * Returns 0 if the new values are acceptable, otherwise a negative error
+ * number.
+ */
+int modify_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp,
+		void *address, u8 len, u8 type)
+{
+	unsigned long flags;
+
+	if (!bp->status) {	/* Not registered, just store the values */
+		bp->address = address;
+		bp->len = len;
+		bp->type = type;
+		return 0;
+	}
+
+	/* Check the new values */
+	{
+		struct hwbkpt temp_bp = *bp;
+		int rc;
+
+		temp_bp.address = address;
+		temp_bp.len = len;
+		temp_bp.type = type;
+		rc = validate_settings(&temp_bp, tsk);
+		if (rc)
+			return rc;
+	}
+
+	/* Okay, update the breakpoint.  An interrupt at this point might
+	 * cause I/O to a breakpointed port, so disable interrupts. */
+	mutex_lock(&hwbkpt_mutex);
+	local_irq_save(flags);
+
+	bp->address = address;
+	bp->len = len;
+	bp->type = type;
+	update_user_dr7(tsk->thread.hwbkpt_info);
+
+	/* The priority hasn't changed so we don't need to rebalance
+	 * anything.  Just install the new breakpoint, if necessary. */
+	if (tsk == current)
+		switch_to_thread_hwbkpt(tsk);
+
+	local_irq_restore(flags);
+	mutex_unlock(&hwbkpt_mutex);
+	return 0;
+}
+
+/*
+ * Update the DR7 value for the kernel.
+ */
+static void update_kernel_dr7(void)
+{
+	kdr7 = calculate_dr7(&kernel_bps, 0);
+}
+
+/**
+ * register_kernel_hwbkpt - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be active at all times.  It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hwbkpt(struct hwbkpt *bp)
+{
+	int rc;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, NULL);
+	if (rc)
+		return rc;
+
+	mutex_lock(&hwbkpt_mutex);
+
+	/* Insert bp in the kernel's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, NULL, NULL);
+	update_kernel_dr7();
+
+	/* Rebalance the priorities.  This will install bp if it
+	 * was allocated a debug register. */
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register. */
+	if (pos < num_kbps)
+		rc = 1;
+
+	mutex_unlock(&hwbkpt_mutex);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hwbkpt);
+
+/**
+ * unregister_kernel_hwbkpt - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hwbkpt(struct hwbkpt *bp)
+{
+	if (!bp->status)
+		return;		/* Not registered */
+	mutex_lock(&hwbkpt_mutex);
+
+	/* Remove bp from the kernel's list and update the DR7 value */
+	remove_bp_from_list(bp, NULL, NULL);
+	update_kernel_dr7();
+
+	/* Rebalance the priorities.  This will uninstall bp if it
+	 * was allocated a debug register. */
+	balance_kernel_vs_user();
+
+	mutex_unlock(&hwbkpt_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hwbkpt);
+
+/*
+ * Handle debug exception notifications.
+ */
+static int hwbkpt_exceptions_notify(struct notifier_block *unused,
+		unsigned long val, void *data)
+{
+	unsigned long dr7;
+	unsigned long dr6;
+	struct pt_regs *regs;
+	struct cpu_hwbkpt *chbi;
+	int i;
+	struct hwbkpt *bp;
+	int rc;
+	u8 type;
+
+	if (val != DIE_DEBUG)
+		return NOTIFY_DONE;
+
+	/* Assert that local interrupts are disabled */
+
+	/* Disable all breakpoints so that the callbacks can run without
+	 * triggering recursive debug exceptions. */
+	get_debugreg(dr7, 7);
+	set_debugreg(0, 7);
+
+	dr6 = ((struct die_args *) data)->err;
+	regs = ((struct die_args *) data)->regs;
+	rc = NOTIFY_STOP;	/* By default, don't send SIGTRAP */
+
+	/* Are we a victim of lazy debug-register switching? */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	if (chbi->bp_task != current) {
+
+		/* No user breakpoints are valid.  Clear those bits in
+		 * dr6 and perform the belated debug-register switch. */
+		chbi->bp_task = NULL;
+		dr7 &= kdr7_masks[chbi->num_kbps];
+		for (i = chbi->num_kbps; i < HB_NUM; ++i) {
+			dr6 &= ~(1 << i);
+			chbi->bps[i] = NULL;
+		}
+	}
+
+	/* Did the debug exception occur in user space? */
+	if ((regs->eflags & VM_MASK) || user_mode(regs)) {
+
+		/* Was it a single-step exception? */
+		if (dr6 & DR_STEP)
+			rc = NOTIFY_DONE;	/* Do send SIGTRAP */
+	} else {
+
+		/* Did a single-step exception occur in kernel space? */
+		if (dr6 & DR_STEP) {
+
+			/* Avoid single-stepping through a system call */
+			set_tsk_thread_flag(current, TIF_SINGLESTEP);
+			regs->eflags &= ~TF_MASK;
+		}
+	}
+
+	/* Handle all the breakpoints that were triggered */
+	for (i = 0; i < HB_NUM; ++i) {
+		if (!(dr6 & (0x1 << i)))
+			continue;
+
+		/* Was this a user breakpoint? */
+		if (i >= chbi->num_kbps)
+			rc = NOTIFY_DONE;	/* Do send SIGTRAP */
+
+		/* If this was an execution breakpoint then set RF in the
+		 * stored eflags, so that when we return the instruction
+		 * will run instead of causing another exception. */
+		type = (dr7 >> (DR_CONTROL_SHIFT + i * DR_CONTROL_SIZE)) & 0x3;
+		if (type == HWBKPT_EXECUTE)
+			regs->eflags |= X86_EFLAGS_RF;
+
+		/* Invoke the triggered callback */
+		bp = chbi->bps[i];
+		if (bp)		/* Should always be non-NULL */
+			(bp->triggered)(bp, regs);
+	}
+	put_cpu_no_resched();
+
+	/* Re-enable the breakpoints */
+	set_debugreg(dr7, 7);
+	return rc;
+}
+
+static struct notifier_block hwbkpt_exceptions_nb = {
+	.notifier_call = hwbkpt_exceptions_notify,
+	.priority = INT_MAX - 1		/* We need to be notified second */
+};
+
+static int __init init_hwbkpt(void)
+{
+	return register_die_notifier(&hwbkpt_exceptions_nb);
+}
+
+core_initcall(init_hwbkpt);
Index: 2.6.21-rc2/arch/i386/kernel/ptrace.c
===================================================================
--- 2.6.21-rc2.orig/arch/i386/kernel/ptrace.c
+++ 2.6.21-rc2/arch/i386/kernel/ptrace.c
@@ -350,6 +350,88 @@ ptrace_set_thread_area(struct task_struc
 	return 0;
 }
 
+/*
+ * Breakpoint trigger routine.
+ */
+static void ptrace_triggered(struct hwbkpt *bp, struct pt_regs *regs)
+{
+	struct thread_hwbkpt *thbi = bp->data;
+	int i = bp - thbi->ptrace_bps;
+
+	/* Store in the virtual DR6 register the fact that breakpoint i
+	 * was hit, so the thread's debugger will see it. */
+	thbi->vdr6 |= (0x1 << i);
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7.  Return the "enabled" status.
+ */
+static inline int decode_dr7(unsigned long dr7, int bpnum, u8 *len, u8 *type)
+{
+	int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+
+	*len = 1 + ((temp >> 2) & 0x3);
+	*type = temp & 0x3;
+	return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
+
+/*
+ * Handle writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+		struct thread_hwbkpt *thbi, unsigned long data)
+{
+	struct hwbkpt *bp;
+	int i;
+	int rc = 0;
+	unsigned long old_dr7 = thbi->vdr7;
+
+	data &= ~DR_CONTROL_RESERVED;
+
+	/* Loop through all the hardware breakpoints,
+	 * making the appropriate changes to each. */
+restore_settings:
+	thbi->vdr7 = data;
+	bp = &thbi->ptrace_bps[0];
+	for (i = 0; i < HB_NUM; (++i, ++bp)) {
+		int enabled;
+		u8 len, type;
+
+		enabled = decode_dr7(data, i, &len, &type);
+
+		/* Unregister the breakpoint if it should now be disabled.
+		 * Do this first so that setting invalid values for len
+		 * or type won't cause an error. */
+		if (!enabled && bp->status)
+			unregister_user_hwbkpt(tsk, bp);
+
+		/* Insert the breakpoint's settings.  If the bp is enabled,
+		 * an invalid entry will cause an error. */
+		if (modify_user_hwbkpt(tsk, bp, bp->address, len, type) < 0
+				&& rc == 0)
+			break;
+
+		/* Now register the breakpoint if it should be enabled.
+		 * New invalid entries will cause an error here. */
+		if (enabled && !bp->status) {
+			bp->triggered = ptrace_triggered;
+			bp->priority = HWBKPT_PRIO_PTRACE;
+			bp->data = thbi;
+			if (register_user_hwbkpt(tsk, bp) < 0 && rc == 0)
+				break;
+		}
+	}
+
+	/* If anything above failed, restore the original settings */
+	if (i < HB_NUM) {
+		rc = -EIO;
+		data = old_dr7;
+		goto restore_settings;
+	}
+	return rc;
+}
+
 long arch_ptrace(struct task_struct *child, long request, long addr, long data)
 {
 	struct user * dummy = NULL;
@@ -383,11 +465,22 @@ long arch_ptrace(struct task_struct *chi
 		tmp = 0;  /* Default return condition */
 		if(addr < FRAME_SIZE*sizeof(long))
 			tmp = getreg(child, addr);
-		if(addr >= (long) &dummy->u_debugreg[0] &&
-		   addr <= (long) &dummy->u_debugreg[7]){
-			addr -= (long) &dummy->u_debugreg[0];
-			addr = addr >> 2;
-			tmp = child->thread.debugreg[addr];
+		else if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7] &&
+				child->thread.hwbkpt_info) {
+			struct thread_hwbkpt *thbi = child->thread.hwbkpt_info;
+
+			if (thbi) {
+				addr -= (long) &dummy->u_debugreg[0];
+				addr = addr >> 2;
+				if (addr < HB_NUM)
+					tmp = (unsigned long) thbi->
+						ptrace_bps[addr].address;
+				else if (addr == 6)
+					tmp = thbi->vdr6;
+				else if (addr == 7)
+					tmp = thbi->vdr7;
+			}
 		}
 		ret = put_user(tmp, datap);
 		break;
@@ -417,59 +510,36 @@ long arch_ptrace(struct task_struct *chi
 		   have to be selective about what portions we allow someone
 		   to modify. */
 
-		  ret = -EIO;
-		  if(addr >= (long) &dummy->u_debugreg[0] &&
-		     addr <= (long) &dummy->u_debugreg[7]){
-
-			  if(addr == (long) &dummy->u_debugreg[4]) break;
-			  if(addr == (long) &dummy->u_debugreg[5]) break;
-			  if(addr < (long) &dummy->u_debugreg[4] &&
-			     ((unsigned long) data) >= TASK_SIZE-3) break;
-			  
-			  /* Sanity-check data. Take one half-byte at once with
-			   * check = (val >> (16 + 4*i)) & 0xf. It contains the
-			   * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
-			   * 2 and 3 are LENi. Given a list of invalid values,
-			   * we do mask |= 1 << invalid_value, so that
-			   * (mask >> check) & 1 is a correct test for invalid
-			   * values.
-			   *
-			   * R/Wi contains the type of the breakpoint /
-			   * watchpoint, LENi contains the length of the watched
-			   * data in the watchpoint case.
-			   *
-			   * The invalid values are:
-			   * - LENi == 0x10 (undefined), so mask |= 0x0f00.
-			   * - R/Wi == 0x10 (break on I/O reads or writes), so
-			   *   mask |= 0x4444.
-			   * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
-			   *   0x1110.
-			   *
-			   * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
-			   *
-			   * See the Intel Manual "System Programming Guide",
-			   * 15.2.4
-			   *
-			   * Note that LENi == 0x10 is defined on x86_64 in long
-			   * mode (i.e. even for 32-bit userspace software, but
-			   * 64-bit kernel), so the x86_64 mask value is 0x5454.
-			   * See the AMD manual no. 24593 (AMD64 System
-			   * Programming)*/
-
-			  if(addr == (long) &dummy->u_debugreg[7]) {
-				  data &= ~DR_CONTROL_RESERVED;
-				  for(i=0; i<4; i++)
-					  if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
-						  goto out_tsk;
-				  if (data)
-					  set_tsk_thread_flag(child, TIF_DEBUG);
-				  else
-					  clear_tsk_thread_flag(child, TIF_DEBUG);
-			  }
-			  addr -= (long) &dummy->u_debugreg;
-			  addr = addr >> 2;
-			  child->thread.debugreg[addr] = data;
-			  ret = 0;
+		if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
+			struct thread_hwbkpt *thbi;
+
+			addr -= (long) &dummy->u_debugreg;
+			addr = addr >> 2;
+
+			/* There are no DR4 or DR5 registers */
+			if (addr == 4 || addr == 5)
+				break;
+			thbi = alloc_thread_hwbkpt(child);
+			if (!thbi)
+				ret = -ENOMEM;
+
+			/* Writes to DR0 - DR3 change a breakpoint address */
+			else if (addr < HB_NUM) {
+				struct hwbkpt *bp = &thbi->ptrace_bps[addr];
+
+				if (modify_user_hwbkpt(child, bp,
+						(void *) data,
+						bp->len, bp->type) >= 0)
+					ret = 0;
+
+			/* Writes to DR6 modify the virtualized value */
+			} else if (addr == 6) {
+				thbi->vdr6 = data;
+				ret = 0;
+
+			} else		/* All that's left is DR7 */
+				ret = ptrace_write_dr7(child, thbi, data);
 		  }
 		  break;
 
@@ -625,7 +695,6 @@ long arch_ptrace(struct task_struct *chi
 		ret = ptrace_request(child, request, addr, data);
 		break;
 	}
- out_tsk:
 	return ret;
 }
 
Index: 2.6.21-rc2/arch/i386/kernel/Makefile
===================================================================
--- 2.6.21-rc2.orig/arch/i386/kernel/Makefile
+++ 2.6.21-rc2/arch/i386/kernel/Makefile
@@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld
 obj-y	:= process.o signal.o entry.o traps.o irq.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
-		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \
+		hw-breakpoint.o
 
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
Index: 2.6.21-rc2/arch/i386/power/cpu.c
===================================================================
--- 2.6.21-rc2.orig/arch/i386/power/cpu.c
+++ 2.6.21-rc2/arch/i386/power/cpu.c
@@ -11,6 +11,7 @@
 #include <linux/suspend.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/debugreg.h>
 
 static struct saved_context saved_context;
 
@@ -45,6 +46,11 @@ void __save_processor_state(struct saved
 	ctxt->cr2 = read_cr2();
 	ctxt->cr3 = read_cr3();
 	ctxt->cr4 = read_cr4();
+
+	/*
+	 * disable the debug registers
+	 */
+	set_debugreg(0, 7);
 }
 
 void save_processor_state(void)
@@ -69,20 +75,7 @@ static void fix_processor_context(void)
 
 	load_TR_desc();				/* This does ltr */
 	load_LDT(&current->active_mm->context);	/* This does lldt */
-
-	/*
-	 * Now maybe reload the debug registers
-	 */
-	if (current->thread.debugreg[7]){
-		set_debugreg(current->thread.debugreg[0], 0);
-		set_debugreg(current->thread.debugreg[1], 1);
-		set_debugreg(current->thread.debugreg[2], 2);
-		set_debugreg(current->thread.debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(current->thread.debugreg[6], 6);
-		set_debugreg(current->thread.debugreg[7], 7);
-	}
-
+	load_debug_registers();
 }
 
 void __restore_processor_state(struct saved_context *ctxt)


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-02 17:19                 ` [RFC] hwbkpt: Hardware breakpoints (was Kwatch) Alan Stern
@ 2007-03-05  7:01                   ` Roland McGrath
  2007-03-05 13:36                     ` Christoph Hellwig
  2007-03-05 17:25                     ` Alan Stern
  0 siblings, 2 replies; 70+ messages in thread
From: Roland McGrath @ 2007-03-05  7:01 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

Thanks, Alan.  Great work.  I have some suggestions for changes.

> 	I pretty much copied the existing code for handling vm86 mode
> 	and single-step exceptions, without fully understanding it.
> 
> 	The code doesn't virtualize the BS (single-step) flag in DR6
> 	for userspace.  It could be added, but I wonder whether it is
> 	really needed.

There is only one TF flag; it and the DR_STEP bit in dr6 are part of the
unitary thread state.  You should not be doing anything at all on
single-step exceptions.

> 	Setting user breakpoints on I/O ports should require permissions
> 	checking.  I haven't tried to figure out how that works or
> 	how to implement it yet.

I would just leave the I/O breakpoint feature out for the first cut.  
See if there is demand for it.  

It requires setting CR4.DE, which we don't do.  The validation is
relatively simple, comparing against the io_bitmap_ptr data (if any).
You can check at insertion time (requiring doing ioperm before inserting
an I/O breakpoint), and then check again at exception time if the hit
was in kernel mode and ignore it for a user-only breakpoint, or perhaps
check again and eject the breakpoint if it's been cleared from the
ioperm bitmap.
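
For the insertion-time check, a rough, untested sketch of what I have in
mind (validate_io_bp is a made-up name; io_bitmap_ptr and IO_BITMAP_BITS
are the real field/constant, and a set bit in the ioperm bitmap means the
port is denied):

	static int validate_io_bp(struct task_struct *tsk, struct hwbkpt *bp)
	{
		unsigned long port = (unsigned long) bp->address;
		unsigned long end = port + bp->len;

		/* No ioperm bitmap means no I/O access at all */
		if (!tsk->thread.io_bitmap_ptr || end > IO_BITMAP_BITS)
			return -EPERM;
		for (; port < end; ++port)
			if (test_bit(port, tsk->thread.io_bitmap_ptr))
				return -EPERM;
		return 0;
	}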

> 	It seems likely that some of the new routines should be marked
> 	"__kprobes", but I don't know which, or even what that annotation
> 	is supposed to mean.

That annotation is for code that is in the kprobes maintenance code path
(where you cannot insert kprobes).  do_debug has it for the notify_die
path that might lead to kprobes.  The only thing in your code that
should need it is your notifier function, which might get called for an
exception that turns out to be a kprobes single-step.  There shouldn't
be any problem inserting kprobes into the other hwbkpt code.
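
Concretely, the only change that calls for in the posted hw-breakpoint.c
is something like:

	-static int hwbkpt_exceptions_notify(struct notifier_block *unused,
	+static int __kprobes hwbkpt_exceptions_notify(struct notifier_block *unused,
	 		unsigned long val, void *data)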

> 	The parts relating to kernel breakpoints could be made conditional
> 	on a Kconfig option.  The amount of code space saved would be
> 	relatively small; I'm not sure that it would be worthwhile.

In a utrace merge, the user parts can be made conditional on CONFIG_UTRACE.
Then with both turned off, the code goes away completely.  It's unlikely it
will ever be turned off, but it is a clean way to go about things in case
someone wants the smallest possible config for a limited-use installation.

> +	void		(*installed)(struct hwbkpt *);
> +	void		(*uninstalled)(struct hwbkpt *);

Save space in the struct by having just one function for both installed
and uninstalled, taking an argument.  Probably a caller should be able to
pass a null function here to say that the registration call should fail if
it can't be installed due to higher-priority or no-callback registrations
existing, and that its registration cannot be ejected by another (i.e., an
ill-behaved user).
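
Something along these lines, say (just a sketch; "install_change" is an
arbitrary name, not anything in the patch):

	/* One callback instead of two; the argument says which transition
	 * just happened.  A NULL pointer here could then mean "fail the
	 * registration rather than let it be bumped later". */
	void	(*install_change)(struct hwbkpt *bp, int installed);

	/* ...and the two notification sites in the core become: */
	if (bp->install_change)
		(bp->install_change)(bp, 1);	/* was (bp->installed)(bp)   */
	if (bp->install_change)
		(bp->install_change)(bp, 0);	/* was (bp->uninstalled)(bp) */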

> +	void		*data;

Leave this out.  Let the caller embed struct hwbkpt in his larger struct
and use container_of.

> +/*
> + * The tsk argument in the following three routines will usually be a
> + * process being PTRACEd by the current task, normally a debugger.
> + * It is also legal for tsk to be the current task.  In either case we
> + * can guarantee that tsk will not start running on another CPU while
> + * its breakpoints are being modified.  If that happened it could cause
> + * a crash.
> + */
> +int register_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp);
> +void unregister_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp);
> +int modify_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp,
> +		void *address, u8 len, u8 type);

These are the kinds of constraints utrace is there to make doable.
(I'm assuming that guarantee is really "will not start running in
user mode", since SIGKILL is always possible.)  In a utrace merge,
I think we want the only entry points for this to be a utrace-based
interface that explicitly requires utrace's notion of quiescence
(as its accessors for thread register data do).

> +/*
> + * Kernel breakpoints are not associated with any particular thread.
> + */
> +int register_kernel_hwbkpt(struct hwbkpt *bp);
> +void unregister_kernel_hwbkpt(struct hwbkpt *bp);

Potentially kernel users might want per-thread installations, if
it's simple to provide the option.  e.g., a probe at some syscall
entry that installs a watchpoint while calling into complex
subsystems, and then removes it before returning.

> @@ -379,15 +381,15 @@ void exit_thread(void)
>  		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
>  		put_cpu();
>  	}
> +	flush_thread_hwbkpt(tsk);

I'd make this do if (unlikely(test_tsk_thread_flag(tsk, TIF_DEBUG))),
or make it an inline doing that test before calling out to the guts.
Most of the time the hwbkpt code need not be on a hot path, and the
exit path will be one.

> +struct thread_hwbkpt {		/* HW breakpoint info for a thread */
> +
> +	/* utrace support */
> +	struct list_head	node;		/* Entry in thread list */
> +	struct list_head	thread_bps;	/* Thread's breakpoints */
> +	unsigned long		tdr7;		/* Thread's DR7 value */
> +
> +	/* ptrace support */
> +	struct hwbkpt		ptrace_bps[HB_NUM];

I wouldn't use ptrace in the name, it's "the thread's virtual state".
> You shouldn't need a struct hwbkpt here really, just unsigned long vdr[4].

> +		/* Check whether bp->address points to user space */
> +		if ((tsk != NULL) != ((unsigned long) bp->address < TASK_SIZE))

On x86_64, make sure this is TASK_SIZE_OF(tsk).

> +	if (val != DIE_DEBUG)
> +		return NOTIFY_DONE;

The very next thing should be:

	dr6 = ((struct die_args *) data)->err;
	if ((dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) == 0)
		return NOTIFY_DONE;
	if (unlikely(regs->eflags & VM_MASK))
		return NOTIFY_DONE;

You are not involved at all if the exception was not produced by a debug
register setting.  When no notify_die hook returns NOTIFY_STOP, do_debug
should do thread->vdr6 |= dr6.  (In practice this will only be the DR_STEP
bit, but maybe others will come along in later chips.  If you're not
explicitly virtualizing something, you should be passing it through.)
To avoid this needing to do allocation when TIF_DEBUG was not already set,
probably the vdr6 should stay in thread_struct directly alongside the pointer.
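
In other words, roughly this on top of the posted do_debug(), assuming
vdr6 moves into thread_struct as suggested:

	 	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
	 					SIGTRAP) == NOTIFY_STOP)
	 		return;
	+
	+	/* Pass through whatever we are not explicitly virtualizing */
	+	tsk->thread.vdr6 |= condition;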

> +		if (!(dr6 & (0x1 << i)))
> +			continue;

Use (DR_TRAP0 << i).

> +static struct notifier_block hwbkpt_exceptions_nb = {
> +	.notifier_call = hwbkpt_exceptions_notify,
> +	.priority = INT_MAX - 1		/* We need to be notified second */

The order shouldn't matter if everyone only swallows their own exceptions.
kprobes needs to be a little better behaved this way:

diff --git a/arch/i386/kernel/kprobes.c b/arch/i386/kernel/kprobes.c
index b545bc7..0000000 100644  
--- a/arch/i386/kernel/kprobes.c
+++ b/arch/i386/kernel/kprobes.c
@@ -670,7 +670,7 @@ int __kprobes kprobe_exceptions_notify(s
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_DEBUG:
-		if (post_kprobe_handler(args->regs))
+		if ((args->err & DR_STEP) && post_kprobe_handler(args->regs))
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_GPF:
diff --git a/arch/x86_64/kernel/kprobes.c b/arch/x86_64/kernel/kprobes.c
index 209c8c0..0000000 100644  
--- a/arch/x86_64/kernel/kprobes.c
+++ b/arch/x86_64/kernel/kprobes.c
@@ -661,7 +661,7 @@ int __kprobes kprobe_exceptions_notify(s
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_DEBUG:
-		if (post_kprobe_handler(args->regs))
+		if ((args->err & DR_STEP) && post_kprobe_handler(args->regs))
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_GPF:

> Index: 2.6.21-rc2/arch/i386/kernel/ptrace.c
> ===================================================================
> --- 2.6.21-rc2.orig/arch/i386/kernel/ptrace.c
> +++ 2.6.21-rc2/arch/i386/kernel/ptrace.c

I would like all this stuff to be in hw-breakpoint.c, and just called with:

    int thread_set_debugreg(struct task_struct *, int n, unsigned long val);
    unsigned long thread_get_debugreg(struct task_struct *, int n);

Writes to 0..3,6 can just write the slot (after allocating the thread's
struct if need be, don't bother if writing 0), and then act like a reset of
dr7 to its existing virtual value if that includes the corresponding bit.
(This lets the thread's virtual dr[0-3] contain addresses even when
breakpoints are not enabled.)  
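
A rough, untested sketch of the write side, assuming the ptrace_bps[] /
vdr6 / vdr7 fields from the patch and that ptrace_write_dr7() moves into
hw-breakpoint.c along with the rest:

	int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
	{
		struct thread_hwbkpt *thbi;

		if (n == 4 || n == 5)
			return -EIO;
		if (val == 0 && !tsk->thread.hwbkpt_info)
			return 0;		/* nothing worth allocating for */

		thbi = alloc_thread_hwbkpt(tsk);
		if (!thbi)
			return -ENOMEM;

		if (n < HB_NUM) {		/* DR0 - DR3: breakpoint address */
			struct hwbkpt *bp = &thbi->ptrace_bps[n];

			if (modify_user_hwbkpt(tsk, bp, (void *) val,
					bp->len, bp->type) < 0)
				return -EIO;
			/* If this slot is enabled in the virtual DR7, act as
			 * if DR7 were rewritten with its current value. */
			if ((thbi->vdr7 >> (n * DR_ENABLE_SIZE)) & 0x3)
				return ptrace_write_dr7(tsk, thbi, thbi->vdr7);
			return 0;
		}
		if (n == 6) {			/* DR6 is fully virtualized */
			thbi->vdr6 = val;
			return 0;
		}
		return ptrace_write_dr7(tsk, thbi, val);	/* n == 7 */
	}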


I did not check all the code for correctness outside the issues that
concerned me off hand, so I may have more comments on another rev in
parts that I didn't mention here.


Thanks,
Roland

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-05  7:01                   ` Roland McGrath
@ 2007-03-05 13:36                     ` Christoph Hellwig
  2007-03-05 16:16                       ` Alan Stern
  2007-03-05 17:25                     ` Alan Stern
  1 sibling, 1 reply; 70+ messages in thread
From: Christoph Hellwig @ 2007-03-05 13:36 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Alan Stern, Prasanna S Panchamukhi, Kernel development list

On Sun, Mar 04, 2007 at 11:01:36PM -0800, Roland McGrath wrote:
> > 	The parts relating to kernel breakpoints could be made conditional
> > 	on a Kconfig option.  The amount of code space saved would be
> > 	relatively small; I'm not sure that it would be worthwhile.
> 
> In a utrace merge, the user parts can be made conditional on CONFIG_UTRACE.
> Then with both turned off, the code goes away completely.  It's unlikely it
> will ever be turned off, but it is a clean way to go about things in case
> someone wants the smallest possible config for a limited-use installation.

Making this unconditional is pointless and just makes things harder to
read, so please don't do it.  (The same is true for utrace, but Roland
has unfortunately still not replied to my mail mentioning it :P)

> > +	void		(*installed)(struct hwbkpt *);
> > +	void		(*uninstalled)(struct hwbkpt *);
> 
> Save space in the struct by having just one function for both installed
> and uninstalled, taking an argument.  Probably a caller should be able to
> pass a null function here to say that the registration call should fail if
> it can't be installed due to higher-priority or no-callback registrations
> existing, and that its registration cannot be ejected by another (i.e., an
> ill-behaved user).

Please not.  That might save a few bytes, but it makes the interface a
lot harder to understand for users.  We really discourage over-loaded
interfaces in Linux.

> > +struct thread_hwbkpt {		/* HW breakpoint info for a thread */

Can this and all the file names please get an actually readable name?
E.g. hw_breakpoint.  We're not IBM managers who need to save every
cent on superfluous syllables :)


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-05 13:36                     ` Christoph Hellwig
@ 2007-03-05 16:16                       ` Alan Stern
  2007-03-05 16:49                         ` Christoph Hellwig
  2007-03-05 22:04                         ` Roland McGrath
  0 siblings, 2 replies; 70+ messages in thread
From: Alan Stern @ 2007-03-05 16:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Roland McGrath, Prasanna S Panchamukhi, Kernel development list

On Mon, 5 Mar 2007, Christoph Hellwig wrote:

> On Sun, Mar 04, 2007 at 11:01:36PM -0800, Roland McGrath wrote:
> > > 	The parts relating to kernel breakpoints could be made conditional
> > > 	on a Kconfig option.  The amount of code space saved would be
> > > 	relatively small; I'm not sure that it would be worthwhile.
> > 
> > In a utrace merge, the user parts can be made conditional on CONFIG_UTRACE.
> > Then with both turned off, the code goes away completely.  It's unlikely it
> > will ever be turned off, but it is a clean way to go about things in case
> > someone wants the smallest possible config for a limited-use installation.
> 
> Making this unconditional is pointless and just makes things harder to
> read, so please don't do it.  (The same is true for utrace, but Roland
> has unfortunately still not replied to my mail mentioning it :P)

Sorry, I don't understand what you're saying.  I would think that making
it _conditional_ would make things harder to read, because of all the
extra "#ifdef" and "#endif" lines plus the need to keep two different
versions of the code in mind.

Did you mean to say "conditional" instead of "unconditional"?

Incidentally, I do believe that for certain applications (embedded
devices, for instance) it makes sense to avoid including all this code.  
The cleanest way to do that would be to make both PTRACE and UTRACE
configurable.


> > > +	void		(*installed)(struct hwbkpt *);
> > > +	void		(*uninstalled)(struct hwbkpt *);
> > 
> > Save space in the struct by having just one function for both installed
> > and uninstalled, taking an argument.  Probably a caller should be able to
> > pass a null function here to say that the registration call should fail if
> > it can't be installed due to higher-priority or no-callback registrations
> > existing, and that its registration cannot be ejected by another (i.e., an
> > ill-behaved user).
> 
> Please not.  That might save a few bytes, but it makes the interface a
> lot harder to understand for users.  We really discourage over-loaded
> interfaces in Linux.

I agree with Christoph.  Plenty of other interfaces in the kernel do the 
same thing.


> > > +struct thread_hwbkpt {		/* HW breakpoint info for a thread */
> 
> Can this and all the file names please get an actually readable name?
> E.g. hw_breakpoint.  We're not IBM managers who need to save every
> cent on superfluous syllables :)

I'll be happy to rename the structures and the files if Roland doesn't 
mind.

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-05 16:16                       ` Alan Stern
@ 2007-03-05 16:49                         ` Christoph Hellwig
  2007-03-05 22:04                         ` Roland McGrath
  1 sibling, 0 replies; 70+ messages in thread
From: Christoph Hellwig @ 2007-03-05 16:49 UTC (permalink / raw)
  To: Alan Stern
  Cc: Christoph Hellwig, Roland McGrath, Prasanna S Panchamukhi,
	Kernel development list

On Mon, Mar 05, 2007 at 11:16:48AM -0500, Alan Stern wrote:
> > Making this unconditional is pointless and just makes things harder to
> > read, so please don't do it.  (The same is true for utrace, but Roland
> > has unfortunately still not replied to my mail mentioning it :P)
> 
> Sorry, I don't understand what you're saying.  I would think that making
> it _conditional_ would make things harder to read, because of all the
> extra "#ifdef" and "#endif" lines plus the need to keep two different
> versions of the code in mind.
> 
> Did you mean to say "conditional" instead of "unconditional"?

Yes, I did mean that.  Sorry for the confusion :)

> Incidentally, I do believe that for certain applications (embedded
> devices, for instance) it makes sense to avoid including all this code.  
> The cleanest way to do that would be to make both PTRACE and UTRACE
> configurable.

Making PTRACE configurable makes a lot of sense, especially as we want
to get rid of it in the very long term.  Making UTRACE configurable as
well as all these tracehook wrappers just makes the code utterly
unreadable.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-05  7:01                   ` Roland McGrath
  2007-03-05 13:36                     ` Christoph Hellwig
@ 2007-03-05 17:25                     ` Alan Stern
  2007-03-06  3:13                       ` Roland McGrath
  1 sibling, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-03-05 17:25 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Sun, 4 Mar 2007, Roland McGrath wrote:

> Thanks, Alan.  Great work.  I have some suggestions for changes.
> 
> > 	I pretty much copied the existing code for handling vm86 mode
> > 	and single-step exceptions, without fully understanding it.
> > 
> > 	The code doesn't virtualize the BS (single-step) flag in DR6
> > 	for userspace.  It could be added, but I wonder whether it is
> > 	really needed.
> 
> There is only one TF flag, it and the DR_STEP bit in dr6 are part of the
> unitary thread state.  You should not be doing anything at all on
> single-step exceptions.

Presumably you mean that hw-breakpoint.c shouldn't do anything at all on
single-step exceptions.  do_debug does have to know about them, because it
has to know whether or not to send a SIGTRAP.  Separation of duties; I
tried to move too much out of do_debug.


> > 	Setting user breakpoints on I/O ports should require permissions
> > 	checking.  I haven't tried to figure out how that works or
> > 	how to implement it yet.
> 
> I would just leave the I/O breakpoint feature out for the first cut.  
> See if there is demand for it.  

Good enough.

> It requires setting CR4.DE, which we don't do.  The validation is
> relatively simple, comparing against the io_bitmap_ptr data (if any).
> You can check at insertion time (requiring doing ioperm before inserting
> an I/O breakpoint), and then check again at exception time if the hit
> was in kernel mode and ignore it for a user-only breakpoint, or perhaps
> check again and eject the breakpoint if it's been cleared from the
> ioperm bitmap.

It's easier simply to ignore the whole issue.  :-)  Fortunately many other 
architectures don't have to worry about it at all.


> > 	It seems likely that some of the new routines should be marked
> > 	"__kprobes", but I don't know which, or even what that annotation
> > 	is supposed to mean.
> 
> That annotation is for code that is in the kprobes maintenance code path
> (where you cannot insert kprobes).  do_debug has it for the notify_die
> path that might lead to kprobes.  The only thing in your code that
> should need it is your notifier function, which might get called for an
> exception that turns out to be a kprobes single-step.  There shouldn't
> be any problem inserting kprobes into the other hwbkpt code.

Got it.


> > 	The parts relating to kernel breakpoints could be made conditional
> > 	on a Kconfig option.  The amount of code space saved would be
> > 	relatively small; I'm not sure that it would be worthwhile.
> 
> In a utrace merge, the user parts can be made conditional on CONFIG_UTRACE.
> Then with both turned off, the code goes away completely.  It's unlikely it
> will ever be turned off, but it is a clean way to go about things in case
> someone wants the smallest possible config for a limited-use installation.

So far I've been developing under 2.6.21-rc, which doesn't have utrace.  
But eventually this will be submitted by way of -mm, which does.  The
easiest approach would be to make the whole thing conditional on 
CONFIG_UTRACE.


> > +	void		*data;
> 
> Leave this out.  Let the caller embed struct hwbkpt in his larger struct
> and use container_of.

Okay.


> > +/*
> > + * The tsk argument in the following three routines will usually be a
> > + * process being PTRACEd by the current task, normally a debugger.
> > + * It is also legal for tsk to be the current task.  In either case we
> > + * can guarantee that tsk will not start running on another CPU while
> > + * its breakpoints are being modified.  If that happened it could cause
> > + * a crash.
> > + */
> > +int register_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp);
> > +void unregister_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp);
> > +int modify_user_hwbkpt(struct task_struct *tsk, struct hwbkpt *bp,
> > +		void *address, u8 len, u8 type);
> 
> These are the kinds of constraints utrace is there to make doable.
> (I'm assuming that guarantee is really "will not start running in
> user mode", since SIGKILL is always possible.)

The actual guarantee I need is that nobody will switch_to() the task while
my routines are running.

>  In a utrace merge,
> I think we want the only entry points for this to be a utrace-based
> interface that explicitly requires utrace's notion of quiescence
> (as its accessors for thread register data do).

Yes; I had in mind that the user-breakpoint part would be called only by 
utrace and/or ptrace.  That's why the routines aren't EXPORTed.  But I 
wrote it in more general form, to make it easier to understand.  I'll add 
some comments about this.


> > +/*
> > + * Kernel breakpoints are not associated with any particular thread.
> > + */
> > +int register_kernel_hwbkpt(struct hwbkpt *bp);
> > +void unregister_kernel_hwbkpt(struct hwbkpt *bp);
> 
> Potentially kernel users might want per-thread installations, if
> it's simple to provide the option.  e.g., a probe at some syscall
> entry that installs a watchpoint while calling into complex
> subsystems, and then removes it before returning.

If someone really needs to do that, they can always put their own call to
(un)register_kernel_hwbkpt() at the entry(exit) to the complex subsystems.  
Or perhaps it should be a job for systemtap, which would use hwbkpt to do
the actual work.
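
That is, something along these lines (sketch only;
complex_subsystem_call() stands in for whatever is being probed):

	static struct hwbkpt my_bp;	/* address/len/type/triggered
					 * filled in beforehand */

	if (register_kernel_hwbkpt(&my_bp) == 0) {
		complex_subsystem_call();	/* watchpoint active here */
		unregister_kernel_hwbkpt(&my_bp);
	}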


> > @@ -379,15 +381,15 @@ void exit_thread(void)
> >  		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
> >  		put_cpu();
> >  	}
> > +	flush_thread_hwbkpt(tsk);
> 
> I'd make this do if (unlikely(test_tsk_thread_flag(tsk, TIF_DEBUG))),

That's the wrong test, but I get your point.

> or make it an inline doing that test before calling out to the guts.
> Most of the time the hwbkpt code need not be on a hot path, and the
> exit path will be one.

Not nearly as hot as switch_to()!  But I'll do it.
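
I.e., an inline wrapper along these lines (a sketch; the field name is
illustrative, and the cheap test in my code will be the hw-breakpoint
info pointer rather than TIF_DEBUG):

	static inline void exit_thread_hwbkpt(struct task_struct *tsk)
	{
		if (unlikely(tsk->thread.hwbkpt_info))
			flush_thread_hwbkpt(tsk);	/* out-of-line guts */
	}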


> > +struct thread_hwbkpt {		/* HW breakpoint info for a thread */
> > +
> > +	/* utrace support */
> > +	struct list_head	node;		/* Entry in thread list */
> > +	struct list_head	thread_bps;	/* Thread's breakpoints */
> > +	unsigned long		tdr7;		/* Thread's DR7 value */
> > +
> > +	/* ptrace support */
> > +	struct hwbkpt		ptrace_bps[HB_NUM];
> 
> I wouldn't use ptrace in the name, it's "the thread's virtual state".

That may be so, but the only way to access that part of the state is via
ptrace.  Think of it this way: The debug register settings really should
not be part of the thread's virtual state.  If we had some other, more
logical API for managing breakpoints in a task then ptrace_bps[] wouldn't
be necessary at all (other than for backward compatibility perhaps).

> You shouldn't need a struct hwbpkt here really, just unsigned long vdr[4].

That would make things much more complicated.  It's a lot easier to treat 
all breakpoints as though they are the same under the hood.

Besides, the extra memory usage will show up only on threads that are 
being debugged, which means we can safely ignore it.


> > +		/* Check whether bp->address points to user space */
> > +		if ((tsk != NULL) != ((unsigned long) bp->address < TASK_SIZE))
> 
> On x86_64, make sure this is TASK_SIZE_OF(tsk).

Another thing to watch out for is that we would have to allow length-8
breakpoints in 64-bit mode.  I'll add a few comments.
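
On x86_64 the range check would turn into something like (sketch only):

	if (tsk != NULL) {
		/* User bp: must lie inside the traced task's space */
		if ((unsigned long) bp->address >= TASK_SIZE_OF(tsk))
			return -EINVAL;
	} else {
		/* Kernel bp: must lie above user space */
		if ((unsigned long) bp->address < TASK_SIZE)
			return -EINVAL;
	}

with len == 8 additionally accepted when building for x86_64.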


> > +	if (val != DIE_DEBUG)
> > +		return NOTIFY_DONE;
> 
> The very next thing should be:
> 
> 	dr6 = ((struct die_args *) data)->err;
> 	if ((dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) == 0)
> 		return NOTIFY_DONE;
> 	if (unlikely(regs->eflags & VM_MASK))
> 		return NOTIFY_DONE;
> 
> You are not involved at all if the exception was not produced by a debug
> register setting.  When no notify_die hook returns NOTIFY_STOP, do_debug
> should do thread->vdr6 |= dr6.  (In practice this will only be the DR_STEP
> bit, but maybe others will come along in later chips.  If you're not
> explicitly virtualizing something, you should be passing it through.)

I get the picture.

> To avoid this needing to do allocation when TIF_DEBUG was not already set,
> probably the vdr6 should stay in thread_struct directly alongside the pointer.

Good idea.


> > +		if (!(dr6 & (0x1 << i)))
> > +			continue;
> 
> Use (DR_TRAP0 << i).

Okay.


> > +static struct notifier_block hwbkpt_exceptions_nb = {
> > +	.notifier_call = hwbkpt_exceptions_notify,
> > +	.priority = INT_MAX - 1		/* We need to be notified second */
> 
> The order shouldn't matter if everyone only swallows their own exceptions.

True.  .priority can take on its default value of 0.

> kprobes needs to be a little better behaved this way:
> 
> diff --git a/arch/i386/kernel/kprobes.c b/arch/i386/kernel/kprobes.c
> index b545bc7..0000000 100644  
> --- a/arch/i386/kernel/kprobes.c
> +++ b/arch/i386/kernel/kprobes.c
> @@ -670,7 +670,7 @@ int __kprobes kprobe_exceptions_notify(s
>  			ret = NOTIFY_STOP;
>  		break;
>  	case DIE_DEBUG:
> -		if (post_kprobe_handler(args->regs))
> +		if ((args->err & DR_STEP) && post_kprobe_handler(args->regs))
>  			ret = NOTIFY_STOP;
>  		break;
>  	case DIE_GPF:
> diff --git a/arch/x86_64/kernel/kprobes.c b/arch/x86_64/kernel/kprobes.c
> index 209c8c0..0000000 100644  
> --- a/arch/x86_64/kernel/kprobes.c
> +++ b/arch/x86_64/kernel/kprobes.c
> @@ -661,7 +661,7 @@ int __kprobes kprobe_exceptions_notify(s
>  			ret = NOTIFY_STOP;
>  		break;
>  	case DIE_DEBUG:
> -		if (post_kprobe_handler(args->regs))
> +		if ((args->err & DR_STEP) && post_kprobe_handler(args->regs))
>  			ret = NOTIFY_STOP;
>  		break;
>  	case DIE_GPF:

That doesn't seem quite right.  What if you encounter _both_ a single-step 
exception and a breakpoint at the same time?  Probably kprobes should 
always return NOTIFY_DONE.

Which implies that do_debug needs to decide whether or not to issue 
SIGTRAP.  Presumably the condition will be that any of the DR_STEP or 
DR_TRAPn bits remain set after the notifier chain has run.  This means the 
kprobes code will have to be modified to clear DR_STEP in args->err.
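
In other words the kprobes DIE_DEBUG case would become something like
(a sketch, not a real patch):

	case DIE_DEBUG:
		if ((args->err & DR_STEP) &&
				post_kprobe_handler(args->regs))
			args->err &= ~DR_STEP;	/* consume the step */
		/* fall through with NOTIFY_DONE so do_debug sees
		 * any watchpoint bits that are also set */
		break;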


> > Index: 2.6.21-rc2/arch/i386/kernel/ptrace.c
> > ===================================================================
> > --- 2.6.21-rc2.orig/arch/i386/kernel/ptrace.c
> > +++ 2.6.21-rc2/arch/i386/kernel/ptrace.c
> 
> I would like all this stuff to be in hw-breakpoint.c, and just called with:
> 
>     int thread_set_debugreg(struct task_struct *, int n, unsigned long val);
>     unsigned long thread_get_debugreg(struct task_struct *, int n);

Easy enough.  I'll just move the code from one file to the other.

> Writes to 0..3,6 can just write the slot (after allocating the thread's
> struct if need be, don't bother if writing 0), and then act like a reset of
> dr7 to its existing virtual value if that includes the corresponding bit.
> (This lets the thread's virtual dr[0-3] contain addresses even when
> breakpoints are not enabled.)  

That's essentially what the patch does do.


> I did not check all the code for correctness outside the issues that
> concerned me off hand, so I may have more comments on another rev in
> parts that I didn't mention here.

Fair enough.

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-05 16:16                       ` Alan Stern
  2007-03-05 16:49                         ` Christoph Hellwig
@ 2007-03-05 22:04                         ` Roland McGrath
  1 sibling, 0 replies; 70+ messages in thread
From: Roland McGrath @ 2007-03-05 22:04 UTC (permalink / raw)
  To: Alan Stern
  Cc: Christoph Hellwig, Prasanna S Panchamukhi, Kernel development list

> > Please not.  That might save a few bytes, but it makes the interface a
> > lot harder to understand for users.  We really discourage over-loaded
> > interfaces in Linux.
> 
> I agree with Christoph.  Plenty of other interfaces in the kernel do the 
> same thing.

I don't think a single hook for both "on" and "off" notification is harder
to understand at all, it's natural.  But this is a very tiny point.

> I'll be happy to rename the structures and the files if Roland doesn't 
> mind.

I didn't much like "debugpoint" but I'm not wedded to vowellessness.
hw_breakpoint is fine by me.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-05 17:25                     ` Alan Stern
@ 2007-03-06  3:13                       ` Roland McGrath
  2007-03-06 15:23                         ` Alan Stern
  0 siblings, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-03-06  3:13 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> Presumably you mean that hw-breakpoint.c shouldn't do anything at all on
> single-step exceptions.  

Right.

> So far I've been developing under 2.6.21-rc, which doesn't have utrace.
> But eventually this will be submitted by way of -mm, which does.  The
> easiest approach would be to make the whole thing conditional on
> CONFIG_UTRACE.

That is fine with me.

> The actual guarantee I need is that nobody will switch_to() the task while
> my routines are running.

You can't get that.  It can always be woken for SIGKILL (which is a good
thing).  What you are guaranteed is that if it does, it will never return
to user mode.  So it has to be ok for switching in to use the bits in any
intermediate state you might get them, meaning any possible garbage state
is harmful only to user mode or is otherwise recoverable (worst case
perhaps the exception handler has to know to ignore some traps).  This is
already true with ptrace and ->thread.debugreg, as well as the normal user
registers.  In your case, if you wanted to be paranoid you could clear
TIF_DEBUG before you touch anything, and set it again only after you're
done (with memory barriers as needed).
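
I.e., bracket the updates roughly like this:

	clear_tsk_thread_flag(tsk, TIF_DEBUG);
	smp_mb();	/* flag is clear before bp data gets touched */

	/* ... modify the thread's breakpoint structures ... */

	smp_mb();	/* updates are visible before the flag goes back */
	set_tsk_thread_flag(tsk, TIF_DEBUG);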

> If someone really needs to do that, they can always put their own call to
> (un)register_kernel_hwbkpt() at the entry(exit) to the complex subsystems.  
> Or perhaps it should be a job for systemtap, which would use hwbkpt to do
> the actual work.

But you don't have an option to avoid interrupting other CPUs to update,
which is not necessary or desirable for this usage.  That's what I was
referring to.  If it's not trivial to add, it isn't needed now.

> Not nearly as hot as switch_to()!  But I'll do it.

That's why it's got a cheap TIF_DEBUG check with unlikely().

> That may be so, but the only way to access that part of the state is via
> ptrace.  Think of it this way: The debug register settings really should
> not be part of the thread's virtual state.  If we had some other, more
> logical API for managing breakpoints in a task then ptrace_bps[] wouldn't
> be necessary at all (other than for backward compatibility perhaps).

As things are in utrace, there will continue to be a utrace method of
setting the (virtual) "raw" debugregs, even if ptrace per se is not involved.
(So all I'm saying really is I'm on a personal campaign against the letter P.)

OTOH, your point is well taken.  Once your stuff is integrated, there is no
real reason that thread-virtualized "raw" debug registers need to be
accessible via utrace_regset.  Perhaps I should drop it.  Then those calls
will be used purely by ptrace compatibility and can be #ifdef CONFIG_PTRACE.

> Which implies that do_debug needs to decide whether or not to issue 
> SIGTRAP.  Presumably the condition will be that any of the DR_STEP or 
> DR_TRAPn bits remain set after the notifier chain has run.  This means the 
> kprobes code will have to be modified to clear DR_STEP in args->err.

Yeah, I guess that's right.  It should still return NOTIFY_STOP when
args->err has no other bits set, so notifiers aren't called with zero.
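
I.e., after clearing DR_STEP, something like:

	if (!(args->err & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
		ret = NOTIFY_STOP;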


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-06  3:13                       ` Roland McGrath
@ 2007-03-06 15:23                         ` Alan Stern
  2007-03-07  3:49                           ` Roland McGrath
  0 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-03-06 15:23 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Mon, 5 Mar 2007, Roland McGrath wrote:

> > The actual guarantee I need is that nobody will switch_to() the task while
> > my routines are running.
> 
> You can't get that.  It can always be woken for SIGKILL (which is a good
> thing).  What you are guaranteed is that if it does, it will never return
> to user mode.  So it has to be ok for switching in to use the bits in any
> intermediate state you might get them, meaning any possible garbage state
> is harmful only to user mode or is otherwise recoverable (worst case
> perhaps the exception handler has to know to ignore some traps).  This is
> already true with ptrace and ->thread.debugreg, as well as the normal user
> registers.  In your case, if you wanted to be paranoid you could clear
> TIF_DEBUG before you touch anything, and set it again only after you're
> done (with memory barriers as needed).

Guess I'll have to take that approach.  The new additions to __switch_to() 
follow a linked list, and obviously it's not safe to do that while the 
list is being updated.  (No, I'm not about to start using RCU for this!)


> > If someone really needs to do that, they can always put their own call to
> > (un)register_kernel_hwbkpt() at the entry(exit) to the complex subsystems.  
> > Or perhaps it should be a job for systemtap, which would use hwbkpt to do
> > the actual work.
> 
> But you don't have an option to avoid interrupting other CPUs to update,
> which is not necessary or desireable for this usage.  That's what I was
> referring to.  If it's not trivial to add, it isn't needed now.

I see your point.  If you want to monitor a certain location in kernel
space but only while in the context of a specific task, the new framework
doesn't provide any way of doing it.  One approach could be to use a
regular kernel breakpoint and return immediately if the task of interest
isn't the current one.  Beyond that, let's punt.
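
Something like this in the @triggered callback, say (a sketch;
target_task is whatever the caller cares about):

	static struct task_struct *target_task;	/* set up by the caller */

	static void my_triggered(struct hw_breakpoint *bp,
			struct pt_regs *regs)
	{
		if (current != target_task)
			return;		/* hit in some other context */
		printk(KERN_DEBUG "watchpoint hit in the watched task\n");
	}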


> > Which implies that do_debug needs to decide whether or not to issue 
> > SIGTRAP.  Presumably the condition will be that any of the DR_STEP or 
> > DR_TRAPn bits remain set after the notifier chain has run.  This means the 
> > kprobes code will have to be modified to clear DR_STEP in args->err.
> 
> Yeah, I guess that's right.  It should still return NOTIFY_STOP when
> args->err has no other bits set, so notifiers aren't called with zero.

In practice that might not work.  On my machine, at least, reads of DR6
return ones in all the reserved bit positions.

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-06 15:23                         ` Alan Stern
@ 2007-03-07  3:49                           ` Roland McGrath
  2007-03-07 19:11                             ` Alan Stern
  0 siblings, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-03-07  3:49 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> > Yeah, I guess that's right.  It should still return NOTIFY_STOP when
> > args->err has no other bits set, so notifiers aren't called with zero.
> 
> In practice that might not work.  On my machine, at least, reads of DR6
> return ones in all the reserved bit positions.

Does that mean asm("mov %1,%%dr6; mov %%dr6,%0" : "=r" (mask) : "r" (0)); 
puts in mask the set of reserved bits?  We could collect that value at CPU
startup and mask it off args->err, then OR it back into vdr6.
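
E.g. (a sketch; dr6_reserved_mask is a made-up name):

	static unsigned long dr6_reserved_mask __read_mostly;

	void __cpuinit collect_dr6_reserved_bits(void)
	{
		unsigned long mask;

		asm("mov %1,%%dr6; mov %%dr6,%0" : "=r" (mask) : "r" (0UL));
		dr6_reserved_mask |= mask;
	}

Then do_debug masks dr6_reserved_mask off args->err before running the
notifier chain and ORs the raw bits back into vdr6.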


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-07  3:49                           ` Roland McGrath
@ 2007-03-07 19:11                             ` Alan Stern
  2007-03-09  6:52                               ` Roland McGrath
  0 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-03-07 19:11 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Tue, 6 Mar 2007, Roland McGrath wrote:

> > > Yeah, I guess that's right.  It should still return NOTIFY_STOP when
> > > args->err has no other bits set, so notifiers aren't called with zero.
> > 
> > In practice that might not work.  On my machine, at least, reads of DR6
> > return ones in all the reserved bit positions.
> 
> Does that mean asm("mov %1,%%dr6; mov %%dr6,%0" : "=r" (mask) : "r" (0)); 
> puts in mask the set of reserved bits?  We could collect that value at CPU
> startup and mask it off args->err, then OR it back into vdr6.

That sounds like a rather fragile approach to avoiding a minimal amount of 
work.  Debug exceptions don't occur very often, and when they do it won't 
matter too much if we go through some extra notifier-chain callouts.


Back to a previous topic:

> > The actual guarantee I need is that nobody will switch_to() the task while
> > my routines are running.
>
> You can't get that.  It can always be woken for SIGKILL (which is a good
> thing).  What you are guaranteed is that if it does, it will never return
> to user mode.  So it has to be ok for switching in to use the bits in any
> intermediate state you might get them, meaning any possible garbage state
> is harmful only to user mode or is otherwise recoverable (worst case
> perhaps the exception handler has to know to ignore some traps).  This is
> already true with ptrace and ->thread.debugreg, as well as the normal user
> registers.  In your case, if you wanted to be paranoid you could clear
> TIF_DEBUG before you touch anything, and set it again only after you're
> done (with memory barriers as needed).

It turns out that this won't work correctly unless I use something
stronger, like a spinlock or RCU.  Either one seems like overkill.

Is there any way to find out from within the
switch_to_thread_hw_breakpoint routine whether the task is in this unusual
state?  (By which I mean the task is being debugged and the debugger
hasn't told it to start running.)  Would (tsk->exit_code == SIGKILL) work?  
If not, can we add a TIF_DEBUG_STOPPED flag?  Or should I just go with a 
spinlock?

Is SIGKILL the only way this can happen?

In a similar vein, I need a reliable way to know whether a task has gone 
through exit_thread().  If it has, then its hw_breakpoint area has been 
deallocated and a new one must not be allocated.  Will (tsk->flags & 
PF_EXITING) always be true once that happens?

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-07 19:11                             ` Alan Stern
@ 2007-03-09  6:52                               ` Roland McGrath
  2007-03-09 18:40                                 ` Alan Stern
  0 siblings, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-03-09  6:52 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> That sounds like a rather fragile approach to avoiding a minimal amount of 
> work.  Debug exceptions don't occur very often, and when they do it won't 
> matter too much if we go through some extra notifier-chain callouts.

When single-stepping occurs it happens many times in quick succession, and
that path doesn't need any more overhead than it already has.

> It turns out that this won't work correctly unless I use something
> stronger, like a spinlock or RCU.  Either one seems like overkill.

What is the problem with just clearing TIF_DEBUG?  It just means that in
the SIGKILL case, the dying thread won't switch in its local debugregs.
The global kernel allocations will already be set in the processor from the
previous context, and old user-address allocations do no harm since we
won't run in user mode again before switching out at the end of do_exit.

> Is there any way to find out from within the
> switch_to_thread_hw_breakpoint routine whether the task is in this unusual
> state?  (By which I mean the task is being debugged and the debugger
> hasn't told it to start running.)  Would (tsk->exit_code == SIGKILL) work?  

That won't necessarily work.  There isn't any cheap check that won't also
catch a task preempted on its way to stopping for the debugger.  

> If not, can we add a TIF_DEBUG_STOPPED flag?  

I'm not clear on what that would mean, but it's probably not an idea I like.

> Or should I just go with a spinlock?

If it's really necessary, but it hasn't proved so for any other switched
per-thread state.  As long as you aren't doing per-thread kernel-mode
allocations, I don't see why you need anything other than TIF_DEBUG.

> Is SIGKILL the only way this can happen?

It should be, but there might be some stray wake_up_process calls in the
kernel that can violate [up]trace's supposed monopoly on TASK_TRACED (or
duopoly with SIGKILL, I suppose I should say).  If there is no SIGKILL,
then the task will just call schedule again nearly immediately to go back
to blocking, which will switch out unless there is a second wakeup right
then.

> In a similar vein, I need a reliable way to know whether a task has gone 
> through exit_thread().  If it has, then its hw_breakpoint area has been 
> deallocated and a new one must not be allocated.  Will (tsk->flags & 
> PF_EXITING) always be true once that happens?

PF_EXITING is set after there is no possibility of returning to user mode,
but a while before exit_thread, when you might still want kernel-mode
breakpoints.  If the only per-thread allocations you support are for user
mode, then you can certainly refuse to do any when PF_EXITING is set.


Thanks,
Roland


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-09  6:52                               ` Roland McGrath
@ 2007-03-09 18:40                                 ` Alan Stern
  2007-03-13  8:00                                   ` Roland McGrath
  0 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-03-09 18:40 UTC (permalink / raw)
  To: Roland McGrath, Christoph Hellwig
  Cc: Prasanna S Panchamukhi, Kernel development list

On Thu, 8 Mar 2007, Roland McGrath wrote:

> > That sounds like a rather fragile approach to avoiding a minimal amount of 
> > work.  Debug exceptions don't occur very often, and when they do it won't 
> > matter too much if we go through some extra notifier-chain callouts.
> 
> When single-stepping occurs it happens repeatedly many times, and that
> doesn't need any more overhead than it already has.  

Well, I can add in the test for 0, but finding the set of always-on bits
in DR6 will have to be done separately.  Isn't it possible that different
CPUs could have different bits?


> > It turns out that this won't work correctly unless I use something
> > stronger, like a spinlock or RCU.  Either one seems like overkill.
> 
> What is the problem with just clearing TIF_DEBUG?  It just means that in
> the SIGKILL case, the dying thread won't switch in its local debugregs.
> The global kernel allocations will already be set in the processor from the
> previous context, and old user-address allocations do no harm since we
> won't run in user mode again before switching out at the end of do_exit.

Here's what might happen.  Let's say T is being debugged by D, when
somebody sends T a SIGKILL.  CPU 0 does a __switch_to() to take care of
terminating T, and since TIF_DEBUG is still set, it runs the
switch_to_thread_hw_breakpoint() routine.  The routine begins to install 
the debug registers for T.

At that moment D, running on CPU 1, decides to unregister a breakpoint in
T.  Clearing TIF_DEBUG now doesn't do any good -- it's too late; CPU 0 has
already tested it.  CPU 1 goes in and alters the user breakpoint data,
maybe even deallocating a breakpoint structure that CPU 0 is about to
read.  Not a good situation.

What I need is some way for CPU 0 to know from the start that the current 
task is about to die, so it can avoid installing the user breakpoints 
altogether.  Or else there has to be some sort of mutual exclusion so that 
the two CPUs don't try to access the breakpoint data at the same time.


> > Is there any way to find out from within the
> > switch_to_thread_hw_breakpoint routine whether the task is in this unusual
> > state?  (By which I mean the task is being debugged and the debugger
> > hasn't told it to start running.)  Would (tsk->exit_code == SIGKILL) work?  
> 
> That won't necessarily work.  There isn't any cheap check that won't also
> catch a task preempted on its way to stopping for the debugger.  

No way to tell when a task being debugged is started up by anything other 
than its debugger?  Hmmm, in that case maybe it would be better to use 
RCU.  It won't add much overhead to anything but the code for registering 
and unregistering user breakpoints.
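
Roughly (a very rough sketch, ignoring the install/uninstall status
bookkeeping):

	/* switch_to_thread_hw_breakpoint() becomes a pure reader */
	rcu_read_lock();
	list_for_each_entry_rcu(bp, &thbi->thread_bps, node) {
		if (--i < chbi->num_kbps)
			break;		/* remaining registers are kernel's */
		install_breakpoint(chbi, i, bp);
	}
	rcu_read_unlock();

while the register/unregister paths would use list_add_rcu() and
list_del_rcu(), with synchronize_rcu() before a breakpoint structure is
freed or reused.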


> > In a similar vein, I need a reliable way to know whether a task has gone 
> > through exit_thread().  If it has, then its hw_breakpoint area has been 
> > deallocated and a new one must not be allocated.  Will (tsk->flags & 
> > PF_EXITING) always be true once that happens?
> 
> PF_EXITING is set after there is no possibility of returning to user mode,
> but a while before exit_thread, when you might still want kernel-mode
> breakpoints.  If the only per-thread allocations you support are for user
> mode, then you can certainly refuse to do any when PF_EXITING is set.

Okay, that's easy enough.


On Fri, 9 Mar 2007, Christoph Hellwig wrote:

> btw, can you repost a snapshot patch with your changes?  I seem to have lost
> where we're standing code-wise currently.  Also do you have some examples
> we can put in now?  If not I'd like to hack some up based on that snapshot.

Below is the current version of the patch, against 2.6.21-rc3.  The source 
file include/asm-i386/hw_breakpoint.h includes an example showing how to 
set up a kernel breakpoint.

Alan Stern



Index: 2.6.21-rc2/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ 2.6.21-rc2/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,193 @@
+#ifndef	_I386_HW_BREAKPOINT_H
+#define	_I386_HW_BREAKPOINT_H
+
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ *
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @triggered: callback invoked when the breakpoint is hit
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @address: location (virtual address) of the breakpoint
+ * @len: extent of the breakpoint address (1, 2, or 4 bytes)
+ * @type: breakpoint type (write-only, read/write, execute, or I/O)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints.  These can be either execution breakpoints
+ * (triggered on instruction execution) or data breakpoints (also known
+ * as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The @address, @len, and @type fields are standard, indicating the
+ * location of the breakpoint, its extent in bytes, and the type of
+ * access that will trigger the breakpoint.  Possible values for @len are
+ * 1, 2, and 4.  (On x86_64, @len may also be 8.)  Possible values for
+ * @type are %HW_BREAKPOINT_WRITE (triggered on write access),
+ * %HW_BREAKPOINT_RW (triggered on read or write access),
+ * %HW_BREAKPOINT_IO (triggered on I/O-space access), and
+ * %HW_BREAKPOINT_EXECUTE (triggered on instruction execution).  Certain
+ * restrictions apply: %HW_BREAKPOINT_EXECUTE requires that @len be 1,
+ * and %HW_BREAKPOINT_IO is available only on processors with Debugging
+ * Extensions.
+ *
+ * (Note: %HW_BREAKPOINT_IO is currently unimplemented.  It can be added
+ * if there is a demand for it.)
+ *
+ * In register_user_hw_breakpoint() and modify_user_hw_breakpoint(),
+ * @address must refer to a location in user space (unless @type is
+ * %HW_BREAKPOINT_IO).  The breakpoint will be active only while the
+ * requested task is running.  Conversely, in
+ * register_kernel_hw_breakpoint() @address must refer to a location in
+ * kernel space, and the breakpoint will be active on all CPUs regardless
+ * of the task being run.
+ *
+ * When a breakpoint gets hit, the @triggered callback is invoked
+ * in_interrupt with a pointer to the %hw_breakpoint structure and the
+ * processor registers.  %HW_BREAKPOINT_EXECUTE traps occur before the
+ * breakpointed instruction executes; all other types of trap occur after
+ * the memory or I/O access has taken place.  All breakpoints are
+ * disabled while @triggered runs, to avoid recursive traps and allow
+ * unhindered access to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource.  Requests to register a
+ * breakpoint will always succeed (provided the member entries are
+ * valid), but the breakpoint may not be installed in a debug register
+ * right away.  Physical debug registers are allocated based on the
+ * priority level stored in @priority (higher values indicate higher
+ * priority).  User-space breakpoints within a single thread compete with
+ * one another, and all user-space breakpoints compete with all
+ * kernel-space breakpoints; however user-space breakpoints in different
+ * threads do not compete.  %HW_BREAKPOINT_PRIO_PTRACE is the level used
+ * for ptrace requests; an unobtrusive kernel-space breakpoint will use
+ * %HW_BREAKPOINT_PRIO_NORMAL to avoid disturbing user programs.  A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HW_BREAKPOINT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered.  The
+ * @installed and @uninstalled callbacks are invoked in_atomic when these
+ * events occur.  It is legal for @installed or @uninstalled to be %NULL,
+ * however @triggered must not be.  Note that it is not possible to
+ * register or unregister a breakpoint from within a callback routine,
+ * since doing so requires a process context.  Note also that for user
+ * breakpoints, @installed and @uninstalled may be called during the
+ * middle of a context switch, at a time when it is not safe to call
+ * printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled.  As a result @triggered might
+ * be called when you may not expect it, but this way the breakpoint
+ * owner knows that during the time interval from @installed to
+ * @uninstalled, all events are faithfully reported.  (It is not possible
+ * to do any better than this in general, because on SMP systems there is
+ * no way to set a debug register simultaneously on all CPUs.)  The same
+ * isn't always true with user-space breakpoints, but the differences
+ * should not be visible to a user process.
+ *
+ * The @address, @len, and @type fields in a user-space breakpoint can be
+ * changed by calling modify_user_hw_breakpoint().  Kernel-space
+ * breakpoints cannot be modified, nor can the @priority value in
+ * user-space breakpoints, after the breakpoint has been registered.  And
+ * of course all the fields in a %hw_breakpoint structure should be
+ * treated as read-only while the breakpoint is registered.
+ *
+ * @node and @status are intended for internal use, however @status may
+ * be read to determine whether or not the breakpoint is currently
+ * installed.
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * static void triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * 	printk(KERN_DEBUG "Breakpoint triggered\n");
+ * 	dump_stack();
+ *  	.......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	my_bp.address = &pid_max;
+ * 	my_bp.type = HW_BREAKPOINT_WRITE;
+ * 	my_bp.len = 4;
+ * 	my_bp.triggered = triggered;
+ * 	my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ * 	rc = register_kernel_hw_breakpoint(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	unregister_kernel_hw_breakpoint(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hw_breakpoint {
+	struct list_head	node;
+	void		(*triggered)(struct hw_breakpoint *, struct pt_regs *);
+	void		(*installed)(struct hw_breakpoint *);
+	void		(*uninstalled)(struct hw_breakpoint *);
+	void		*address;
+	u8		len;
+	u8		type;
+	u8		priority;
+	u8		status;
+};
+
+/* HW breakpoint types */
+#define HW_BREAKPOINT_EXECUTE	0x0	/* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE	0x1	/* trigger on memory write */
+#define HW_BREAKPOINT_IO	0x2	/* trigger on I/O space access */
+#define HW_BREAKPOINT_RW	0x3	/* trigger on memory read or write */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HW_BREAKPOINT_PRIO_NORMAL	25
+#define HW_BREAKPOINT_PRIO_PTRACE	50
+#define HW_BREAKPOINT_PRIO_HIGH		75
+
+/* HW breakpoint status values */
+#define HW_BREAKPOINT_REGISTERED	1
+#define HW_BREAKPOINT_INSTALLED		2
+
+/*
+ * The following three routines are meant to be called only from within
+ * the ptrace or utrace subsystems.  The tsk argument will usually be a
+ * process being debugged by the current task, although it is also legal
+ * for tsk to be the current task.  In any case it must be guaranteed
+ * that tsk will not start running in user mode while its breakpoints are
+ * being modified.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+int modify_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp, void *address, u8 len, u8 type);
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+#endif	/* _I386_HW_BREAKPOINT_H */
Index: 2.6.21-rc2/arch/i386/kernel/process.c
===================================================================
--- 2.6.21-rc2.orig/arch/i386/kernel/process.c
+++ 2.6.21-rc2/arch/i386/kernel/process.c
@@ -58,6 +58,7 @@
 #include <asm/tlbflush.h>
 #include <asm/cpu.h>
 #include <asm/pda.h>
+#include <asm/debugreg.h>
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
@@ -359,9 +360,10 @@ EXPORT_SYMBOL(kernel_thread);
  */
 void exit_thread(void)
 {
+	struct task_struct *tsk = current;
+
 	/* The process may have allocated an io port bitmap... nuke it. */
 	if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
-		struct task_struct *tsk = current;
 		struct thread_struct *t = &tsk->thread;
 		int cpu = get_cpu();
 		struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -379,15 +381,17 @@ void exit_thread(void)
 		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
 		put_cpu();
 	}
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 }
 
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
 
-	memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
-	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));	
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 	/*
 	 * Forget coprocessor state..
 	 */
@@ -430,14 +434,21 @@ int copy_thread(int nr, unsigned long cl
 
 	savesegment(gs,p->thread.gs);
 
+	p->thread.hw_breakpoint_info = NULL;
+	p->thread.io_bitmap_ptr = NULL;
+
 	tsk = current;
+	err = -ENOMEM;
+	if (unlikely(tsk->thread.hw_breakpoint_info)) {
+		if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+			goto out;
+	}
+
 	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
 						IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
+		if (!p->thread.io_bitmap_ptr)
+			goto out;
 		set_tsk_thread_flag(p, TIF_IO_BITMAP);
 	}
 
@@ -467,7 +478,8 @@ int copy_thread(int nr, unsigned long cl
 
 	err = 0;
  out:
-	if (err && p->thread.io_bitmap_ptr) {
+	if (err) {
+		flush_thread_hw_breakpoint(p);
 		kfree(p->thread.io_bitmap_ptr);
 		p->thread.io_bitmap_max = 0;
 	}
@@ -479,18 +491,18 @@ int copy_thread(int nr, unsigned long cl
  */
 void dump_thread(struct pt_regs * regs, struct user * dump)
 {
-	int i;
+	struct task_struct *tsk = current;
 
 /* changed the size calculations - should hopefully work better. lbt */
 	dump->magic = CMAGIC;
 	dump->start_code = 0;
 	dump->start_stack = regs->esp & ~(PAGE_SIZE - 1);
-	dump->u_tsize = ((unsigned long) current->mm->end_code) >> PAGE_SHIFT;
-	dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
+	dump->u_tsize = ((unsigned long) tsk->mm->end_code) >> PAGE_SHIFT;
+	dump->u_dsize = ((unsigned long) (tsk->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
 	dump->u_dsize -= dump->u_tsize;
 	dump->u_ssize = 0;
-	for (i = 0; i < 8; i++)
-		dump->u_debugreg[i] = current->thread.debugreg[i];  
+
+	dump_thread_hw_breakpoint(tsk, dump->u_debugreg);
 
 	if (dump->start_stack < TASK_SIZE)
 		dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -540,16 +552,6 @@ static noinline void __switch_to_xtra(st
 
 	next = &next_p->thread;
 
-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		set_debugreg(next->debugreg[0], 0);
-		set_debugreg(next->debugreg[1], 1);
-		set_debugreg(next->debugreg[2], 2);
-		set_debugreg(next->debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(next->debugreg[6], 6);
-		set_debugreg(next->debugreg[7], 7);
-	}
-
 	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
 		/*
 		 * Disable the bitmap via an invalid offset. We still cache
@@ -682,7 +684,7 @@ struct task_struct fastcall * __switch_t
 		set_iopl_mask(next->iopl);
 
 	/*
-	 * Now maybe handle debug registers and/or IO bitmaps
+	 * Now maybe handle IO bitmaps
 	 */
 	if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
 	    || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
@@ -714,6 +716,13 @@ struct task_struct fastcall * __switch_t
 
 	write_pda(pcurrent, next_p);
 
+	/*
+	 * Handle debug registers.  This must be done _after_ current
+	 * is updated.
+	 */
+	if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+		switch_to_thread_hw_breakpoint(next_p);
+
 	return prev_p;
 }
 
Index: 2.6.21-rc2/arch/i386/kernel/signal.c
===================================================================
--- 2.6.21-rc2.orig/arch/i386/kernel/signal.c
+++ 2.6.21-rc2/arch/i386/kernel/signal.c
@@ -592,13 +592,6 @@ static void fastcall do_signal(struct pt
 
 	signr = get_signal_to_deliver(&info, &ka, regs, NULL);
 	if (signr > 0) {
-		/* Reenable any watchpoints before delivering the
-		 * signal to user space. The processor register will
-		 * have been cleared if the watchpoint triggered
-		 * inside the kernel.
-		 */
-		if (unlikely(current->thread.debugreg[7]))
-			set_debugreg(current->thread.debugreg[7], 7);
 
 		/* Whee!  Actually deliver the signal.  */
 		if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: 2.6.21-rc2/arch/i386/kernel/traps.c
===================================================================
--- 2.6.21-rc2.orig/arch/i386/kernel/traps.c
+++ 2.6.21-rc2/arch/i386/kernel/traps.c
@@ -806,62 +806,49 @@ fastcall void __kprobes do_int3(struct p
  */
 fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
 {
-	unsigned int condition;
 	struct task_struct *tsk = current;
+	struct die_args args;
 
-	get_debugreg(condition, 6);
-
-	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
-					SIGTRAP) == NOTIFY_STOP)
+	args.regs = regs;
+	args.str = "debug";
+	get_debugreg(args.err, 6);
+	set_debugreg(0, 6);	/* DR6 is never cleared by the CPU */
+	args.trapnr = error_code;
+	args.signr = SIGTRAP;
+	args.ret = 0;
+	if (atomic_notifier_call_chain(&i386die_chain, DIE_DEBUG, &args) ==
+			NOTIFY_STOP)
 		return;
+
 	/* It's safe to allow irq's after DR6 has been saved */
 	if (regs->eflags & X86_EFLAGS_IF)
 		local_irq_enable();
 
-	/* Mask out spurious debug traps due to lazy DR7 setting */
-	if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
-		if (!tsk->thread.debugreg[7])
-			goto clear_dr7;
+	if (regs->eflags & VM_MASK) {
+		handle_vm86_trap((struct kernel_vm86_regs *) regs,
+				error_code, 1);
+		return;
 	}
 
-	if (regs->eflags & VM_MASK)
-		goto debug_vm86;
-
-	/* Save debug status register where ptrace can see it */
-	tsk->thread.debugreg[6] = condition;
-
 	/*
-	 * Single-stepping through TF: make sure we ignore any events in
-	 * kernel space (but re-enable TF when returning to user mode).
+	 * Single-stepping through system calls: ignore any exceptions in
+	 * kernel space, but re-enable TF when returning to user mode.
+	 *
+	 * We already checked v86 mode above, so we can check for kernel mode
+	 * by just checking the CPL of CS.
 	 */
-	if (condition & DR_STEP) {
-		/*
-		 * We already checked v86 mode above, so we can
-		 * check for kernel mode by just checking the CPL
-		 * of CS.
-		 */
-		if (!user_mode(regs))
-			goto clear_TF_reenable;
+	if ((args.err & DR_STEP) && !user_mode(regs)) {
+		args.err &= ~DR_STEP;
+		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+		regs->eflags &= ~TF_MASK;
 	}
 
-	/* Ok, finally something we can handle */
-	send_sigtrap(tsk, regs, error_code);
+	/* Store the virtualized DR6 value */
+	tsk->thread.vdr6 |= args.err;
 
-	/* Disable additional traps. They'll be re-enabled when
-	 * the signal is delivered.
-	 */
-clear_dr7:
-	set_debugreg(0, 7);
-	return;
-
-debug_vm86:
-	handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
-	return;
-
-clear_TF_reenable:
-	set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
-	regs->eflags &= ~TF_MASK;
-	return;
+	if ((args.err & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) ||
+			args.ret)
+		send_sigtrap(tsk, regs, error_code);
 }
 
 /*
Index: 2.6.21-rc2/include/asm-i386/debugreg.h
===================================================================
--- 2.6.21-rc2.orig/include/asm-i386/debugreg.h
+++ 2.6.21-rc2/include/asm-i386/debugreg.h
@@ -48,6 +48,8 @@
 
 #define DR_LOCAL_ENABLE_SHIFT 0    /* Extra shift to the local enable bit */
 #define DR_GLOBAL_ENABLE_SHIFT 1   /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1)      /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2)     /* Global enable for reg 0 */
 #define DR_ENABLE_SIZE 2           /* 2 enable bits per register */
 
 #define DR_LOCAL_ENABLE_MASK (0x55)  /* Set  local bits for all 4 regs */
@@ -58,7 +60,46 @@
    gdt or the ldt if we want to.  I am not sure why this is an advantage */
 
 #define DR_CONTROL_RESERVED (0xFC00) /* Reserved by Intel */
-#define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
-#define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
+#define DR_LOCAL_EXACT (0x100)       /* Local slow the pipeline */
+#define DR_GLOBAL_EXACT (0x200)      /* Global slow the pipeline */
+
+
+/*
+ * HW breakpoint additions
+ */
+
+#include <asm/hw_breakpoint.h>
+#include <linux/spinlock.h>
+
+#define HB_NUM		4	/* Number of hardware breakpoints */
+
+struct thread_hw_breakpoint {	/* HW breakpoint info for a thread */
+	spinlock_t		lock;		/* Protect thread_bps, tdr7 */
+
+	/* utrace support */
+	struct list_head	node;		/* Entry in thread list */
+	struct list_head	thread_bps;	/* Thread's breakpoints */
+	unsigned long		tdr7;		/* Thread's DR7 value */
+
+	/* ptrace support -- note that vdr6 is stored directly in the
+	 * thread_struct so that it is always available */
+	unsigned long		vdr7;			/* Virtualized DR7 */
+	struct hw_breakpoint	vdr_bps[HB_NUM];	/* Breakpoints
+			* representing virtualized debug registers 0 - 3 */
+};
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
 
 #endif
Index: 2.6.21-rc2/include/asm-i386/processor.h
===================================================================
--- 2.6.21-rc2.orig/include/asm-i386/processor.h
+++ 2.6.21-rc2/include/asm-i386/processor.h
@@ -402,8 +402,9 @@ struct thread_struct {
 	unsigned long	esp;
 	unsigned long	fs;
 	unsigned long	gs;
-/* Hardware debugging registers */
-	unsigned long	debugreg[8];  /* %%db0-7 debug registers */
+/* Hardware breakpoint info */
+	unsigned long	vdr6;
+	struct thread_hw_breakpoint	*hw_breakpoint_info;
 /* fault info */
 	unsigned long	cr2, trap_no, error_code;
 /* floating point info */
Index: 2.6.21-rc2/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ 2.6.21-rc2/arch/i386/kernel/hw_breakpoint.c
@@ -0,0 +1,1149 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+/* QUESTIONS
+
+	Permissions for I/O user bp?
+
+	Set RF flag bit for execution faults?
+
+	TF flag bit for single-step exceptions in kernel space?
+
+	CPU hotplug, kexec, etc?
+
+*/
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm-generic/percpu.h>
+
+#include <asm/debugreg.h>
+#include <asm/kdebug.h>
+#include <asm/processor.h>
+
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+	struct hw_breakpoint	*bps[HB_NUM];	/* Loaded breakpoints */
+	int			num_kbps;	/* Number of kernel bps */
+	struct task_struct	*bp_task;	/* The thread whose bps
+			are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Kernel-space breakpoint data */
+static LIST_HEAD(kernel_bps);			/* Kernel breakpoint list */
+static unsigned long		kdr7;		/* Kernel DR7 value */
+static int			num_kbps;	/* Number of kernel bps */
+
+static u8			tprio[HB_NUM];	/* Thread bp max priorities */
+static LIST_HEAD(thread_list);			/* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex);	/* Protects everything */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps.  Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1). */
+static const unsigned long	kdr7_masks[HB_NUM + 1] = {
+	0x00000000,
+	0x000f0203,	/* LEN0, R/W0, GE, G0, L0 */
+	0x00ff020f,	/* Same for 0,1 */
+	0x0fff023f,	/* Same for 0,1,2 */
+	0xffff02ff	/* Same for 0,1,2,3 */
+};
+
+
+/*
+ * Install a single breakpoint in its debug register.
+ */
+static void install_breakpoint(struct cpu_hw_breakpoint *chbi, int i,
+		struct hw_breakpoint *bp)
+{
+	unsigned long temp;
+
+	chbi->bps[i] = bp;
+	temp = (unsigned long) bp->address;
+	switch (i) {
+		case 0:	set_debugreg(temp, 0);	break;
+		case 1:	set_debugreg(temp, 1);	break;
+		case 2:	set_debugreg(temp, 2);	break;
+		case 3:	set_debugreg(temp, 3);	break;
+	}
+}
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	unsigned long dr7;
+	struct cpu_hw_breakpoint *chbi;
+	int i = HB_NUM;
+	unsigned long flags;
+
+	/* Other CPUs might be making updates to the list of kernel
+	 * breakpoints at this same time, so we can't use the global
+	 * value stored in num_kbps.  Instead we'll use the per-cpu
+	 * value stored in cpu_info. */
+
+	/* Block kernel breakpoint updates from other CPUs */
+	local_irq_save(flags);
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = tsk;
+
+	/* Keep the DR7 bits that refer to kernel breakpoints */
+	get_debugreg(dr7, 7);
+	dr7 &= kdr7_masks[chbi->num_kbps];
+
+	/* Kernel breakpoints are stored starting in DR0 and going up,
+	 * and there are num_kbps of them.  Thread breakpoints are stored
+	 * starting in DR3 and going down, as many as we have room for. */
+	if (tsk && test_tsk_thread_flag(tsk, TIF_DEBUG)) {
+		struct thread_hw_breakpoint *thbi =
+				tsk->thread.hw_breakpoint_info;
+		struct hw_breakpoint *bp;
+
+		set_debugreg(dr7, 7);	/* Disable user bps while switching */
+
+		/* Store this thread's breakpoint addresses and update
+		 * the statuses. */
+//		spin_lock(&thbi->lock);
+		list_for_each_entry(bp, &thbi->thread_bps, node) {
+
+			/* If this register is allocated for kernel bps,
+			 * don't install.  Otherwise do. */
+			if (--i < chbi->num_kbps) {
+				if (bp->status == HW_BREAKPOINT_INSTALLED) {
+					if (bp->uninstalled)
+						(bp->uninstalled)(bp);
+					bp->status = HW_BREAKPOINT_REGISTERED;
+				}
+			} else {
+				if (bp->status != HW_BREAKPOINT_INSTALLED) {
+					bp->status = HW_BREAKPOINT_INSTALLED;
+					if (bp->installed)
+						(bp->installed)(bp);
+				}
+				install_breakpoint(chbi, i, bp);
+			}
+		}
+
+		/* Mask in the parts of DR7 that refer to the new thread */
+		dr7 |= (~kdr7_masks[chbi->num_kbps] & thbi->tdr7);
+//		spin_unlock(&thbi->lock);
+	}
+
+	/* Clear any remaining stale bp pointers */
+	while (--i >= chbi->num_kbps)
+		chbi->bps[i] = NULL;
+	set_debugreg(dr7, 7);
+
+	put_cpu_no_resched();
+	local_irq_restore(flags);
+}
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void switch_kernel_hw_breakpoint(struct cpu_hw_breakpoint *chbi)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	unsigned long dr7;
+
+	/* Don't allow debug exceptions while we update the registers */
+	set_debugreg(0, 7);
+
+	/* Kernel breakpoints are stored starting in DR0 and going up */
+	i = 0;
+	list_for_each_entry(bp, &kernel_bps, node) {
+		if (i >= chbi->num_kbps)
+			break;
+		install_breakpoint(chbi, i, bp);
+		++i;
+	}
+
+	dr7 = kdr7 & kdr7_masks[chbi->num_kbps];
+	set_debugreg(dr7, 7);
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+	struct cpu_hw_breakpoint *chbi;
+
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->num_kbps = num_kbps;
+
+	/* Install both the kernel and the user breakpoints */
+	switch_kernel_hw_breakpoint(chbi);
+	switch_to_thread_hw_breakpoint(chbi->bp_task);
+
+	put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void update_all_cpus(void)
+{
+	on_each_cpu(update_this_cpu, NULL, 0, 0);
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+	update_this_cpu(NULL);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio.
+ */
+static void accum_thread_tprio(struct thread_hw_breakpoint *thbi)
+{
+	int i;
+	struct hw_breakpoint *bp;
+
+	i = 0;
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		tprio[i] = max(tprio[i], bp->priority);
+		if (++i >= HB_NUM)
+			break;
+	}
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority.  We'll set tprio[0] to the maximum priority of
+ * the first entries in all the lists, tprio[1] to the maximum priority
+ * of the second entries in all the lists, etc.  In the end, we'll know
+ * that no thread requires breakpoints with priorities higher than the
+ * values in tprio.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void recalc_tprio(void)
+{
+	struct thread_hw_breakpoint *thbi;
+
+	memset(tprio, 0, sizeof tprio);
+
+	/* Loop through all threads having registered breakpoints
+	 * and accumulate the maximum priority levels in tprio. */
+	list_for_each_entry(thbi, &thread_list, node)
+		accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[].  The 4 highest
+ * winners overall get to be installed in a debug register; num_kbps
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU.  If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+	int i;
+	struct hw_breakpoint *bp;
+	int new_num_kbps;
+	int changed = 0;
+
+	/* Determine how many debug registers are available for kernel
+	 * breakpoints as opposed to user breakpoints, based on the
+	 * priorities.  Ties are resolved in favor of user bps. */
+	new_num_kbps = i = 0;
+	bp = list_entry(kernel_bps.next, struct hw_breakpoint, node);
+	while (i + new_num_kbps < HB_NUM) {
+		if (&bp->node == &kernel_bps || tprio[i] >= bp->priority)
+			++i;		/* User bps win a slot */
+		else {
+			++new_num_kbps;	/* Kernel bp wins a slot */
+			if (bp->status != HW_BREAKPOINT_INSTALLED)
+				changed = 1;
+			bp = list_entry(bp->node.next, struct hw_breakpoint,
+					node);
+		}
+	}
+	if (new_num_kbps != num_kbps) {
+		changed = 1;
+		num_kbps = new_num_kbps;
+	}
+
+	/* Notify the remaining kernel breakpoints that they are about
+	 * to be uninstalled. */
+	list_for_each_entry_from(bp, &kernel_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED) {
+			if (bp->uninstalled)
+				(bp->uninstalled)(bp);
+			bp->status = HW_BREAKPOINT_REGISTERED;
+			changed = 1;
+		}
+	}
+
+	if (changed) {
+
+		/* Tell all the CPUs to update their debug registers */
+		update_all_cpus();
+
+		/* Notify the breakpoints that just got installed */
+		i = 0;
+		list_for_each_entry(bp, &kernel_bps, node) {
+			if (i++ >= num_kbps)
+				break;
+			if (bp->status != HW_BREAKPOINT_INSTALLED) {
+				bp->status = HW_BREAKPOINT_INSTALLED;
+				if (bp->installed)
+					(bp->installed)(bp);
+			}
+		}
+	}
+}
+
+/*
+ * Return the pointer to a thread's hw_breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+		struct task_struct *tsk)
+{
+	if (!tsk->thread.hw_breakpoint_info && !(tsk->flags & PF_EXITING)) {
+		struct thread_hw_breakpoint *thbi;
+
+		thbi = kzalloc(sizeof(struct thread_hw_breakpoint),
+				GFP_KERNEL);
+		if (thbi) {
+//			spin_lock_init(&thbi->lock);
+			INIT_LIST_HEAD(&thbi->node);
+			INIT_LIST_HEAD(&thbi->thread_bps);
+			tsk->thread.hw_breakpoint_info = thbi;
+		}
+	}
+	return tsk->thread.hw_breakpoint_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct hw_breakpoint *bp;
+
+	if (!thbi)
+		return;
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Let the breakpoints know they are being uninstalled */
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+			(bp->uninstalled)(bp);
+		bp->status = 0;
+	}
+
+	/* Remove tsk from the list of all threads with registered bps */
+	list_del(&thbi->node);
+
+	/* The thread no longer has any breakpoints associated with it */
+	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	tsk->thread.hw_breakpoint_info = NULL;
+	kfree(thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities */
+	recalc_tprio();
+	balance_kernel_vs_user();
+
+	/* Actually uninstall the breakpoints if necessary, and don't keep
+	 * a pointer to a task which may be about to exit. */
+	if (tsk == current)
+		switch_to_thread_hw_breakpoint(NULL);
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags)
+{
+	/* We will assume that breakpoint settings are not inherited
+	 * and the child starts out with no debug registers set.
+	 * But what about CLONE_PTRACE? */
+
+	clear_tsk_thread_flag(child, TIF_DEBUG);
+	return 0;
+}
+
+/*
+ * Copy out the debug register information for a core dump.
+ *
+ * tsk must be equal to current.
+ */
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	memset(u_debugreg, 0, 8 * sizeof(u_debugreg[0]));
+	if (thbi) {
+		for (i = 0; i < HB_NUM; ++i)
+			u_debugreg[i] = (unsigned long)
+					thbi->vdr_bps[i].address;
+		u_debugreg[7] = thbi->vdr7;
+	}
+	u_debugreg[6] = tsk->thread.vdr6;
+}
+
+/*
+ * Validate the settings in a hw_breakpoint structure.
+ */
+static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
+{
+	int rc = -EINVAL;
+
+	switch (bp->len) {
+	case 1:  case 2:  case 4:	/* 8 is also valid for x86_64 */
+		break;
+	default:
+		return rc;
+	}
+
+	switch (bp->type) {
+	case HW_BREAKPOINT_WRITE:
+	case HW_BREAKPOINT_RW:
+		break;
+	case HW_BREAKPOINT_IO:
+		if (!cpu_has_de)
+			return rc;
+		break;
+	case HW_BREAKPOINT_EXECUTE:
+		if (bp->len == 1)
+			break;
+		/* FALL THROUGH */
+	default:
+		return rc;
+	}
+
+	/* Check that the address is in the proper range.  Note that tsk
+	 * is NULL for kernel bps and non-NULL for user bps. */
+	if (bp->type == HW_BREAKPOINT_IO) {
+#if 0
+		/* Check whether the task has permission to access the
+		 * I/O port at bp->address. */
+		if (tsk) {
+			/* WRITEME */
+			return -EPERM;
+		}
+#else
+		return -ENOSYS;	/* Not implemented, requires setting CR4.DE */
+#endif
+	} else {
+
+		/* Check whether bp->address points to user space.
+		 * With x86_64, use TASK_SIZE_OF(tsk) instead of TASK_SIZE. */
+		if ((tsk != NULL) != ((unsigned long) bp->address < TASK_SIZE))
+			return rc;
+	}
+
+	if (bp->triggered)
+		rc = 0;
+	return rc;
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static inline unsigned long encode_dr7(int drnum, u8 len, u8 type, int local)
+{
+	unsigned long temp;
+
+	/* For x86_64:
+	 *
+	 * if (len == 8)
+	 *	len = 3;
+	 */
+	temp = ((len - 1) << 2) | type;
+	temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+	if (local)
+		temp |= (DR_LOCAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_LOCAL_EXACT;
+	else
+		temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_GLOBAL_EXACT;
+	return temp;
+}
+
+/*
+ * Calculate the DR7 value for the list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct list_head *bp_list, int is_user)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	int drnum;
+	unsigned long dr7;
+
+	/* Kernel bps are assigned from DR0 on up, and user bps are assigned
+	 * from DR3 on down.  Accumulate all 4 bps; the kernel DR7 mask will
+	 * select the appropriate bits later. */
+	dr7 = 0;
+	i = 0;
+	list_for_each_entry(bp, bp_list, node) {
+
+		/* Get the debug register number and accumulate the bits */
+		drnum = (is_user ? HB_NUM - 1 - i : i);
+		dr7 |= encode_dr7(drnum, bp->len, bp->type, is_user);
+		if (++i >= HB_NUM)
+			break;
+	}
+	return dr7;
+}
+
+/*
+ * Update the DR7 value for a user thread.
+ */
+static void update_user_dr7(struct thread_hw_breakpoint *thbi)
+{
+	thbi->tdr7 = calculate_dr7(&thbi->thread_bps, 1);
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set implies
+ *		tsk->thread.hw_breakpoint_info is not NULL.
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ *		iff thbi->node is on thread_list.
+ *
+ * The caller must hold thbi->lock.
+ */
+static int insert_bp_in_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	struct list_head *head;
+	int pos;
+	struct hw_breakpoint *temp_bp;
+
+	/* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+	if (tsk) {
+		head = &thbi->thread_bps;
+
+		/* Is this the thread's first registered breakpoint? */
+		if (list_empty(head)) {
+			set_tsk_thread_flag(tsk, TIF_DEBUG);
+			list_add(&thbi->node, &thread_list);
+		}
+	} else
+		head = &kernel_bps;
+
+	/* Equal-priority breakpoints get listed first-come-first-served */
+	pos = 0;
+	list_for_each_entry(temp_bp, head, node) {
+		if (bp->priority > temp_bp->priority)
+			break;
+		++pos;
+	}
+	list_add_tail(&bp->node, &temp_bp->node);
+
+	bp->status = HW_BREAKPOINT_REGISTERED;
+	return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ *
+ * The caller must hold thbi->lock.
+ */
+static void remove_bp_from_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	/* Remove bp from the thread's/kernel's list.  If the list is now
+	 * empty we must clear the TIF_DEBUG flag.  But keep the thread_hw_breakpoint
+	 * structure, so that the virtualized debug register values will
+	 * remain valid. */
+	list_del(&bp->node);
+	if (tsk) {
+		if (list_empty(&thbi->thread_bps)) {
+			list_del_init(&thbi->node);
+			clear_tsk_thread_flag(tsk, TIF_DEBUG);
+		}
+	}
+
+	/* Tell the breakpoint it is being uninstalled */
+	if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+		(bp->uninstalled)(bp);
+	bp->status = 0;
+}
+
+/*
+ * Actual implementation of register_user_hw_breakpoint.
+ */
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+	struct thread_hw_breakpoint *thbi;
+	int pos;
+//	unsigned long flags;
+
+	bp->status = 0;
+	rc = validate_settings(bp, tsk);
+	if (rc)
+		return rc;
+
+	thbi = alloc_thread_hw_breakpoint(tsk);
+	if (!thbi)
+		return -ENOMEM;
+
+	/* Insert bp in the thread's list and update the DR7 value */
+//	spin_lock_irqsave(&thbi->lock, flags);
+	pos = insert_bp_in_list(bp, thbi, tsk);
+	update_user_dr7(thbi);
+//	spin_unlock_irqrestore(&thbi->lock, flags);
+
+	/* Update and rebalance the priorities.  We don't need to go through
+	 * the list of all threads; adding a breakpoint can only cause the
+	 * priorities for this thread to increase. */
+	accum_thread_tprio(thbi);
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register. */
+	if (pos < HB_NUM - num_kbps) {
+		rc = 1;
+
+		/* Does it need to be installed right now? */
+		if (tsk == current)
+			switch_to_thread_hw_breakpoint(tsk);
+		/* Otherwise it will get installed the next time tsk runs */
+	}
+	return rc;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running.  It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being debugged by the current process,
+ * but it may also be the current process.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	rc = __register_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Actual implementation of unregister_user_hw_breakpoint.
+ */
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+//	unsigned long flags;
+
+	if (!bp->status)
+		return;		/* Not registered */
+
+	/* Remove bp from the thread's list and update the DR7 value */
+//	spin_lock_irqsave(&thbi->lock, flags);
+	remove_bp_from_list(bp, thbi, tsk);
+	update_user_dr7(thbi);
+//	spin_unlock_irqrestore(&thbi->lock, flags);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities,
+	 * and actually uninstall bp if necessary. */
+	recalc_tprio();
+	balance_kernel_vs_user();
+	if (tsk == current)
+		switch_to_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * unregister_user_hw_breakpoint - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	mutex_lock(&hw_breakpoint_mutex);
+	__unregister_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Actual implementation of modify_user_hw_breakpoint.
+ */
+static int __modify_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp, void *address, u8 len, u8 type)
+{
+	unsigned long flags;
+
+	if (!bp->status) {	/* Not registered, just store the values */
+		bp->address = address;
+		bp->len = len;
+		bp->type = type;
+		return 0;
+	}
+
+	/* Check the new values */
+	{
+		struct hw_breakpoint temp_bp = *bp;
+		int rc;
+
+		temp_bp.address = address;
+		temp_bp.len = len;
+		temp_bp.type = type;
+		rc = validate_settings(&temp_bp, tsk);
+		if (rc)
+			return rc;
+	}
+
+	/* Okay, update the breakpoint.  An interrupt at this point might
+	 * cause I/O to a breakpointed port, so disable interrupts. */
+	local_irq_save(flags);
+	bp->address = address;
+	bp->len = len;
+	bp->type = type;
+	update_user_dr7(tsk->thread.hw_breakpoint_info);
+
+	/* The priority hasn't changed so we don't need to rebalance
+	 * anything.  Just install the new breakpoint, if necessary. */
+	if (tsk == current)
+		switch_to_thread_hw_breakpoint(tsk);
+	local_irq_restore(flags);
+	return 0;
+}
+
+/**
+ * modify_user_hw_breakpoint - modify a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to modify
+ * @address: the new value for @bp->address
+ * @len: the new value for @bp->len
+ * @type: the new value for @bp->type
+ *
+ * @bp need not currently be registered.  If it isn't, the new values
+ * are simply stored in it and @tsk is ignored.  Otherwise the new values
+ * are validated first and then stored.  If @tsk is the current process
+ * and @bp is installed in a debug register, the register is updated.
+ *
+ * Returns 0 if the new values are acceptable, otherwise a negative error
+ * number.
+ */
+int modify_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp, void *address, u8 len, u8 type)
+{
+	int rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	rc = __modify_user_hw_breakpoint(tsk, bp, address, len, type);
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Update the DR7 value for the kernel.
+ */
+static void update_kernel_dr7(void)
+{
+	kdr7 = calculate_dr7(&kernel_bps, 0);
+}
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be active at all times.  It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	int rc;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, NULL);
+	if (rc)
+		return rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Insert bp in the kernel's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, NULL, NULL);
+	update_kernel_dr7();
+
+	/* Rebalance the priorities.  This will install bp if it
+	 * was allocated a debug register. */
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register. */
+	if (pos < num_kbps)
+		rc = 1;
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	if (!bp->status)
+		return;		/* Not registered */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Remove bp from the kernel's list and update the DR7 value */
+	remove_bp_from_list(bp, NULL, NULL);
+	update_kernel_dr7();
+
+	/* Rebalance the priorities.  This will uninstall bp if it
+	 * was allocated a debug register. */
+	balance_kernel_vs_user();
+
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+	struct task_struct *tsk = current;
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	/* Store in the virtual DR6 register the fact that breakpoint i
+	 * was hit, so the thread's debugger will see it. */
+	if (thbi) {
+		i = bp - thbi->vdr_bps;
+		tsk->thread.vdr6 |= (DR_TRAP0 << i);
+	}
+}
+
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n)
+{
+	struct thread_hw_breakpoint *thbi;
+	unsigned long val = 0;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	thbi = tsk->thread.hw_breakpoint_info;
+	if (n < HB_NUM) {
+		if (thbi)
+			val = (unsigned long) thbi->vdr_bps[n].address;
+	} else if (n == 6)
+		val = tsk->thread.vdr6;
+	else if (n == 7) {
+		if (thbi)
+			val = thbi->vdr7;
+	}
+	mutex_unlock(&hw_breakpoint_mutex);
+	return val;
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7.  Return the "enabled" status.
+ */
+static inline int decode_dr7(unsigned long dr7, int bpnum, u8 *len, u8 *type)
+{
+	int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+	int tlen = 1 + ((temp >> 2) & 0x3);
+
+	/* For x86_64:
+	 *
+	 * if (tlen == 3)
+	 *	tlen = 8;
+	 */
+	*len = tlen;
+	*type = temp & 0x3;
+	return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
+
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+		struct thread_hw_breakpoint *thbi, unsigned long data)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	int rc = 0;
+	unsigned long old_dr7 = thbi->vdr7;
+
+	data &= ~DR_CONTROL_RESERVED;
+
+	/* Loop through all the hardware breakpoints,
+	 * making the appropriate changes to each. */
+restore_settings:
+	thbi->vdr7 = data;
+	bp = &thbi->vdr_bps[0];
+	for (i = 0; i < HB_NUM; (++i, ++bp)) {
+		int enabled;
+		u8 len, type;
+
+		enabled = decode_dr7(data, i, &len, &type);
+
+		/* Unregister the breakpoint if it should now be disabled.
+		 * Do this first so that setting invalid values for len
+		 * or type won't cause an error. */
+		if (!enabled && bp->status)
+			__unregister_user_hw_breakpoint(tsk, bp);
+
+		/* Insert the breakpoint's settings.  If the bp is enabled,
+		 * an invalid entry will cause an error. */
+		if (__modify_user_hw_breakpoint(tsk, bp,
+				bp->address, len, type) < 0 && rc == 0)
+			break;
+
+		/* Now register the breakpoint if it should be enabled.
+		 * New invalid entries will cause an error here. */
+		if (enabled && !bp->status) {
+			bp->triggered = ptrace_triggered;
+			bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+			if (__register_user_hw_breakpoint(tsk, bp) < 0 &&
+					rc == 0)
+				break;
+		}
+	}
+
+	/* If anything above failed, restore the original settings */
+	if (i < HB_NUM) {
+		rc = -EIO;
+		data = old_dr7;
+		goto restore_settings;
+	}
+	return rc;
+}
+
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+	struct thread_hw_breakpoint *thbi;
+	int rc = -EIO;
+
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* There are no DR4 or DR5 registers */
+	if (n == 4 || n == 5)
+		;
+
+	/* Writes to DR6 modify the virtualized value */
+	else if (n == 6) {
+		tsk->thread.vdr6 = val;
+		rc = 0;
+	}
+
+	else if (!tsk->thread.hw_breakpoint_info && val == 0)
+		rc = 0;		/* Minor optimization */
+
+	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+		rc = -ENOMEM;
+
+	/* Writes to DR0 - DR3 change a breakpoint address */
+	else if (n < HB_NUM) {
+		struct hw_breakpoint *bp = &thbi->vdr_bps[n];
+
+		if (__modify_user_hw_breakpoint(tsk, bp, (void *) val,
+				bp->len, bp->type) >= 0)
+			rc = 0;
+	}
+
+	/* All that's left is DR7 */
+	else
+		rc = ptrace_write_dr7(tsk, thbi, val);
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_handler(struct die_args *data)
+{
+	unsigned long dr6, dr7;
+	struct cpu_hw_breakpoint *chbi;
+	int i;
+	struct hw_breakpoint *bp;
+
+	dr6 = data->err;
+	if ((dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) == 0)
+		return NOTIFY_DONE;
+
+	/* Assert that local interrupts are disabled */
+
+	/* Disable all breakpoints so that the callbacks can run without
+	 * triggering recursive debug exceptions. */
+	get_debugreg(dr7, 7);
+	set_debugreg(0, 7);
+
+	/* Are we a victim of lazy debug-register switching? */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	if (chbi->bp_task != current && chbi->bp_task != NULL) {
+
+		/* No user breakpoints are valid.  Clear those bits in
+		 * dr6 and perform the belated debug-register switch. */
+		chbi->bp_task = NULL;
+		dr7 &= kdr7_masks[chbi->num_kbps];
+		for (i = chbi->num_kbps; i < HB_NUM; ++i) {
+			dr6 &= ~(DR_TRAP0 << i);
+			chbi->bps[i] = NULL;
+		}
+	}
+
+	/* Handle all the breakpoints that were triggered */
+	for (i = 0; i < HB_NUM; ++i) {
+		if (!(dr6 & (DR_TRAP0 << i)))
+			continue;
+
+		/* If this was a user breakpoint, tell do_debug() to send
+		 * a SIGTRAP. */
+		if (i >= chbi->num_kbps)
+			data->ret = 1;
+
+		/* Invoke the triggered callback */
+		bp = chbi->bps[i];
+		if (bp)		/* Should always be non-NULL */
+			(bp->triggered)(bp, data->regs);
+	}
+	put_cpu_no_resched();
+
+	/* Re-enable the breakpoints */
+	set_debugreg(dr7, 7);
+
+	/* Mask out the bits we have handled from the DR6 value */
+	data->err &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+	/* Early exit from the notifier chain if everything has been handled */
+	if (data->err == 0)
+		return NOTIFY_STOP;
+	return NOTIFY_DONE;
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_exceptions_notify(
+		struct notifier_block *unused, unsigned long val, void *data)
+{
+	if (val != DIE_DEBUG)
+		return NOTIFY_DONE;
+	return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+	.notifier_call = hw_breakpoint_exceptions_notify
+};
+
+static int __init init_hw_breakpoint(void)
+{
+	return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
Index: 2.6.21-rc2/arch/i386/kernel/ptrace.c
===================================================================
--- 2.6.21-rc2.orig/arch/i386/kernel/ptrace.c
+++ 2.6.21-rc2/arch/i386/kernel/ptrace.c
@@ -383,11 +383,11 @@ long arch_ptrace(struct task_struct *chi
 		tmp = 0;  /* Default return condition */
 		if(addr < FRAME_SIZE*sizeof(long))
 			tmp = getreg(child, addr);
-		if(addr >= (long) &dummy->u_debugreg[0] &&
-		   addr <= (long) &dummy->u_debugreg[7]){
+		else if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
 			addr -= (long) &dummy->u_debugreg[0];
 			addr = addr >> 2;
-			tmp = child->thread.debugreg[addr];
+			tmp = thread_get_debugreg(child, addr);
 		}
 		ret = put_user(tmp, datap);
 		break;
@@ -417,59 +417,11 @@ long arch_ptrace(struct task_struct *chi
 		   have to be selective about what portions we allow someone
 		   to modify. */
 
-		  ret = -EIO;
-		  if(addr >= (long) &dummy->u_debugreg[0] &&
-		     addr <= (long) &dummy->u_debugreg[7]){
-
-			  if(addr == (long) &dummy->u_debugreg[4]) break;
-			  if(addr == (long) &dummy->u_debugreg[5]) break;
-			  if(addr < (long) &dummy->u_debugreg[4] &&
-			     ((unsigned long) data) >= TASK_SIZE-3) break;
-			  
-			  /* Sanity-check data. Take one half-byte at once with
-			   * check = (val >> (16 + 4*i)) & 0xf. It contains the
-			   * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
-			   * 2 and 3 are LENi. Given a list of invalid values,
-			   * we do mask |= 1 << invalid_value, so that
-			   * (mask >> check) & 1 is a correct test for invalid
-			   * values.
-			   *
-			   * R/Wi contains the type of the breakpoint /
-			   * watchpoint, LENi contains the length of the watched
-			   * data in the watchpoint case.
-			   *
-			   * The invalid values are:
-			   * - LENi == 0x10 (undefined), so mask |= 0x0f00.
-			   * - R/Wi == 0x10 (break on I/O reads or writes), so
-			   *   mask |= 0x4444.
-			   * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
-			   *   0x1110.
-			   *
-			   * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
-			   *
-			   * See the Intel Manual "System Programming Guide",
-			   * 15.2.4
-			   *
-			   * Note that LENi == 0x10 is defined on x86_64 in long
-			   * mode (i.e. even for 32-bit userspace software, but
-			   * 64-bit kernel), so the x86_64 mask value is 0x5454.
-			   * See the AMD manual no. 24593 (AMD64 System
-			   * Programming)*/
-
-			  if(addr == (long) &dummy->u_debugreg[7]) {
-				  data &= ~DR_CONTROL_RESERVED;
-				  for(i=0; i<4; i++)
-					  if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
-						  goto out_tsk;
-				  if (data)
-					  set_tsk_thread_flag(child, TIF_DEBUG);
-				  else
-					  clear_tsk_thread_flag(child, TIF_DEBUG);
-			  }
-			  addr -= (long) &dummy->u_debugreg;
-			  addr = addr >> 2;
-			  child->thread.debugreg[addr] = data;
-			  ret = 0;
+		if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
+			addr -= (long) &dummy->u_debugreg;
+			addr = addr >> 2;
+			ret = thread_set_debugreg(child, addr, data);
 		  }
 		  break;
 
@@ -625,7 +577,6 @@ long arch_ptrace(struct task_struct *chi
 		ret = ptrace_request(child, request, addr, data);
 		break;
 	}
- out_tsk:
 	return ret;
 }
 
Index: 2.6.21-rc2/arch/i386/kernel/Makefile
===================================================================
--- 2.6.21-rc2.orig/arch/i386/kernel/Makefile
+++ 2.6.21-rc2/arch/i386/kernel/Makefile
@@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld
 obj-y	:= process.o signal.o entry.o traps.o irq.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
-		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \
+		hw_breakpoint.o
 
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
Index: 2.6.21-rc2/arch/i386/power/cpu.c
===================================================================
--- 2.6.21-rc2.orig/arch/i386/power/cpu.c
+++ 2.6.21-rc2/arch/i386/power/cpu.c
@@ -11,6 +11,7 @@
 #include <linux/suspend.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/debugreg.h>
 
 static struct saved_context saved_context;
 
@@ -45,6 +46,11 @@ void __save_processor_state(struct saved
 	ctxt->cr2 = read_cr2();
 	ctxt->cr3 = read_cr3();
 	ctxt->cr4 = read_cr4();
+
+	/*
+	 * disable the debug registers
+	 */
+	set_debugreg(0, 7);
 }
 
 void save_processor_state(void)
@@ -69,20 +75,7 @@ static void fix_processor_context(void)
 
 	load_TR_desc();				/* This does ltr */
 	load_LDT(&current->active_mm->context);	/* This does lldt */
-
-	/*
-	 * Now maybe reload the debug registers
-	 */
-	if (current->thread.debugreg[7]){
-		set_debugreg(current->thread.debugreg[0], 0);
-		set_debugreg(current->thread.debugreg[1], 1);
-		set_debugreg(current->thread.debugreg[2], 2);
-		set_debugreg(current->thread.debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(current->thread.debugreg[6], 6);
-		set_debugreg(current->thread.debugreg[7], 7);
-	}
-
+	load_debug_registers();
 }
 
 void __restore_processor_state(struct saved_context *ctxt)
Index: 2.6.21-rc2/arch/i386/kernel/kprobes.c
===================================================================
--- 2.6.21-rc2.orig/arch/i386/kernel/kprobes.c
+++ 2.6.21-rc2/arch/i386/kernel/kprobes.c
@@ -670,8 +670,11 @@ int __kprobes kprobe_exceptions_notify(s
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_DEBUG:
-		if (post_kprobe_handler(args->regs))
-			ret = NOTIFY_STOP;
+		if ((args->err & DR_STEP) && post_kprobe_handler(args->regs)) {
+			args->err &= ~DR_STEP;
+			if (args->err == 0)
+				ret = NOTIFY_STOP;
+		}
 		break;
 	case DIE_GPF:
 	case DIE_PAGE_FAULT:
Index: 2.6.21-rc2/include/asm-i386/kdebug.h
===================================================================
--- 2.6.21-rc2.orig/include/asm-i386/kdebug.h
+++ 2.6.21-rc2/include/asm-i386/kdebug.h
@@ -15,6 +15,7 @@ struct die_args {
 	long err;
 	int trapnr;
 	int signr;
+	int ret;
 };
 
 extern int register_die_notifier(struct notifier_block *);


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-09 18:40                                 ` Alan Stern
@ 2007-03-13  8:00                                   ` Roland McGrath
  2007-03-13 13:07                                     ` Alan Cox
  2007-03-13 18:56                                     ` Alan Stern
  0 siblings, 2 replies; 70+ messages in thread
From: Roland McGrath @ 2007-03-13  8:00 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> Well, I can add in the test for 0, but finding the set of always-on bits
> in DR6 will have to be done separately.  Isn't it possible that different
> CPUs could have different bits?

I don't know, but it seems unlikely.  AFAIK all CPUs are presumed to have
the same CPUID results, for example.

> At that moment D, running on CPU 1, decides to unregister a breakpoint in
> T.  Clearing TIF_DEBUG now doesn't do any good -- it's too late; CPU 0 has
> already tested it.  CPU 1 goes in and alters the user breakpoint data,
> maybe even deallocating a breakpoint structure that CPU 0 is about to
> read.  Not a good situation.

You make it sound like a really good case for RCU. ;-)

> No way to tell when a task being debugged is started up by anything other 
> than its debugger?  Hmmm, in that case maybe it would be better to use 
> RCU.  It won't add much overhead to anything but the code for registering 
> and unregistering user breakpoints.

Indeed, it is for this sort of thing.  Still, it feels like a bit too much
is going on in switch_to_thread_hw_breakpoint for the common case.
It seems to me it ought to be something as simple as this:

	if (unlikely((thbi->want_dr7 &~ chbi->kdr7) != thbi->active_tdr7)) {
		/* Need to make some installed or uninstalled callbacks.  */
		if (thbi->active_tdr7 & chbi->kdr7)
			uninstalled callbacks;
		else
			installed callbacks;
		recompute active_dr7, want_dr7;
	}

	switch (thbi->active_bkpts) {
	case 4:
		set_debugreg(thbi->tdr[0], 0);
	case 3:
		set_debugreg(thbi->tdr[1], 1);
	case 2:
		set_debugreg(thbi->tdr[2], 2);
	case 1:
		set_debugreg(thbi->tdr[3], 3);
	}
	set_debugreg(chbi->kdr7 | thbi->active_tdr7, 7);

Only in the unlikely case do you need to worry about synchronization,
whether it's RCU or spin locks or whatever.  The idea is that breakpoint
installation when doing the fancy stuff would reduce it to "the four
breakpoints I would like, in order" (tdr[3] containing the highest-priority
one), and dr7 masks describing what dr7 you were using last time you were
running (active_tdr7), and that plus the enable bits you would like to have
set (want_dr7).  The unlikely case is when the number of kernel debugregs
consumed changed since the last time you switched in, so you go recompute
active_tdr7.  (Put the body of that if in another function.)

For the masks to work as I described, you need to use the same enable bit
(or both) for kernel and user allocations.  It really doesn't matter which
one you use, since all of Linux is "local" for the sense of the dr7 enable
bits (i.e. you should just use DR_GLOBAL_ENABLE).

It's perfectly safe to access all this stuff while it might be getting
overwritten, and worst case you switch in some user breakpoints you didn't
want.  That only happens in the SIGKILL case, when you never hit user mode
again and don't care.

> +void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
[...]
> +	/* Keep the DR7 bits that refer to kernel breakpoints */
> +	get_debugreg(dr7, 7);
> +	dr7 &= kdr7_masks[chbi->num_kbps];

I don't understand what this part is for.  Why fetch dr7 from the CPU?  You
already know what's there.  All you need is the current dr7 bits belonging
to kernel allocations, i.e. a chbi->kdr7 mask.

> +	if (tsk && test_tsk_thread_flag(tsk, TIF_DEBUG)) {

switch_to_thread_hw_breakpoint is on the context switch path.  On that
path, this test can never be false.  The context switch path should not
have any unnecessary conditionals.  If you want to share code with some
other places that now call switch_to_thread_hw_breakpoint, they can share a
common inline for part of the guts.

> +		set_debugreg(dr7, 7);	/* Disable user bps while switching */

What is this for?  The kernel's dr7 bits are already set.  Why does it
matter if bits enabling user breakpoints are set too?  No user breakpoint
can be hit on this CPU before this function returns.

> +	/* Clear any remaining stale bp pointers */
> +	while (--i >= chbi->num_kbps)
> +		chbi->bps[i] = NULL;

Why is this done here?  This can be done when the kernel allocations are
installed/uninstalled.

> @@ -15,6 +15,7 @@ struct die_args {
>  	long err;
>  	int trapnr;
>  	int signr;
> +	int ret;
>  };

I don't understand why you added this at all.

>  fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
[...]
> +	if ((args.err & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) ||
> +			args.ret)
> +		send_sigtrap(tsk, regs, error_code);

The args.err test is fine.  A notifier that wants the SIGTRAP sent should
just leave the appropriate DR_* bit set rather than clear it.

In hw_breakpoint_handler, you could just change:

		if (i >= chbi->num_kbps)
			data->ret = 1;
to:

		if (i < chbi->num_kbps)
			data->err &= ~(DR_TRAP0 << i);

But really I think you might as well just have the triggered callback call
send_sigtrap itself.  That's fine for ptrace_triggered.  When I get to a
utrace-based layer on this, it can either send the signal itself in the
same way, or do something else better if I have another option by then.
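
E.g. roughly (just a sketch; the 0 error code passed to send_sigtrap is only
a placeholder here):

	static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
	{
		struct task_struct *tsk = current;
		struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;

		if (thbi)
			tsk->thread.vdr6 |= DR_TRAP0 << (bp - thbi->vdr_bps);
		send_sigtrap(tsk, regs, 0);
	}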


After all that fun x86 implementation detail, now I have some comments
about the interface.  I took a gander at what implementing hw_breakpoint on
powerpc would be like, and it looks pretty simple.  It gave me some more
detailed thoughts about making the source API more uniformly convenient
across machines.

Firstly, everything but a few #define's should be in a shared file.  First
I was thinking linux/hw_breakpoint.h, but now I think it should be
asm-generic/hw_breakpoint.h and asm-{i386,x86_64,...}/hw_breakpoint.h do:

	#define HW_BREAKPOINT_...
	#include <asm-generic/hw_breakpoint.h>

That way, asm/hw_breakpoint.h is what modules #include, and that file is
just absent on machines without the support (as opposed to a
linux/hw_breakpoint.h that's there but not always useful).

> +struct hw_breakpoint {
[...]
> +	void		*address;

You might actually want to write this:

	union {
		void *kernel;
		void __user *user;
		unsigned long va;
	} address;

Setting the address uses the appropriate pointer.  Turning it into a debug
register value uses va.  This helps maintain discipline of using __user, so
that kernel analysis tools can reliably cite use of user or kernel
addresses as right or wrong.

> +	u8		len;
> +	u8		type;

I don't think we actually want to expose these as fields in this way at all.
Instead, just a single field of machine-format bits, and then "encoding"
for dr7 values is just:

	dr7 |= bp->bits << hwnum;

This field is not set by the user directly, but by the registration call.
It takes type and len arguments, validates and combines them.  There is no
need for encoding really, just validation and use the hardware bits in the
asm/hw_breakpoint.h constants with standard names:

#define HW_BREAKPOINT_LEN1	DR_LEN_1
...

On powerpc, the address breakpoint is always for an 8-byte address range.
Also, it has distinct bits for breaking on loads and breaking on stores,
but no hardware instruction breakpoint is supported.
So it would define:

#define HW_BREAKPOINT_LEN8		0xc001d00d
#define HW_BREAKPOINT_TYPE_READ		1
#define HW_BREAKPOINT_TYPE_WRITE	2
#define HW_BREAKPOINT_TYPE_RW		3

Using two args in the registration calls lets it do validation simply with:

	if (len != HW_BREAKPOINT_LEN8 || (type &~ 3))
		return -EINVAL;
	bp->bits = type;

while still verifying that an explicit length is used by the caller.
The validation is also simple on x86, and then:

	bp->bits = ((len | type) << DR_CONTROL_SHIFT) | 2;

This source interface lets a module use #ifdef HW_BREAKPOINT_* to figure
out what's available without needing any specific machine knowledge.
Also, HW_BREAKPOINT_TYPE_EXECUTE should have HW_BREAKPOINT_LEN_EXECUTE
that is the required argument for that type, so callers don't have to
encode the machine knowledge of using LEN1 for execute.
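
That is, a module could just do something like (sketch only):

	#include <asm/hw_breakpoint.h>

	#ifdef HW_BREAKPOINT_LEN1
		/* this machine can watch a single byte */
	#elif defined(HW_BREAKPOINT_LEN8)
		/* only 8-byte ranges, e.g. powerpc */
	#endif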

The definition of "breakpoint length" on all the machines (whether a single
length is available or more) seems to be that the breakpoint address has to
be aligned to the length and what it catches is any access to any byte
within that range.  i.e., the length means a mask of low bits cleared from
an address before it's compared to the breakpoint address.  (On powerpc,
the low three bits of the register are used as flags, so only the high bits
of the address are even stored.)

The hw_breakpoint documentation should make this definition more explicit,
and it probably ought to enforce the alignment of the address specified.
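
Enforcing it in the registration call could then be a one-liner, something
like (sketch only; "bytes" here stands for the power-of-two byte count
implied by the len argument):

	if (bp->address.va & (bytes - 1))
		return -EINVAL;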

> +#define HW_BREAKPOINT_IO	0x2	/* trigger on I/O space access */

Let's not define a name for this while we are not really supporting it.

> +/* HW breakpoint status values */
> +#define HW_BREAKPOINT_REGISTERED	1
> +#define HW_BREAKPOINT_INSTALLED		2

This doesn't really need to be public outside of hw_breakpoint.c, does it?


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-13  8:00                                   ` Roland McGrath
@ 2007-03-13 13:07                                     ` Alan Cox
  2007-03-13 18:56                                     ` Alan Stern
  1 sibling, 0 replies; 70+ messages in thread
From: Alan Cox @ 2007-03-13 13:07 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Alan Stern, Prasanna S Panchamukhi, Kernel development list

On Tue, 13 Mar 2007 01:00:50 -0700 (PDT)
Roland McGrath <roland@redhat.com> wrote:

> > Well, I can add in the test for 0, but finding the set of always-on bits
> > in DR6 will have to be done separately.  Isn't it possible that different
> > CPUs could have different bits?
> 
> I don't know, but it seems unlikely.  AFAIK all CPUs are presumed to have
> the same CPUID results, for example.

No. We merge the CPUID information to get a shared set of capability bits.

Generic PC systems with a mix of PII and PIII are possible. The voyager
architecture can have even more peculiar combinations of processor
modules installed.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-13  8:00                                   ` Roland McGrath
  2007-03-13 13:07                                     ` Alan Cox
@ 2007-03-13 18:56                                     ` Alan Stern
  2007-03-14  3:00                                       ` Roland McGrath
  1 sibling, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-03-13 18:56 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Tue, 13 Mar 2007, Roland McGrath wrote:

> > At that moment D, running on CPU 1, decides to unregister a breakpoint in
> > T.  Clearing TIF_DEBUG now doesn't do any good -- it's too late; CPU 0 has
> > already tested it.  CPU 1 goes in and alters the user breakpoint data,
> > maybe even deallocating a breakpoint structure that CPU 0 is about to
> > read.  Not a good situation.
> 
> You make it sound like a really good case for RCU. ;-)
> 
> > No way to tell when a task being debugged is started up by anything other 
> > than its debugger?  Hmmm, in that case maybe it would be better to use 
> > RCU.  It won't add much overhead to anything but the code for registering 
> > and unregistering user breakpoints.
> 
> Indeed, it is for this sort of thing.

Well, if I have to use it then I will.

>  Still, it feels like a bit too much
> is going on in switch_to_thread_hw_breakpoint for the common case.
> It seems to me it ought to be something as simple as this:
> 
> 	if (unlikely((thbi->want_dr7 &~ chbi->kdr7) != thbi->active_tdr7)) {
> 		/* Need to make some installed or uninstalled callbacks.  */
> 		if (thbi->active_tdr7 & chbi->kdr7)
> 			uninstalled callbacks;
> 		else
> 			installed callbacks;
> 		recompute active_dr7, want_dr7;
> 	}
> 
> 	switch (thbi->active_bkpts) {
> 	case 4:
> 		set_debugreg(thbi->tdr[0], 0);
> 	case 3:
> 		set_debugreg(thbi->tdr[1], 1);
> 	case 2:
> 		set_debugreg(thbi->tdr[2], 2);
> 	case 1:
> 		set_debugreg(thbi->tdr[3], 3);
> 	}
> 	set_debugreg(chbi->kdr7 | thbi->active_tdr7, 7);

Yes, the code could be reworked by moving some of the data from the CPU
hw-breakpoint info into the thread's info.  I'll see how much simpler it
ends up being.

> Only in the unlikely case do you need to worry about synchronization,
> whether it's RCU or spin locks or whatever.  The idea is that breakpoint
> installation when doing the fancy stuff would reduce it to "the four
> breakpoints I would like, in order" (tdr[3] containing the highest-priority
> one), and dr7 masks describing what dr7 you were using last time you were
> running (active_tdr7), and that plus the enable bits you would like to have
> set (want_dr7).  The unlikely case is when the number of kernel debugregs
> consumed changed since the last time you switched in, so you go recompute
> active_tdr7.  (Put the body of that if in another function.)

It isn't quite that easy.  Even though the number of user breakpoints may
not have changed, their identities may have.  So the unlikely case has to
encompass two possibilities: the number of installable user breakpoints
has changed, or any user breakpoints have been registered or unregistered.

> For the masks to work as I described, you need to use the same enable bit
> (or both) for kernel and user allocations.  It really doesn't matter which
> one you use, since all of Linux is "local" for the sense of the dr7 enable
> bits (i.e. you should just use DR_GLOBAL_ENABLE).

This shouldn't be necessary.  So long as DR_GLOBAL_ENABLE always belongs
to the kernel's part of DR7 and DR_LOCAL_ENABLE always belongs to the
thread's part there will be no interference between them.

> It's perfectly safe to access all this stuff while it might be getting
> overwritten, and worst case you switch in some user breakpoints you didn't
> want.  That only happens in the SIGKILL case, when you never hit user mode
> again and don't care.

Maybe.  I always had in the back of my mind the possibility that there
might be a user I/O breakpoint set.  It could be triggered by an interrupt
handler even in the SIGKILL case.  But since we're not supporting I/O
breakpoints now, that's a moot point.

> > +void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
> [...]
> > +	/* Keep the DR7 bits that refer to kernel breakpoints */
> > +	get_debugreg(dr7, 7);
> > +	dr7 &= kdr7_masks[chbi->num_kbps];
> 
> I don't understand what this part is for.  Why fetch dr7 from the CPU?  You
> already know what's there.  All you need is the current dr7 bits belonging
> to kernel allocations, i.e. a chbi->kdr7 mask.

Actually the code _doesn't_ already know what's there; the chbi area
doesn't include any storage for the kernel DR7 value.  I figured it was at
least as easy to read it from the CPU register as to read it from memory.  
But maybe that's not true; according to my ancient processor manual, moves
to/from debug registers take many more clock cycles than moves to/from
memory.

> > +	if (tsk && test_tsk_thread_flag(tsk, TIF_DEBUG)) {
> 
> switch_to_thread_hw_breakpoint is on the context switch path.  On that
> path, this test can never be false.  The context switch path should not
> have any unnecessary conditionals.  If you want to share code with some
> other places that now call switch_to_thread_hw_breakpoint, they can share a
> common inline for part of the guts.

Okay.  That test was for situations when it's necessary to install only 
the kernel's debug registers, with no thread registers.  It can easily 
be made into a separate routine.

> > +		set_debugreg(dr7, 7);	/* Disable user bps while switching */
> 
> What is this for?  The kernel's dr7 bits are already set.  Why does it
> matter if bits enabling user breakpoints are set too?  No user breakpoint
> can be hit on this CPU before this function returns.

I was being overly cautious.  If interrupts were enabled then it might be 
possible to trip a user I/O breakpoint.  But since they aren't, that code 
isn't needed.

> > +	/* Clear any remaining stale bp pointers */
> > +	while (--i >= chbi->num_kbps)
> > +		chbi->bps[i] = NULL;
> 
> Why is this done here?  This can be done when the kernel allocations are
> installed/uninstalled.

No.  If a debugger has removed some user breakpoints since the last time
the thread ran, the chbi->bps[] entries could still be present.  Likewise
if the previously-running task had more breakpoints than the current one.

Of course it doesn't actually hurt to have stale pointers lying around;
they refer to breakpoints which aren't enabled.  So clearing them isn't
really necessary.

> > @@ -15,6 +15,7 @@ struct die_args {
> >  	long err;
> >  	int trapnr;
> >  	int signr;
> > +	int ret;
> >  };
> 
> I don't understand why you added this at all.
> 
> >  fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
> [...]
> > +	if ((args.err & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) ||
> > +			args.ret)
> > +		send_sigtrap(tsk, regs, error_code);
> 
> The args.err test is fine.  A notifier that wants the SIGTRAP sent should
> just leave the appropriate DR_* bit set rather than clear it.
> 
> In hw_breakpoint_handler, you could just change:
> 
> 		if (i >= chbi->num_kbps)
> 			data->ret = 1;
> to:
> 
> 		if (i < chbi->num_kbps)
> 			data->err &= ~(DR_TRAP0 << i);

This proposed change is wrong.  Remember that ptrace breakpoints are 
virtualized; they are assigned to different debug registers from what the 
debugger thinks.  That's why I added args.ret.

> But really I think you might as well just have the triggered callback call
> send_sigtrap itself.  That's fine for ptrace_triggered.  When I get to a
> utrace-based layer on this, it can either send the signal itself in the
> same way, or do something else better if I have another option by then.

This sounds like a better solution.


> After all that fun x86 implementation detail, now I have some comments
> about the interface.  I took a gander at what implementing hw_breakpoint on
> powerpc would be like, and it looks pretty simple.  It gave me some more
> detailed thoughts about making the source API more uniformly convenient
> across machines.
> 
> Firstly, everything but a few #define's should be in a shared file.  First
> I was thinking linux/hw_breakpoint.h, but now I think it should be
> asm-generic/hw_breakpoint.h and asm-{i386,x86_64,...}/hw_breakpoint.h do:
> 
> 	#define HW_BREAKPOINT_...
> 	#include <asm-generic/hw_breakpoint.h>
> 
> That way, asm/hw_breakpoint.h is what modules #include, and that file is
> just absent on machines without the support (as opposed to a
> linux/hw_breakpoint.h that's there but not always useful).
> 
> > +struct hw_breakpoint {
> [...]
> > +	void		*address;
> 
> You might actually want to write this:
> 
> 	union {
> 		void *kernel;
> 		void __user *user;
> 		unsigned long va;
> 	} address;
> 
> Setting the address uses the appropriate pointer.  Turning it into a debug
> register value uses va.  This helps maintain discipline of using __user, so
> that kernel analysis tools can reliably cite use of user or kernel
> addresses as right or wrong.
> 
> > +	u8		len;
> > +	u8		type;
> 
> I don't think we actually want to expose these as fields in this way at all.
> Instead, just a single field of machine-format bits, and then "encoding"
> for dr7 values is just:
> 
> 	dr7 |= bp->bits << hwnum;
> 
> This field is not set by the user directly, but by the registration call.
> It takes type and len arguments, validates and combines them.  There is no
> need for encoding really, just validation and use the hardware bits in the
> asm/hw_breakpoint.h constants with standard names:
> 
> #define HW_BREAKPOINT_LEN1	DR_LEN_1
> ...

I don't like using DR_LEN_1, because it would force asm/debugreg.h to be 
#included by any user of hw_breakpoint.  The raw numerical value should do 
just as well.
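
For example (sketch; the numbers just repeat the i386 hardware encoding, so
they stay equal to DR_LEN_* without pulling in asm/debugreg.h):

	#define HW_BREAKPOINT_LEN1	0x0
	#define HW_BREAKPOINT_LEN2	0x4
	#define HW_BREAKPOINT_LEN4	0xc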

> On powerpc, the address breakpoint is always for an 8-byte address range.

So there's no way to trap on accesses to a particular byte within a
string?

> Also, it has distinct bits for breaking on loads and breaking on stores,
> but no hardware instruction breakpoint is supported.
> So it would define:
> 
> #define HW_BREAKPOINT_LEN8		0xc001d00d
> #define HW_BREAKPOINT_TYPE_READ		1
> #define HW_BREAKPOINT_TYPE_WRITE	2
> #define HW_BREAKPOINT_TYPE_RW		3
> 
> Using two args in the registration calls lets it do validation simply with:
> 
> 	if (len != HW_BREAKPOINT_LEN8 || (type &~ 3))
> 		return -EINVAL;
> 	bp->bits = type;
> 
> while still verifying that an explicit length is used by the caller.
> The validation is also simple on x86, and then:
> 
> 	bp->bits = ((len | type) << DR_CONTROL_SHIFT) | 2;

Not quite that simple, but close to it.  (len | type) ends up being
shifted by 4*drnum whereas the enable bit gets shifted by 2*drnum (where
drnum is the debug register assigned to the breakpoint), so they can't be
stored together.  But the enable bit doesn't need to be present in
bp->bits; it is implied.
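
In other words, installing breakpoint number drnum comes out roughly as
(sketch; "enable" just stands for whichever enable bit we settle on):

	dr7 |= bp->bits << (DR_CONTROL_SHIFT + 4 * drnum);	/* len | type */
	dr7 |= enable << (2 * drnum);				/* implied */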

> This source interface lets a module use #ifdef HW_BREAKPOINT_* to figure
> out what's available without needing any specific machine knowledge.
> Also, HW_BREAKPOINT_TYPE_EXECUTE should have HW_BREAKPOINT_LEN_EXECUTE
> that is the required argument for that type, so callers don't have to
> encode the machine knowledge of using LEN1 for execute.

Better yet, if type is HW_BREAKPOINT_TYPE_EXECUTE then just ignore the
caller's len and always use the correct value.
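
Roughly (sketch):

	if (type == HW_BREAKPOINT_TYPE_EXECUTE)
		len = HW_BREAKPOINT_LEN1;	/* or whatever this machine needs */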

> The definition of "breakpoint length" on all the machines (whether a single
> length is available or more) seems to be that the breakpoint address has to
> be aligned to the length and what it catches is any access to any byte
> within that range.  i.e., the length means a mask of low bits cleared from
> an address before it's compared to the breakpoint address.  (On powerpc,
> the low three bits of the register are used as flags, so only the high bits
> of the address are even stored.)
> 
> The hw_breakpoint documentation should make this definition more explicit,
> and it probably ought to enforce the alignment of the address specified.

Since PPC doesn't allow lengths shorter than 8, perhaps on that 
architecture the bottom bits of the address should silently be cleared.
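
(I.e. something like

	bp->address.va &= ~7UL;

in the powerpc registration path; just a sketch.)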

> > +#define HW_BREAKPOINT_IO	0x2	/* trigger on I/O space access */
> 
> Let's not define a name for this while we are not really supporting it.
> 
> > +/* HW breakpoint status values */
> > +#define HW_BREAKPOINT_REGISTERED	1
> > +#define HW_BREAKPOINT_INSTALLED		2
> 
> This doesn't really need to be public outside of hw_breakpoint.c, does it?

It gives drivers a way to tell whether or not the breakpoint is currently 
installed without having to do explicit tracking of installed() and 
uninstalled() callbacks.

These changes to the API sound pretty good.  Stay tuned for the next
version...

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-13 18:56                                     ` Alan Stern
@ 2007-03-14  3:00                                       ` Roland McGrath
  2007-03-14 19:11                                         ` Alan Stern
                                                           ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Roland McGrath @ 2007-03-14  3:00 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> Yes, the code could be reworked by moving some of the data from the CPU
> hw-breakpoint info into the thread's info.  I'll see how much simpler it
> ends up being.

I don't quite understand that characterization of the kind of change I'm
advocating.  If the common case path in context switch has really anything
at all more than the example I gave, something is wrong.

> It isn't quite that easy.  Even though the number of user breakpoints may
> not have changed, their identities may have.  So the unlikely case has to
> encompass two possibilities: the number of installable user breakpoints
> has changed, or any user breakpoints have been registered or unregistered.

Why does it matter?  When a new user breakpoint was made the
highest-priority one, it ought to update tdr[0..3] right then before the
registration call returns.  It seems fine to me for it to make an
uninstalled callback right away rather than at the thread's next switch-in.
But even if you wanted to delay it, you could just set active_dr7 to zero
or something so that the unlikely case triggers.

> > For the masks to work as I described, you need to use the same enable bit
> > (or both) for kernel and user allocations.  It really doesn't matter which
> > one you use, since all of Linux is "local" for the sense of the dr7 enable
> > bits (i.e. you should just use DR_GLOBAL_ENABLE).
> 
> This shouldn't be necessary.  So long as DR_GLOBAL_ENABLE always belongs
> to the kernel's part of DR7 and DR_LOCAL_ENABLE always belongs to the
> thread's part there will be no interference between them.

The plan I suggested relies on setting want_dr7 with the enable bits that
do include the ones the kernel uses (for contested slots).  Of course it
works as well to use either bit for this, as long as you're consistent.
But as I've said at least twice already, there is no actual meaning
whatsoever to choosing one enable bit over the other.  It's just confusing
and misleading to have the code make special efforts to set one rather than
the other for different cases.  You talk about them as if they meant
something, which keeps making me wonder if you're confused.  Since the
hardware doesn't care which bit you set, you could overload them to record
a bit and a half of information there if really wanted to, but you're not
even doing that, unless I'm confused.

> Maybe.  I always had in the back of my mind the possibility that there
> might be a user I/O breakpoint set.  It could be triggered by an interrupt
> handler even in the SIGKILL case.  But since we're not supporting I/O
> breakpoints now, that's a moot point.

How would that happen?  This would mean that some user process has been
allowed to enable ioperm for some io port that kernel drivers also send to
from interrupt handlers.  Can that ever happen?

> Actually the code _doesn't_ already know what's there; the chbi area
> doesn't include any storage for the kernel DR7 value.  I figured it was at
> least as easy to read it from the CPU register as to read it from memory.  
> But maybe that's not true; according to my ancient processor manual, moves
> to/from debug registers take many more clock cycles than moves to/from
> memory.

The purpose of the chbi area is to optimize this path.  Make it store
whatever precomputed values are most convenient for the hot paths.  This
path doesn't need num_kbps, it needs kdr7.  So precompute that and do that
one load, instead of a load of chbi->num_bkps we don't otherwise need plus
a load from kdr7_masks that can be avoided altogether on hot paths.

I don't really know about the slowness of reading debug registers, though I
would guess it is slower than most common operations.  But regardless, you
can avoid it because kdr7 is something you need anyway, so you're not
replacing it with a load but letting a load you already had kill two birds.

> No.  If a debugger has removed some user breakpoints since the last time
> the thread ran, the chbi->bps[] entries could still be present.  Likewise
> if the previously-running task had more breakpoints than the current one.

I don't really get why user breakpoints would be in chbi->bps at all.
When a debug trap hits, you can check kdr7 or whatnot to see if it was a
kernel allocation, and otherwise look in current->thbi->bps to find it.

> I don't like using DR_LEN_1, because it would force asm/debugreg.h to be 
> #included by any user of hw_breakpoint.  The raw numerical value should do 
> just as well.

Agreed.  (I just used DR_LEN_1 as shorthand and was not hot on including
asm/debugreg.h in asm/hw_breakpoint.h in the actual version.)

> > On powerpc, the address breakpoint is always for an 8-byte address range.
> 
> So there's no way to trap on accesses to a particular byte within a
> string?

There's no way to tell which of the 8 bytes were accessed, AFAIK.  It's the
same as LEN8 on x86_64 or LEN[42] on i386: some byte in there was accessed.

> Better yet, if type is HW_BREAKPOINT_TYPE_EXECUTE then just ignore the
> caller's len and always use the correct value.

That is probably fine too.

> Since PPC doesn't allow lengths shorter than 8, perhaps on that 
> architecture the bottom bits of the address should silently be cleared.

You're not suggesting this for lengths 2, 4, or 8 on i386/x86_64, and I
don't see the distinction.  In all those cases, any low bits set in the
address are being ignored.  I think it is much better to enforce the
alignment so that callers are told explicitly what their parameters really
mean.  A caller passing an unaligned address with DR_LEN_4 might be
thinking it will catch the four bytes starting at that unaligned byte,
which is not true.
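
I.e., the validation would do something along these lines (illustrative
only; assumes <linux/errno.h>):

/* Reject addresses not aligned to the breakpoint length, so a caller
 * passing (addr = 0x1003, len = 4) is told up front that the hardware
 * will not watch bytes 0x1003-0x1006 for it. */
static int check_bp_alignment(unsigned long addr, unsigned int len)
{
	if (len != 1 && len != 2 && len != 4)	/* 8 is also valid on x86_64 */
		return -EINVAL;
	if (addr & (len - 1))
		return -EINVAL;
	return 0;
}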

> It gives drivers a way to tell whether or not the breakpoint is currently 
> installed without having to do explicit tracking of installed() and 
> uninstalled() callbacks.

How could that ever be used that would not be racy and thus buggy?  A
registration call on another CPU could cause a change and callback just
after you fetched the value.

> These changes to the API sound pretty good.  Stay tuned for the next
> version...

You keep rewriting it and I'll keep changing my mind!  
(Just kidding, but fair warning. ;-)
I look forward to it.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-14  3:00                                       ` Roland McGrath
@ 2007-03-14 19:11                                         ` Alan Stern
  2007-03-28 21:39                                           ` Roland McGrath
  2007-03-16 21:07                                         ` Alan Stern
  2007-03-22 19:44                                         ` Alan Stern
  2 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-03-14 19:11 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Tue, 13 Mar 2007, Roland McGrath wrote:

> > Yes, the code could be reworked by moving some of the data from the CPU
> > hw-breakpoint info into the thread's info.  I'll see how much simpler it
> > ends up being.
> 
> I don't quite understand that characterization of the kind of change I'm
> advocating.

It's easy to explain: Your sample code had a tdr[] array in the thread's
hw_breakpoint area and my patch did not; adding it amounts to moving some
of the data from the CPU's info into the thread's info.

> > It isn't quite that easy.  Even though the number of user breakpoints may
> > not have changed, their identities may have.  So the unlikely case has to
> > encompass two possibilities: the number of installable user breakpoints
> > has changed, or any user breakpoints have been registered or unregistered.
> 
> Why does it matter?  When a new user breakpoint was made the
> highest-priority one, it ought to update tdr[0..3] right then before the
> registration call returns.  It seems fine to me for it to make an
> uninstalled callback right away rather than at the thread's next switch-in.
> But even if you wanted to delay it, you could just set active_dr7 to zero
> or something so that the unlikely case triggers.

That's basically what I intend to do.  Although instead of keeping track 
of an active_dr7, I'll keep track of num_kbps as of the last time the 
thread ran.  The unlikely case can be triggered by setting the value to 
-1.
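
In outline, something like this (the names are just placeholders for
whatever ends up in the actual patch):

/* Per-thread record of how many debug registers the kernel owned the
 * last time this thread was switched in. */
struct thread_dbg_sketch {
	int	last_num_kbps;
};

/* Registration and unregistration poison the value... */
static void force_switch_in_update(struct thread_dbg_sketch *t)
{
	t->last_num_kbps = -1;		/* can never match the real count */
}

/* ...so that the next switch-in notices and takes the slow path. */
static int switch_in_needs_update(struct thread_dbg_sketch *t, int num_kbps)
{
	return t->last_num_kbps != num_kbps;
}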

> The plan I suggested relies on setting want_dr7 with the enable bits that
> do include the ones the kernel uses (for contested slots).  Of course it
> works as well to use either bit for this, as long as you're consistent.

With my scheme it won't matter.  You'll see.

> But as I've said at least twice already, there is no actual meaning
> whatsoever to choosing one enable bit over the other.  It's just confusing
> and misleading to have the code make special efforts to set one rather than
> the other for different cases.  You talk about them as if they meant
> something, which keeps making me wonder if you're confused.  Since the
> hardware doesn't care which bit you set, you could overload them to record
> a bit and a half of information there if you really wanted to, but you're not
> even doing that, unless I'm confused.

No, I'm not confused and neither are you.  I realize there's no functional 
difference between the two sets of enable bits, since Linux doesn't use 
hardware task-switching.  I just like to keep things neatly separated, 
that's all.


> > Maybe.  I always had in the back of my mind the possibility that there
> > might be a user I/O breakpoint set.  It could be triggered by an interrupt
> > handler even in the SIGKILL case.  But since we're not supporting I/O
> > breakpoints now, that's a moot point.
> 
> How would that happen?  This would mean that some user process has been
> allowed to enable ioperm for some io port that kernel drivers also send to
> from interrupt handlers.  Can that ever happen?

I haven't checked the ioperm code to be certain, but it seems like the 
sort of thing somebody might want to do on occasion.

Another aspect perhaps worth mentioning is that user breakpoints are 
active only when the task is running.  Hence breakpoints for user data 
affected by AIO operations won't necessarily be triggered; with async I/O 
the transfers can occur while a different task is running.  Of course 
there's nothing new about this; the same has been true for ptrace all 
along.


> > Actually the code _doesn't_ already know what's there; the chbi area
> > doesn't include any storage for the kernel DR7 value.  I figured it was at
> > least as easy to read it from the CPU register as to read it from memory.  
> > But maybe that's not true; according to my ancient processor manual, moves
> > to/from debug registers take many more clock cycles than moves to/from
> > memory.
> 
> The purpose of the chbi area is to optimize this path.  Make it store
> whatever precomputed values are most convenient for the hot paths.  This
> path doesn't need num_kbps, it needs kdr7.  So precompute that and do that
> one load, instead of a load of chbi->num_kbps we don't otherwise need plus
> a load from kdr7_masks that can be avoided altogether on hot paths.

I will endeavor to optimize switch_to_thread_hw_breakpoint.  However, it 
will turn out that the hot path really does use chbi->num_kbps and 
chbi->kdr7_mask.  Again, you'll see...


> > No.  If a debugger has removed some user breakpoints since the last time
> > the thread ran, the chbi->bps[] entries could still be present.  Likewise
> > if the previously-running task had more breakpoints than the current one.
> 
> I don't really get why user breakpoints would be in chbi->bps at all.
> When a debug trap hits, you can check kdr7 or whatnot to see if it was a
> kernel allocation, and otherwise look in current->thbi->bps to find it.

You're right; I'll do it that way.


> > Since PPC doesn't allow lengths shorter than 8, perhaps on that 
> > architecture the bottom bits of the address should silently be cleared.
> 
> You're not suggesting this for lengths 2, 4, or 8 on i386/x86_64, and I
> don't see the distinction.  In all those cases, any low bits set in the
> address are being ignored.  I think it is much better to enforce the
> alignment so that callers are told explicitly what their parameters really
> mean.  A caller passing an unaligned address with DR_LEN_4 might be
> thinking it will catch the four bytes starting at that unaligned byte,
> which is not true.

Okay, I'll check the address bits.


> > It gives drivers a way to tell whether or not the breakpoint is currently 
> > installed without having to do explicit tracking of installed() and 
> > uninstalled() callbacks.
> 
> How could that ever be used that would not be racy and thus buggy?  A
> registration call on another CPU could cause a change and callback just
> after you fetched the value.

Not if you have interrupts disabled.  Debug register settings are 
disseminated from one CPU to the others by means of an IPI.
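
So a driver that wants a stable answer can simply bracket the test with
interrupts disabled, along these lines (sketch only; it relies on the
update IPI being the only way another CPU's change reaches this one):

/* With interrupts off, the cross-CPU update cannot be delivered here,
 * so bp->status cannot change underneath us. */
static void poll_breakpoint(struct hw_breakpoint *bp)
{
	unsigned long flags;

	local_irq_save(flags);
	if (bp->status == HW_BREAKPOINT_INSTALLED) {
		/* act on the breakpoint knowing it stays installed on
		 * this CPU until interrupts are re-enabled */
	}
	local_irq_restore(flags);
}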

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-14  3:00                                       ` Roland McGrath
  2007-03-14 19:11                                         ` Alan Stern
@ 2007-03-16 21:07                                         ` Alan Stern
  2007-03-22 19:44                                         ` Alan Stern
  2 siblings, 0 replies; 70+ messages in thread
From: Alan Stern @ 2007-03-16 21:07 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

Roland:

Here's the next update.  I haven't tried testing it yet; this is just to
get your opinion.

I implemented most of the changes we discussed.  Ignoring the length for 
execute breakpoints turned out not to be a good idea because it would 
affect the way ptrace works, so the code verifies it just like any other 
kind of breakpoint.

I also decided against adding a .bits member.  It doesn't really gain very 
much; the savings in encoding the breakpoint values is trivial -- one line 
of code on i386.  And it helps to have the original length and type values 
available for use by the ptrace routines.  In fact, I decided to add a 
superfluous bit to the type code.  That's to help disambiguate between 
length and type values; it's easy to mix the two of them up.  Likewise, 
the length macros don't give the encoded values; that's so people can just 
specify the length directly instead of using the macro.
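
For instance, a kernel-space caller would fill things in roughly like this
(the watched variable, the priority, and the cast on .va are my assumptions;
the real field types are in the asm-generic header, not shown here):

static int watched;			/* made-up example target */
static struct hw_breakpoint sample_bp;

static void sample_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
{
	printk(KERN_DEBUG "watched variable was written\n");
}

static int register_sample_bp(void)
{
	sample_bp.address.va = (unsigned long) &watched;
	sample_bp.len = 4;		/* the plain length, not an encoding */
	sample_bp.type = HW_BREAKPOINT_WRITE;
	sample_bp.priority = 1;
	sample_bp.triggered = sample_triggered;

	/* returns 1 if installed in a debug register, 0 if only registered */
	return register_kernel_hw_breakpoint(&sample_bp);
}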

There's a small question about the value of the error_code argument for 
send_sigtrap().  The value passed into do_debug() isn't available in
ptrace_triggered() -- but since it is always 0, that's what I'm using.  
I'm not sure what it's supposed to mean anyway.

Anyway, this version seems to be a fair amount cleaner than the previous.  
See what you think.

Alan Stern


Index: usb-2.6/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,17 @@
+#ifndef	_I386_HW_BREAKPOINT_H
+#define	_I386_HW_BREAKPOINT_H
+
+#include <asm-generic/hw_breakpoint.h>
+
+/* Available HW breakpoint lengths */
+#define HW_BREAKPOINT_LEN_1	1
+#define HW_BREAKPOINT_LEN_2	2
+#define HW_BREAKPOINT_LEN_4	4
+#define HW_BREAKPOINT_LEN_EXECUTE	1
+
+/* Available HW breakpoint types */
+#define HW_BREAKPOINT_EXECUTE	0x80	/* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE	0x81	/* trigger on memory write */
+#define HW_BREAKPOINT_RW	0x83	/* trigger on memory read or write */
+
+#endif	/* _I386_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -58,6 +58,7 @@
 #include <asm/tlbflush.h>
 #include <asm/cpu.h>
 #include <asm/pda.h>
+#include <asm/debugreg.h>
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
@@ -359,9 +360,10 @@ EXPORT_SYMBOL(kernel_thread);
  */
 void exit_thread(void)
 {
+	struct task_struct *tsk = current;
+
 	/* The process may have allocated an io port bitmap... nuke it. */
 	if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
-		struct task_struct *tsk = current;
 		struct thread_struct *t = &tsk->thread;
 		int cpu = get_cpu();
 		struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -379,15 +381,17 @@ void exit_thread(void)
 		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
 		put_cpu();
 	}
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 }
 
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
 
-	memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
-	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));	
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 	/*
 	 * Forget coprocessor state..
 	 */
@@ -430,14 +434,21 @@ int copy_thread(int nr, unsigned long cl
 
 	savesegment(gs,p->thread.gs);
 
+	p->thread.hw_breakpoint_info = NULL;
+	p->thread.io_bitmap_ptr = NULL;
+
 	tsk = current;
+	err = -ENOMEM;
+	if (unlikely(tsk->thread.hw_breakpoint_info)) {
+		if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+			goto out;
+	}
+
 	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
 						IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
+		if (!p->thread.io_bitmap_ptr)
+			goto out;
 		set_tsk_thread_flag(p, TIF_IO_BITMAP);
 	}
 
@@ -467,7 +478,8 @@ int copy_thread(int nr, unsigned long cl
 
 	err = 0;
  out:
-	if (err && p->thread.io_bitmap_ptr) {
+	if (err) {
+		flush_thread_hw_breakpoint(p);
 		kfree(p->thread.io_bitmap_ptr);
 		p->thread.io_bitmap_max = 0;
 	}
@@ -479,18 +491,18 @@ int copy_thread(int nr, unsigned long cl
  */
 void dump_thread(struct pt_regs * regs, struct user * dump)
 {
-	int i;
+	struct task_struct *tsk = current;
 
 /* changed the size calculations - should hopefully work better. lbt */
 	dump->magic = CMAGIC;
 	dump->start_code = 0;
 	dump->start_stack = regs->esp & ~(PAGE_SIZE - 1);
-	dump->u_tsize = ((unsigned long) current->mm->end_code) >> PAGE_SHIFT;
-	dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
+	dump->u_tsize = ((unsigned long) tsk->mm->end_code) >> PAGE_SHIFT;
+	dump->u_dsize = ((unsigned long) (tsk->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
 	dump->u_dsize -= dump->u_tsize;
 	dump->u_ssize = 0;
-	for (i = 0; i < 8; i++)
-		dump->u_debugreg[i] = current->thread.debugreg[i];  
+
+	dump_thread_hw_breakpoint(tsk, dump->u_debugreg);
 
 	if (dump->start_stack < TASK_SIZE)
 		dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -540,16 +552,6 @@ static noinline void __switch_to_xtra(st
 
 	next = &next_p->thread;
 
-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		set_debugreg(next->debugreg[0], 0);
-		set_debugreg(next->debugreg[1], 1);
-		set_debugreg(next->debugreg[2], 2);
-		set_debugreg(next->debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(next->debugreg[6], 6);
-		set_debugreg(next->debugreg[7], 7);
-	}
-
 	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
 		/*
 		 * Disable the bitmap via an invalid offset. We still cache
@@ -682,7 +684,7 @@ struct task_struct fastcall * __switch_t
 		set_iopl_mask(next->iopl);
 
 	/*
-	 * Now maybe handle debug registers and/or IO bitmaps
+	 * Now maybe handle IO bitmaps
 	 */
 	if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
 	    || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
@@ -714,6 +716,13 @@ struct task_struct fastcall * __switch_t
 
 	write_pda(pcurrent, next_p);
 
+	/*
+	 * Handle debug registers.  This must be done _after_ current
+	 * is updated.
+	 */
+	if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+		switch_to_thread_hw_breakpoint(next_p);
+
 	return prev_p;
 }
 
Index: usb-2.6/arch/i386/kernel/signal.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/signal.c
+++ usb-2.6/arch/i386/kernel/signal.c
@@ -592,13 +592,6 @@ static void fastcall do_signal(struct pt
 
 	signr = get_signal_to_deliver(&info, &ka, regs, NULL);
 	if (signr > 0) {
-		/* Reenable any watchpoints before delivering the
-		 * signal to user space. The processor register will
-		 * have been cleared if the watchpoint triggered
-		 * inside the kernel.
-		 */
-		if (unlikely(current->thread.debugreg[7]))
-			set_debugreg(current->thread.debugreg[7], 7);
 
 		/* Whee!  Actually deliver the signal.  */
 		if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -807,62 +807,47 @@ fastcall void __kprobes do_int3(struct p
  */
 fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
 {
-	unsigned int condition;
 	struct task_struct *tsk = current;
+	struct die_args args;
 
-	get_debugreg(condition, 6);
-
-	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
-					SIGTRAP) == NOTIFY_STOP)
+	args.regs = regs;
+	args.str = "debug";
+	get_debugreg(args.err, 6);
+	set_debugreg(0, 6);	/* DR6 is never cleared by the CPU */
+	args.trapnr = error_code;
+	args.signr = SIGTRAP;
+	if (atomic_notifier_call_chain(&i386die_chain, DIE_DEBUG, &args) ==
+			NOTIFY_STOP)
 		return;
+
 	/* It's safe to allow irq's after DR6 has been saved */
 	if (regs->eflags & X86_EFLAGS_IF)
 		local_irq_enable();
 
-	/* Mask out spurious debug traps due to lazy DR7 setting */
-	if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
-		if (!tsk->thread.debugreg[7])
-			goto clear_dr7;
+	if (regs->eflags & VM_MASK) {
+		handle_vm86_trap((struct kernel_vm86_regs *) regs,
+				error_code, 1);
+		return;
 	}
 
-	if (regs->eflags & VM_MASK)
-		goto debug_vm86;
-
-	/* Save debug status register where ptrace can see it */
-	tsk->thread.debugreg[6] = condition;
-
 	/*
-	 * Single-stepping through TF: make sure we ignore any events in
-	 * kernel space (but re-enable TF when returning to user mode).
+	 * Single-stepping through system calls: ignore any exceptions in
+	 * kernel space, but re-enable TF when returning to user mode.
+	 *
+	 * We already checked v86 mode above, so we can check for kernel mode
+	 * by just checking the CPL of CS.
 	 */
-	if (condition & DR_STEP) {
-		/*
-		 * We already checked v86 mode above, so we can
-		 * check for kernel mode by just checking the CPL
-		 * of CS.
-		 */
-		if (!user_mode(regs))
-			goto clear_TF_reenable;
+	if ((args.err & DR_STEP) && !user_mode(regs)) {
+		args.err &= ~DR_STEP;
+		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+		regs->eflags &= ~TF_MASK;
 	}
 
-	/* Ok, finally something we can handle */
-	send_sigtrap(tsk, regs, error_code);
+	/* Store the virtualized DR6 value */
+	tsk->thread.vdr6 |= args.err;
 
-	/* Disable additional traps. They'll be re-enabled when
-	 * the signal is delivered.
-	 */
-clear_dr7:
-	set_debugreg(0, 7);
-	return;
-
-debug_vm86:
-	handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
-	return;
-
-clear_TF_reenable:
-	set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
-	regs->eflags &= ~TF_MASK;
-	return;
+	if (args.err & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+		send_sigtrap(tsk, regs, error_code);
 }
 
 /*
Index: usb-2.6/include/asm-i386/debugreg.h
===================================================================
--- usb-2.6.orig/include/asm-i386/debugreg.h
+++ usb-2.6/include/asm-i386/debugreg.h
@@ -48,6 +48,8 @@
 
 #define DR_LOCAL_ENABLE_SHIFT 0    /* Extra shift to the local enable bit */
 #define DR_GLOBAL_ENABLE_SHIFT 1   /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1)      /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2)     /* Global enable for reg 0 */
 #define DR_ENABLE_SIZE 2           /* 2 enable bits per register */
 
 #define DR_LOCAL_ENABLE_MASK (0x55)  /* Set  local bits for all 4 regs */
@@ -58,7 +60,31 @@
    gdt or the ldt if we want to.  I am not sure why this is an advantage */
 
 #define DR_CONTROL_RESERVED (0xFC00) /* Reserved by Intel */
-#define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
-#define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
+#define DR_LOCAL_EXACT (0x100)       /* Local slow the pipeline */
+#define DR_GLOBAL_EXACT (0x200)      /* Global slow the pipeline */
+
+
+/*
+ * HW breakpoint additions
+ */
+
+#include <asm/hw_breakpoint.h>
+#include <linux/spinlock.h>
+
+#define HB_NUM		4	/* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
 
 #endif
Index: usb-2.6/include/asm-i386/processor.h
===================================================================
--- usb-2.6.orig/include/asm-i386/processor.h
+++ usb-2.6/include/asm-i386/processor.h
@@ -402,8 +402,9 @@ struct thread_struct {
 	unsigned long	esp;
 	unsigned long	fs;
 	unsigned long	gs;
-/* Hardware debugging registers */
-	unsigned long	debugreg[8];  /* %%db0-7 debug registers */
+/* Hardware breakpoint info */
+	unsigned long	vdr6;
+	struct thread_hw_breakpoint	*hw_breakpoint_info;
 /* fault info */
 	unsigned long	cr2, trap_no, error_code;
 /* floating point info */
Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/arch/i386/kernel/hw_breakpoint.c
@@ -0,0 +1,1212 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+/* QUESTIONS
+
+	Error code in ptrace_triggered?
+
+	Set RF flag bit for execution faults?
+
+	TF flag bit for single-step exceptions in kernel space?
+
+	CPU hotplug, kexec, etc?
+*/
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm-generic/percpu.h>
+
+#include <asm/debugreg.h>
+#include <asm/kdebug.h>
+#include <asm/processor.h>
+
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+	/* utrace support */
+	struct list_head	node;		/* Entry in thread list */
+	struct list_head	thread_bps;	/* Thread's breakpoints */
+	struct hw_breakpoint	*bps[HB_NUM];	/* Highest-priority bps */
+	unsigned long		tdr[HB_NUM];	/*  and their addresses */
+	unsigned long		tdr7;		/* Thread's DR7 value */
+	int			last_num_kbps;	/* Value of num_kbps when
+						 *  the thread last ran */
+
+	/* ptrace support -- note that vdr6 is stored directly in the
+	 * thread_struct so that it is always available */
+	unsigned long		vdr7;			/* Virtualized DR7 */
+	struct hw_breakpoint	vdr_bps[HB_NUM];	/* Breakpoints
+			* representing virtualized debug registers 0 - 3 */
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+	struct hw_breakpoint	*bps[HB_NUM];	/* Loaded breakpoints */
+	int			num_kbps;	/* Number of kernel bps */
+	unsigned long		kdr7;		/* Current kernel DR7 value */
+	unsigned long		kdr7_mask;	/* Mask for kernel part */
+	unsigned long		dr7;		/* Current DR7 value */
+	struct task_struct	*bp_task;	/* The thread whose bps
+			are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Kernel-space breakpoint data */
+static LIST_HEAD(kernel_bps);			/* Kernel breakpoint list */
+static int			num_kbps;	/* Number of kernel bps */
+static unsigned long		kdr7;		/* Kernel DR7 value */
+
+static u8			tprio[HB_NUM];	/* Thread bp max priorities */
+static LIST_HEAD(thread_list);			/* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex);	/* Protects everything */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps.  Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1). */
+static const unsigned long	kdr7_masks[HB_NUM + 1] = {
+	0x00000000,
+	0x000f0203,	/* LEN0, R/W0, GE, G0, L0 */
+	0x00ff020f,	/* Same for 0,1 */
+	0x0fff023f,	/* Same for 0,1,2 */
+	0xffff02ff	/* Same for 0,1,2,3 */
+};
+
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct cpu_hw_breakpoint *chbi;
+	unsigned long flags;
+
+	/* Other CPUs might be making updates to the list of kernel
+	 * breakpoints at this same time, so we can't use the global
+	 * value stored in num_kbps.  Instead we'll use the per-CPU
+	 * value stored in cpu_info. */
+
+	/* Block kernel breakpoint updates from other CPUs */
+	local_irq_save(flags);
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = tsk;
+
+	/* Normally we can keep the same debug register settings as the
+	 * last time this task ran.  But if the number of registers
+	 * allocated to the kernel has changed or any user breakpoints
+	 * have been registered or unregistered, we need to send out
+	 * some notifications. */
+	if (unlikely(thbi->last_num_kbps != chbi->num_kbps)) {
+		struct hw_breakpoint *bp;
+		int i = HB_NUM;
+
+		thbi->last_num_kbps = chbi->num_kbps;
+
+		/* This code can be invoked while a debugger is actively
+		 * updating the thread's breakpoint list (for example, if
+		 * someone sends SIGKILL to the task).  We use RCU to
+		 * protect our access to the list pointers. */
+		rcu_read_lock();
+		list_for_each_entry_rcu(bp, &thbi->thread_bps, node) {
+
+			/* If this register is allocated for kernel bps,
+			 * don't install.  Otherwise do. */
+			if (--i < chbi->num_kbps) {
+				if (bp->status == HW_BREAKPOINT_INSTALLED) {
+					if (bp->uninstalled)
+						(bp->uninstalled)(bp);
+					bp->status = HW_BREAKPOINT_REGISTERED;
+				}
+			} else {
+				if (bp->status != HW_BREAKPOINT_INSTALLED) {
+					bp->status = HW_BREAKPOINT_INSTALLED;
+					if (bp->installed)
+						(bp->installed)(bp);
+				}
+			}
+		}
+		rcu_read_unlock();
+	}
+
+	/* Install the thread breakpoints.  Kernel breakpoints are stored
+	 * starting in DR0 and going up, and there are num_kbps of them.
+	 * Thread breakpoints are stored starting in DR3 and going down,
+	 * as many as we have room for. */
+	switch (chbi->num_kbps) {
+	case 0:
+		set_debugreg(thbi->tdr[3], 0);
+	case 1:
+		set_debugreg(thbi->tdr[2], 1);
+	case 2:
+		set_debugreg(thbi->tdr[1], 2);
+	case 3:
+		set_debugreg(thbi->tdr[0], 3);
+	}
+
+	/* Mask in the parts of DR7 that refer to the new thread */
+	chbi->dr7 = chbi->kdr7 | (~chbi->kdr7_mask & thbi->tdr7);
+	set_debugreg(chbi->dr7, 7);
+
+	put_cpu_no_resched();
+	local_irq_restore(flags);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void switch_to_none_hw_breakpoint(void)
+{
+	struct cpu_hw_breakpoint *chbi;
+	unsigned long flags;
+
+	/* Block kernel breakpoint updates from other CPUs */
+	local_irq_save(flags);
+	chbi = &per_cpu(cpu_info, get_cpu());
+
+	chbi->bp_task = NULL;
+	chbi->dr7 = chbi->kdr7;
+	set_debugreg(chbi->dr7, 7);
+
+	put_cpu_no_resched();
+	local_irq_restore(flags);
+}
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void switch_kernel_hw_breakpoint(struct cpu_hw_breakpoint *chbi)
+{
+	struct hw_breakpoint *bp;
+	int i;
+
+	/* Don't allow debug exceptions while we update the registers */
+	set_debugreg(0, 7);
+	chbi->num_kbps = num_kbps;
+
+	/* Kernel breakpoints are stored starting in DR0 and going up */
+	i = 0;
+	list_for_each_entry(bp, &kernel_bps, node) {
+		if (i >= chbi->num_kbps)
+			break;
+		chbi->bps[i] = bp;
+		switch (i) {
+		case 0:
+			set_debugreg(bp->address.va, 0);
+			break;
+		case 1:
+			set_debugreg(bp->address.va, 1);
+			break;
+		case 2:
+			set_debugreg(bp->address.va, 2);
+			break;
+		case 3:
+			set_debugreg(bp->address.va, 3);
+			break;
+		}
+		++i;
+	}
+
+	chbi->kdr7_mask = kdr7_masks[chbi->num_kbps];
+	chbi->kdr7 = kdr7 & chbi->kdr7_mask;
+	set_debugreg(chbi->kdr7, 7);
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+	struct cpu_hw_breakpoint *chbi;
+	struct task_struct *tsk = current;
+
+	/* Install both the kernel and the user breakpoints */
+	chbi = &per_cpu(cpu_info, get_cpu());
+
+	switch_kernel_hw_breakpoint(chbi);
+	if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+		switch_to_thread_hw_breakpoint(tsk);
+
+	put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void update_all_cpus(void)
+{
+	on_each_cpu(update_this_cpu, NULL, 0, 0);
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+	mutex_lock(&hw_breakpoint_mutex);
+	update_this_cpu(NULL);
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio.
+ */
+static void accum_thread_tprio(struct thread_hw_breakpoint *thbi)
+{
+	int i;
+
+	for (i = 0; i < HB_NUM && thbi->bps[i]; ++i)
+		tprio[i] = max(tprio[i], thbi->bps[i]->priority);
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority.  We'll set tprio[0] to the maximum priority of
+ * the first entries in all the lists, tprio[1] to the maximum priority
+ * of the second entries in all the lists, etc.  In the end, we'll know
+ * that no thread requires breakpoints with priorities higher than the
+ * values in tprio.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void recalc_tprio(void)
+{
+	struct thread_hw_breakpoint *thbi;
+
+	memset(tprio, 0, sizeof tprio);
+
+	/* Loop through all threads having registered breakpoints
+	 * and accumulate the maximum priority levels in tprio. */
+	list_for_each_entry(thbi, &thread_list, node)
+		accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[].  The 4 highest
+ * winners overall get to be installed in a debug register; num_kpbs
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU.  If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+	int k, u;
+	int changed = 0;
+	struct hw_breakpoint *bp;
+
+	/* Determine how many debug registers are available for kernel
+	 * breakpoints as opposed to user breakpoints, based on the
+	 * priorities.  Ties are resolved in favor of user bps. */
+	k = u = 0;
+	bp = list_entry(kernel_bps.next, struct hw_breakpoint, node);
+	while (k + u < HB_NUM) {
+		if (k == num_kbps || tprio[u] >= bp->priority)
+			++u;		/* User bps win a slot */
+		else {
+			++k;		/* Kernel bp wins a slot */
+			if (bp->status != HW_BREAKPOINT_INSTALLED)
+				changed = 1;
+			bp = list_entry(bp->node.next, struct hw_breakpoint,
+					node);
+		}
+	}
+	if (k != num_kbps) {
+		changed = 1;
+		num_kbps = k;
+	}
+
+	/* Notify the remaining kernel breakpoints that they are about
+	 * to be uninstalled. */
+	list_for_each_entry_from(bp, &kernel_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED) {
+			if (bp->uninstalled)
+				(bp->uninstalled)(bp);
+			bp->status = HW_BREAKPOINT_REGISTERED;
+			changed = 1;
+		}
+	}
+
+	if (changed) {
+
+		/* Tell all the CPUs to update their debug registers */
+		update_all_cpus();
+
+		/* Notify the breakpoints that just got installed */
+		k = 0;
+		list_for_each_entry(bp, &kernel_bps, node) {
+			if (k++ >= num_kbps)
+				break;
+			if (bp->status != HW_BREAKPOINT_INSTALLED) {
+				bp->status = HW_BREAKPOINT_INSTALLED;
+				if (bp->installed)
+					(bp->installed)(bp);
+			}
+		}
+	}
+}
+
+/*
+ * Return the pointer to a thread's hw_breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+		struct task_struct *tsk)
+{
+	if (!tsk->thread.hw_breakpoint_info && !(tsk->flags & PF_EXITING)) {
+		struct thread_hw_breakpoint *thbi;
+
+		thbi = kzalloc(sizeof(struct thread_hw_breakpoint),
+				GFP_KERNEL);
+		if (thbi) {
+			INIT_LIST_HEAD(&thbi->node);
+			INIT_LIST_HEAD(&thbi->thread_bps);
+			tsk->thread.hw_breakpoint_info = thbi;
+		}
+	}
+	return tsk->thread.hw_breakpoint_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct hw_breakpoint *bp;
+
+	if (!thbi)
+		return;
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Let the breakpoints know they are being uninstalled */
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+			(bp->uninstalled)(bp);
+		bp->status = 0;
+	}
+
+	/* Remove tsk from the list of all threads with registered bps */
+	list_del(&thbi->node);
+
+	/* The thread no longer has any breakpoints associated with it */
+	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	tsk->thread.hw_breakpoint_info = NULL;
+	kfree(thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities */
+	recalc_tprio();
+	balance_kernel_vs_user();
+
+	/* Actually uninstall the breakpoints if necessary */
+	if (tsk == current)
+		switch_to_none_hw_breakpoint();
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags)
+{
+	/* We will assume that breakpoint settings are not inherited
+	 * and the child starts out with no debug registers set.
+	 * But what about CLONE_PTRACE? */
+
+	clear_tsk_thread_flag(child, TIF_DEBUG);
+	return 0;
+}
+
+/*
+ * Copy out the debug register information for a core dump.
+ *
+ * tsk must be equal to current.
+ */
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	memset(u_debugreg, 0, 8 * sizeof(u_debugreg[0]));
+	if (thbi) {
+		for (i = 0; i < HB_NUM; ++i)
+			u_debugreg[i] = (unsigned long)
+					thbi->vdr_bps[i].address.va;
+		u_debugreg[7] = thbi->vdr7;
+	}
+	u_debugreg[6] = tsk->thread.vdr6;
+}
+
+/*
+ * Validate the settings in a hw_breakpoint structure.
+ */
+static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
+{
+	int rc = -EINVAL;
+
+	switch (bp->type) {
+	case HW_BREAKPOINT_EXECUTE:
+		if (bp->len != HW_BREAKPOINT_LEN_EXECUTE)
+			return rc;
+		break;
+	case HW_BREAKPOINT_WRITE:
+	case HW_BREAKPOINT_RW:
+		break;
+	default:
+		return rc;
+	}
+
+	switch (bp->len) {
+	case 1:  case 2:  case 4:	/* 8 is also valid on x86_64 */
+		break;
+	default:
+		return rc;
+	}
+
+	/* Check that the low-order bits of the address are appropriate
+	 * for the alignment implied by len. */
+	if (bp->address.va & (bp->len - 1))
+		return rc;
+
+	/* Check that the address is in the proper range.  Note that tsk
+	 * is NULL for kernel bps and non-NULL for user bps.
+	 * With x86_64, use TASK_SIZE_OF(tsk) instead of TASK_SIZE. */
+	if ((tsk != NULL) != (bp->address.va < TASK_SIZE))
+		return rc;
+
+	if (bp->triggered)
+		rc = 0;
+	return rc;
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static inline unsigned long encode_dr7(int drnum, u8 len, u8 type, int local)
+{
+	unsigned long temp;
+
+	/* For x86_64:
+	 *
+	 * if (len == 8)
+	 *	len = 3;
+	 */
+	temp = ((len - 1) << 2) | (type & 0x7f);
+	temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+	if (local)
+		temp |= (DR_LOCAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_LOCAL_EXACT;
+	else
+		temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_GLOBAL_EXACT;
+	return temp;
+}
+
+/*
+ * Calculate the DR7 value for a list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct list_head *bp_list, int is_user)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	int drnum;
+	unsigned long dr7;
+
+	/* Kernel bps are assigned from DR0 on up, and user bps are assigned
+	 * from DR3 on down.  Accumulate all 4 bps; the kernel DR7 mask will
+	 * select the appropriate bits later. */
+	dr7 = 0;
+	i = 0;
+	list_for_each_entry(bp, bp_list, node) {
+
+		/* Get the debug register number and accumulate the bits */
+		drnum = (is_user ? HB_NUM - 1 - i : i);
+		dr7 |= encode_dr7(drnum, bp->len, bp->type, is_user);
+		if (++i >= HB_NUM)
+			break;
+	}
+	return dr7;
+}
+
+/*
+ * Update the DR7 value for a user thread.
+ */
+static void update_user_dr7(struct thread_hw_breakpoint *thbi)
+{
+	thbi->tdr7 = calculate_dr7(&thbi->thread_bps, 1);
+}
+
+/*
+ * Store the highest-priority thread breakpoint entries in an array.
+ */
+static void store_thread_bp_array(struct thread_hw_breakpoint *thbi)
+{
+	struct hw_breakpoint *bp;
+	int i;
+
+	i = 0;
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		thbi->bps[i] = bp;
+		thbi->tdr[i] = bp->address.va;
+		if (++i >= HB_NUM)
+			break;
+	}
+	for (; i < HB_NUM; ++i)
+		thbi->bps[i] = NULL;
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set implies
+ *		tsk->thread.hw_breakpoint_info is not NULL.
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ *		iff thbi->node is on thread_list.
+ *
+ * The caller must hold thbi->lock.
+ */
+static int insert_bp_in_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	struct list_head *head;
+	int pos;
+	struct hw_breakpoint *temp_bp;
+
+	/* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+	if (tsk)
+		head = &thbi->thread_bps;
+	else
+		head = &kernel_bps;
+
+	/* Equal-priority breakpoints get listed first-come-first-served */
+	pos = 0;
+	list_for_each_entry(temp_bp, head, node) {
+		if (bp->priority > temp_bp->priority)
+			break;
+		++pos;
+	}
+	list_add_tail_rcu(&bp->node, &temp_bp->node);
+	bp->status = HW_BREAKPOINT_REGISTERED;
+
+	if (tsk) {
+		store_thread_bp_array(thbi);
+
+		/* First bp for this thread?  (bp is already on the list) */
+		if (list_empty(&thbi->node)) {
+			set_tsk_thread_flag(tsk, TIF_DEBUG);
+			list_add(&thbi->node, &thread_list);
+		}
+		if (tsk != current)
+			synchronize_rcu();
+	}
+	return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ *
+ * The caller must hold thbi->lock.
+ */
+static void remove_bp_from_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	/* Remove bp from the thread's/kernel's list.  If the list is now
+	 * empty we must clear the TIF_DEBUG flag.  But keep the
+	 * thread_hw_breakpoint structure, so that the virtualized debug
+	 * register values will remain valid. */
+	list_del_rcu(&bp->node);
+
+	if (tsk) {
+		store_thread_bp_array(thbi);
+		if (list_empty(&thbi->thread_bps)) {
+			list_del_init(&thbi->node);
+			clear_tsk_thread_flag(tsk, TIF_DEBUG);
+		}
+		if (tsk != current)
+			synchronize_rcu();
+	}
+
+	/* Tell the breakpoint it is being uninstalled */
+	if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+		(bp->uninstalled)(bp);
+	bp->status = 0;
+}
+
+/*
+ * Actual implementation of register_user_hw_breakpoint.
+ */
+int __register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+	struct thread_hw_breakpoint *thbi;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, tsk);
+	if (rc)
+		return rc;
+
+	thbi = alloc_thread_hw_breakpoint(tsk);
+	if (!thbi)
+		return -ENOMEM;
+
+	/* Insert bp in the thread's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, thbi, tsk);
+	update_user_dr7(thbi);
+
+	/* Force an update notification */
+	thbi->last_num_kbps = -1;
+
+	/* Update and rebalance the priorities.  We don't need to go through
+	 * the list of all threads; adding a breakpoint can only cause the
+	 * priorities for this thread to increase. */
+	accum_thread_tprio(thbi);
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register. */
+	if (pos < HB_NUM - num_kbps) {
+		rc = 1;
+
+		/* Does it need to be installed right now? */
+		if (tsk == current)
+			switch_to_thread_hw_breakpoint(tsk);
+		/* Otherwise it will get installed the next time tsk runs */
+	}
+	return rc;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running.  It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being debugged by the current process,
+ * but it may also be the current process.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	rc = __register_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Actual implementation of unregister_user_hw_breakpoint.
+ */
+void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+
+	if (!bp->status)
+		return;		/* Not registered */
+
+	/* Remove bp from the thread's list and update the DR7 value */
+	remove_bp_from_list(bp, thbi, tsk);
+	update_user_dr7(thbi);
+
+	/* Force an update notification */
+	thbi->last_num_kbps = -1;
+
+	/* Recalculate and rebalance the kernel-vs-user priorities,
+	 * and actually uninstall bp if necessary. */
+	recalc_tprio();
+	balance_kernel_vs_user();
+	if (tsk == current)
+		switch_to_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * unregister_user_hw_breakpoint - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	mutex_lock(&hw_breakpoint_mutex);
+	__unregister_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Actual implementation of modify_user_hw_breakpoint.
+ */
+int __modify_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp, const void __user *address,
+		u8 len, u8 type)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	if (!bp->status) {	/* Not registered, just store the values */
+		bp->address.user = address;
+		bp->len = len;
+		bp->type = type;
+		return 0;
+	}
+
+	/* Check the new values */
+	{
+		struct hw_breakpoint temp_bp = *bp;
+		int rc;
+
+		temp_bp.address.user = address;
+		temp_bp.len = len;
+		temp_bp.type = type;
+		rc = validate_settings(&temp_bp, tsk);
+		if (rc)
+			return rc;
+	}
+
+	/* Okay, update the breakpoint */
+	bp->address.user = address;
+	bp->len = len;
+	bp->type = type;
+	update_user_dr7(thbi);
+
+	for (i = 0; i < HB_NUM; ++i) {
+		if (thbi->bps[i] == bp)
+			thbi->tdr[i] = bp->address.va;
+	}
+
+	/* The priority hasn't changed so we don't need to rebalance
+	 * anything.  Just install the new breakpoint, if necessary. */
+	if (tsk == current)
+		switch_to_thread_hw_breakpoint(tsk);
+	return 0;
+}
+
+/**
+ * modify_user_hw_breakpoint - modify a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to modify
+ * @address: the new value for @bp->address
+ * @len: the new value for @bp->len
+ * @type: the new value for @bp->type
+ *
+ * @bp need not currently be registered.  If it isn't, the new values
+ * are simply stored in it and @tsk is ignored.  Otherwise the new values
+ * are validated first and then stored.  If @tsk is the current process
+ * and @bp is installed in a debug register, the register is updated.
+ *
+ * Returns 0 if the new values are acceptable, otherwise a negative error
+ * number.
+ */
+int modify_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp, const void __user *address,
+		u8 len, u8 type)
+{
+	int rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	rc = __modify_user_hw_breakpoint(tsk, bp, address, len, type);
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Update the DR7 value for the kernel.
+ */
+static void update_kernel_dr7(void)
+{
+	kdr7 = calculate_dr7(&kernel_bps, 0);
+}
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be active at all times.  It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	int rc;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, NULL);
+	if (rc)
+		return rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Insert bp in the kernel's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, NULL, NULL);
+	update_kernel_dr7();
+
+	/* Rebalance the priorities.  This will install bp if it
+	 * was allocated a debug register. */
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register. */
+	if (pos < num_kbps)
+		rc = 1;
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	if (!bp->status)
+		return;		/* Not registered */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Remove bp from the kernel's list and update the DR7 value */
+	remove_bp_from_list(bp, NULL, NULL);
+	update_kernel_dr7();
+
+	/* Rebalance the priorities.  This will uninstall bp if it
+	 * was allocated a debug register. */
+	balance_kernel_vs_user();
+
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+	struct task_struct *tsk = current;
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	/* Store in the virtual DR6 register the fact that breakpoint i
+	 * was hit, so the thread's debugger will see it, and send the
+	 * debugging signal. */
+	if (thbi) {
+		i = bp - thbi->vdr_bps;
+		tsk->thread.vdr6 |= (DR_TRAP0 << i);
+		send_sigtrap(tsk, regs, 0);
+	}
+}
+
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n)
+{
+	struct thread_hw_breakpoint *thbi;
+	unsigned long val = 0;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	thbi = tsk->thread.hw_breakpoint_info;
+	if (n < HB_NUM) {
+		if (thbi)
+			val = (unsigned long) thbi->vdr_bps[n].address.va;
+	} else if (n == 6)
+		val = tsk->thread.vdr6;
+	else if (n == 7) {
+		if (thbi)
+			val = thbi->vdr7;
+	}
+	mutex_unlock(&hw_breakpoint_mutex);
+	return val;
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7.  Return the "enabled" status.
+ */
+static inline int decode_dr7(unsigned long dr7, int bpnum, u8 *len, u8 *type)
+{
+	int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+	int tlen = 1 + ((temp >> 2) & 0x3);
+
+	/* For x86_64:
+	 *
+	 * if (tlen == 3)
+	 *	tlen = 8;
+	 */
+	*len = tlen;
+	*type = (temp & 0x3) | 0x80;
+	return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
+
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+		struct thread_hw_breakpoint *thbi, unsigned long data)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	int rc = 0;
+	unsigned long old_dr7 = thbi->vdr7;
+
+	data &= ~DR_CONTROL_RESERVED;
+
+	/* Loop through all the hardware breakpoints,
+	 * making the appropriate changes to each. */
+restore_settings:
+	thbi->vdr7 = data;
+	bp = &thbi->vdr_bps[0];
+	for (i = 0; i < HB_NUM; (++i, ++bp)) {
+		int enabled;
+		u8 len, type;
+
+		enabled = decode_dr7(data, i, &len, &type);
+
+		/* Unregister the breakpoint if it should now be disabled.
+		 * Do this first so that setting invalid values for len
+		 * or type won't cause an error. */
+		if (!enabled && bp->status)
+			__unregister_user_hw_breakpoint(tsk, bp);
+
+		/* Insert the breakpoint's settings.  If the bp is enabled,
+		 * an invalid entry will cause an error. */
+		if (__modify_user_hw_breakpoint(tsk, bp,
+				bp->address.user, len, type) < 0 && rc == 0)
+			break;
+
+		/* Now register the breakpoint if it should be enabled.
+		 * New invalid entries will cause an error here. */
+		if (enabled && !bp->status) {
+			bp->triggered = ptrace_triggered;
+			bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+			if (__register_user_hw_breakpoint(tsk, bp) < 0 &&
+					rc == 0)
+				break;
+		}
+	}
+
+	/* If anything above failed, restore the original settings */
+	if (i < HB_NUM) {
+		rc = -EIO;
+		data = old_dr7;
+		goto restore_settings;
+	}
+	return rc;
+}
+
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+	struct thread_hw_breakpoint *thbi;
+	int rc = -EIO;
+
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* There are no DR4 or DR5 registers */
+	if (n == 4 || n == 5)
+		;
+
+	/* Writes to DR6 modify the virtualized value */
+	else if (n == 6) {
+		tsk->thread.vdr6 = val;
+		rc = 0;
+	}
+
+	else if (!tsk->thread.hw_breakpoint_info && val == 0)
+		rc = 0;		/* Minor optimization */
+
+	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+		rc = -ENOMEM;
+
+	/* Writes to DR0 - DR3 change a breakpoint address */
+	else if (n < HB_NUM) {
+		struct hw_breakpoint *bp = &thbi->vdr_bps[n];
+
+		if (__modify_user_hw_breakpoint(tsk, bp, (void *) val,
+				bp->len, bp->type) >= 0)
+			rc = 0;
+	}
+
+	/* All that's left is DR7 */
+	else
+		rc = ptrace_write_dr7(tsk, thbi, val);
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_handler(struct die_args *data)
+{
+	struct cpu_hw_breakpoint *chbi;
+	int i;
+	struct hw_breakpoint *bp;
+	struct thread_hw_breakpoint *thbi;
+
+	/* The value of DR6 is stored in data->err */
+#define DR6	(data->err)
+
+	if (!(DR6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+		return NOTIFY_DONE;
+
+	/* Assert that local interrupts are disabled */
+
+	/* Are we a victim of lazy debug-register switching? */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	if (chbi->bp_task != current) {
+
+		/* No user breakpoints are valid.  Perform the belated
+		 * debug-register switch. */
+		switch_to_none_hw_breakpoint();
+		thbi = NULL;
+	} else
+		thbi = chbi->bp_task->thread.hw_breakpoint_info;
+
+	/* Disable all breakpoints so that the callbacks can run without
+	 * triggering recursive debug exceptions. */
+	set_debugreg(0, 7);
+
+	/* Handle all the breakpoints that were triggered */
+	for (i = 0; i < HB_NUM; ++i) {
+		if (!(DR6 & (DR_TRAP0 << i)))
+			continue;
+
+		/* Find the corresponding hw_breakpoint structure and
+		 * invoke its triggered callback. */
+		if (i < chbi->num_kbps)
+			bp = chbi->bps[i];
+		else if (thbi)
+			bp = thbi->bps[HB_NUM - 1 - i];	/* DR3 holds bps[0] */
+		else		/* False alarm due to lazy DR switching */
+			continue;
+		if (bp)			/* Should always be non-NULL */
+			(bp->triggered)(bp, data->regs);
+	}
+
+	/* Re-enable the breakpoints */
+	set_debugreg(chbi->dr7, 7);
+	put_cpu_no_resched();
+
+	/* Mask away the bits we have handled */
+	DR6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+	/* Early exit from the notifier chain if everything has been handled */
+	if (data->err == 0)
+		return NOTIFY_STOP;
+	return NOTIFY_DONE;
+#undef DR6
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_exceptions_notify(
+		struct notifier_block *unused, unsigned long val, void *data)
+{
+	if (val != DIE_DEBUG)
+		return NOTIFY_DONE;
+	return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+	.notifier_call = hw_breakpoint_exceptions_notify
+};
+
+static int __init init_hw_breakpoint(void)
+{
+	return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
Index: usb-2.6/arch/i386/kernel/ptrace.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/ptrace.c
+++ usb-2.6/arch/i386/kernel/ptrace.c
@@ -383,11 +383,11 @@ long arch_ptrace(struct task_struct *chi
 		tmp = 0;  /* Default return condition */
 		if(addr < FRAME_SIZE*sizeof(long))
 			tmp = getreg(child, addr);
-		if(addr >= (long) &dummy->u_debugreg[0] &&
-		   addr <= (long) &dummy->u_debugreg[7]){
+		else if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
 			addr -= (long) &dummy->u_debugreg[0];
 			addr = addr >> 2;
-			tmp = child->thread.debugreg[addr];
+			tmp = thread_get_debugreg(child, addr);
 		}
 		ret = put_user(tmp, datap);
 		break;
@@ -417,59 +417,11 @@ long arch_ptrace(struct task_struct *chi
 		   have to be selective about what portions we allow someone
 		   to modify. */
 
-		  ret = -EIO;
-		  if(addr >= (long) &dummy->u_debugreg[0] &&
-		     addr <= (long) &dummy->u_debugreg[7]){
-
-			  if(addr == (long) &dummy->u_debugreg[4]) break;
-			  if(addr == (long) &dummy->u_debugreg[5]) break;
-			  if(addr < (long) &dummy->u_debugreg[4] &&
-			     ((unsigned long) data) >= TASK_SIZE-3) break;
-			  
-			  /* Sanity-check data. Take one half-byte at once with
-			   * check = (val >> (16 + 4*i)) & 0xf. It contains the
-			   * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
-			   * 2 and 3 are LENi. Given a list of invalid values,
-			   * we do mask |= 1 << invalid_value, so that
-			   * (mask >> check) & 1 is a correct test for invalid
-			   * values.
-			   *
-			   * R/Wi contains the type of the breakpoint /
-			   * watchpoint, LENi contains the length of the watched
-			   * data in the watchpoint case.
-			   *
-			   * The invalid values are:
-			   * - LENi == 0x10 (undefined), so mask |= 0x0f00.
-			   * - R/Wi == 0x10 (break on I/O reads or writes), so
-			   *   mask |= 0x4444.
-			   * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
-			   *   0x1110.
-			   *
-			   * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
-			   *
-			   * See the Intel Manual "System Programming Guide",
-			   * 15.2.4
-			   *
-			   * Note that LENi == 0x10 is defined on x86_64 in long
-			   * mode (i.e. even for 32-bit userspace software, but
-			   * 64-bit kernel), so the x86_64 mask value is 0x5454.
-			   * See the AMD manual no. 24593 (AMD64 System
-			   * Programming)*/
-
-			  if(addr == (long) &dummy->u_debugreg[7]) {
-				  data &= ~DR_CONTROL_RESERVED;
-				  for(i=0; i<4; i++)
-					  if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
-						  goto out_tsk;
-				  if (data)
-					  set_tsk_thread_flag(child, TIF_DEBUG);
-				  else
-					  clear_tsk_thread_flag(child, TIF_DEBUG);
-			  }
-			  addr -= (long) &dummy->u_debugreg;
-			  addr = addr >> 2;
-			  child->thread.debugreg[addr] = data;
-			  ret = 0;
+		if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
+			addr -= (long) &dummy->u_debugreg;
+			addr = addr >> 2;
+			ret = thread_set_debugreg(child, addr, data);
 		  }
 		  break;
 
@@ -625,7 +577,6 @@ long arch_ptrace(struct task_struct *chi
 		ret = ptrace_request(child, request, addr, data);
 		break;
 	}
- out_tsk:
 	return ret;
 }
 
Index: usb-2.6/arch/i386/kernel/Makefile
===================================================================
--- usb-2.6.orig/arch/i386/kernel/Makefile
+++ usb-2.6/arch/i386/kernel/Makefile
@@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld
 obj-y	:= process.o signal.o entry.o traps.o irq.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
-		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \
+		hw_breakpoint.o
 
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
Index: usb-2.6/arch/i386/power/cpu.c
===================================================================
--- usb-2.6.orig/arch/i386/power/cpu.c
+++ usb-2.6/arch/i386/power/cpu.c
@@ -11,6 +11,7 @@
 #include <linux/suspend.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/debugreg.h>
 
 static struct saved_context saved_context;
 
@@ -45,6 +46,11 @@ void __save_processor_state(struct saved
 	ctxt->cr2 = read_cr2();
 	ctxt->cr3 = read_cr3();
 	ctxt->cr4 = read_cr4();
+
+	/*
+	 * disable the debug registers
+	 */
+	set_debugreg(0, 7);
 }
 
 void save_processor_state(void)
@@ -69,20 +75,7 @@ static void fix_processor_context(void)
 
 	load_TR_desc();				/* This does ltr */
 	load_LDT(&current->active_mm->context);	/* This does lldt */
-
-	/*
-	 * Now maybe reload the debug registers
-	 */
-	if (current->thread.debugreg[7]){
-		set_debugreg(current->thread.debugreg[0], 0);
-		set_debugreg(current->thread.debugreg[1], 1);
-		set_debugreg(current->thread.debugreg[2], 2);
-		set_debugreg(current->thread.debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(current->thread.debugreg[6], 6);
-		set_debugreg(current->thread.debugreg[7], 7);
-	}
-
+	load_debug_registers();
 }
 
 void __restore_processor_state(struct saved_context *ctxt)
Index: usb-2.6/arch/i386/kernel/kprobes.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/kprobes.c
+++ usb-2.6/arch/i386/kernel/kprobes.c
@@ -670,8 +670,11 @@ int __kprobes kprobe_exceptions_notify(s
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_DEBUG:
-		if (post_kprobe_handler(args->regs))
-			ret = NOTIFY_STOP;
+		if ((args->err & DR_STEP) && post_kprobe_handler(args->regs)) {
+			args->err &= ~DR_STEP;
+			if (args->err == 0)
+				ret = NOTIFY_STOP;
+		}
 		break;
 	case DIE_GPF:
 	case DIE_PAGE_FAULT:
Index: usb-2.6/include/asm-generic/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-generic/hw_breakpoint.h
@@ -0,0 +1,200 @@
+#ifndef	_ASM_GENERIC_HW_BREAKPOINT_H
+#define	_ASM_GENERIC_HW_BREAKPOINT_H
+
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @triggered: callback invoked when the breakpoint is hit
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @address: location (virtual address) of the breakpoint
+ * @len: extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read/write, or execute)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints.  These can be either execute breakpoints
+ * (triggered on instruction execution) or data breakpoints (also known
+ * as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The @address field contains the breakpoint's address, as either a
+ * regular kernel pointer or an %__user pointer.  @len is the
+ * breakpoint's extent in bytes, which is subject to certain limitations.
+ * include/asm/hw_breakpoint.h contains macros defining the available
+ * lengths for a specific architecture.  Note that @len must be a power
+ * of 2, and @address must have the alignment specified by @len.  The
+ * breakpoint will catch accesses to any byte in the range from @address
+ * to @address + (@len - 1).
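+ * As an illustration (hypothetical addresses): @address = 0x0804a100 with
+ * @len = 4 watches bytes 0x0804a100 through 0x0804a103, whereas
+ * @address = 0x0804a102 with @len = 4 would be rejected because the
+ * address is not aligned to a 4-byte boundary.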
+ *
+ * @type indicates the type of access that will trigger the breakpoint.
+ * Possible values may include:
+ *
+ * 	%HW_BREAKPOINT_EXECUTE (triggered on instruction execution),
+ * 	%HW_BREAKPOINT_IO (triggered on I/O space access),
+ * 	%HW_BREAKPOINT_RW (triggered on read or write access),
+ * 	%HW_BREAKPOINT_WRITE (triggered on write access), and
+ * 	%HW_BREAKPOINT_READ (triggered on read access).
+ *
+ * Appropriate macros are defined in include/asm/hw_breakpoint.h.
+ * Execute breakpoints must have @len equal to the special value
+ * %HW_BREAKPOINT_LEN_EXECUTE.
+ *
+ * In register_user_hw_breakpoint() and modify_user_hw_breakpoint(),
+ * @address must refer to a location in user space (use @address.user).
+ * The breakpoint will be active only while the requested task is running.
+ * Conversely, in register_kernel_hw_breakpoint() @address must refer to a
+ * location in kernel space (use @address.kernel), and the breakpoint will
+ * be active on all CPUs regardless of the current task.
+ *
+ * When a breakpoint gets hit, the @triggered callback is invoked in
+ * interrupt context with a pointer to the %hw_breakpoint structure and
+ * the processor registers.  Execute-breakpoint traps occur before the
+ * breakpointed instruction runs; all other types of trap occur after the
+ * memory access has taken place.  All breakpoints are disabled while
+ * @triggered runs, to avoid recursive traps and allow unhindered access
+ * to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource.  Requests to register a
+ * breakpoint will always succeed (provided the parameters are valid),
+ * but the breakpoint may not be installed in a debug register right
+ * away.  Physical debug registers are allocated based on the priority
+ * level stored in @priority (higher values indicate higher priority).
+ * User-space breakpoints within a single thread compete with one
+ * another, and all user-space breakpoints compete with all kernel-space
+ * breakpoints; however user-space breakpoints in different threads do
+ * not compete.  %HW_BREAKPOINT_PRIO_PTRACE is the level used for ptrace
+ * requests; an unobtrusive kernel-space breakpoint will use
+ * %HW_BREAKPOINT_PRIO_NORMAL to avoid disturbing user programs.  A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HW_BREAKPOINT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered.  The
+ * @installed and @uninstalled callbacks are invoked in atomic context
+ * when these events occur.  It is legal for @installed or @uninstalled
+ * to be %NULL; however, @triggered must not be.  Note that it is not
+ * possible to register or unregister a breakpoint from within a callback
+ * routine, since doing so requires a process context.  Note also that
+ * for user breakpoints, @installed and @uninstalled may be called in the
+ * middle of a context switch, at a time when it is not safe to call
+ * printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled.  As a result @triggered can
+ * be called when you may not expect it, but this way you will know that
+ * during the time interval from @installed to @uninstalled, all events
+ * are faithfully reported.  (It is not possible to do any better than
+ * this in general, because on SMP systems there is no way to set a debug
+ * register simultaneously on all CPUs.)  The same isn't always true with
+ * user-space breakpoints, but the differences should not be visible to a
+ * user process.
+ *
+ * The @address, @len, and @type fields in a user-space breakpoint can be
+ * changed by calling modify_user_hw_breakpoint().  Kernel-space
+ * breakpoints cannot be modified, nor can the @priority value in
+ * user-space breakpoints, after the breakpoint has been registered.  And
+ * of course all the fields in a %hw_breakpoint structure should be
+ * treated as read-only while the breakpoint is registered.
+ *
+ * @node and @status are intended for internal use.  However @status
+ * may be read to determine whether or not the breakpoint is currently
+ * installed.
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * static void triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * 	printk(KERN_DEBUG "Breakpoint triggered\n");
+ * 	dump_stack();
+ *  	.......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	my_bp.address.kernel = &pid_max;
+ * 	my_bp.type = HW_BREAKPOINT_WRITE;
+ * 	my_bp.len = HW_BREAKPOINT_LEN_4;
+ * 	my_bp.triggered = triggered;
+ * 	my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ * 	rc = register_kernel_hw_breakpoint(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	unregister_kernel_hw_breakpoint(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hw_breakpoint {
+	struct list_head	node;
+	void		(*triggered)(struct hw_breakpoint *, struct pt_regs *);
+	void		(*installed)(struct hw_breakpoint *);
+	void		(*uninstalled)(struct hw_breakpoint *);
+	union {
+		const void		*kernel;
+		const void __user	*user;
+		unsigned long		va;
+	}		address;
+	u8		len;
+	u8		type;
+	u8		priority;
+	u8		status;
+};
+
+/* len and type values are defined in include/asm/hw_breakpoint.h */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HW_BREAKPOINT_PRIO_NORMAL	25
+#define HW_BREAKPOINT_PRIO_PTRACE	50
+#define HW_BREAKPOINT_PRIO_HIGH		75
+
+/* HW breakpoint status values */
+#define HW_BREAKPOINT_REGISTERED	1
+#define HW_BREAKPOINT_INSTALLED		2
+
+/*
+ * The following three routines are meant to be called only from within
+ * the ptrace or utrace subsystems.  The tsk argument will usually be a
+ * process being debugged by the current task, although it is also legal
+ * for tsk to be the current task.  In any case it must be guaranteed
+ * that tsk will not start running in user mode while its breakpoints are
+ * being modified.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+int modify_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp, const void __user *address,
+		u8 len, u8 type);
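+
+/*
+ * Illustrative sketch only (hypothetical caller; child_task and
+ * addr_in_child are placeholders).  A ptrace-layer caller might set a
+ * 4-byte write watchpoint on a stopped traced child roughly like this:
+ *
+ *	static struct hw_breakpoint user_bp;
+ *
+ *	static void user_bp_triggered(struct hw_breakpoint *bp,
+ *			struct pt_regs *regs)
+ *	{
+ *		....... record the hit, queue a SIGTRAP, etc. .......
+ *	}
+ *
+ *	user_bp.address.user = (const void __user *) addr_in_child;
+ *	user_bp.len = HW_BREAKPOINT_LEN_4;
+ *	user_bp.type = HW_BREAKPOINT_WRITE;
+ *	user_bp.triggered = user_bp_triggered;
+ *	user_bp.priority = HW_BREAKPOINT_PRIO_PTRACE;
+ *	rc = register_user_hw_breakpoint(child_task, &user_bp);
+ *	....... rc is 1 if installed, 0 if only registered, <0 on error .......
+ */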
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+#endif	/* _ASM_GENERIC_HW_BREAKPOINT_H */


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-14  3:00                                       ` Roland McGrath
  2007-03-14 19:11                                         ` Alan Stern
  2007-03-16 21:07                                         ` Alan Stern
@ 2007-03-22 19:44                                         ` Alan Stern
  2 siblings, 0 replies; 70+ messages in thread
From: Alan Stern @ 2007-03-22 19:44 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

Roland:

Here's the most recent version of the hw-breakpoint patch.  Unlike the 
version I posted last week, this one actually works with 2.6.21-rc4.

Alan Stern


Index: usb-2.6/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,17 @@
+#ifndef	_I386_HW_BREAKPOINT_H
+#define	_I386_HW_BREAKPOINT_H
+
+#include <asm-generic/hw_breakpoint.h>
+
+/* Available HW breakpoint lengths */
+#define HW_BREAKPOINT_LEN_1	1
+#define HW_BREAKPOINT_LEN_2	2
+#define HW_BREAKPOINT_LEN_4	4
+#define HW_BREAKPOINT_LEN_EXECUTE	1
+
+/* Available HW breakpoint types */
+#define HW_BREAKPOINT_EXECUTE	0x80	/* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE	0x81	/* trigger on memory write */
+#define HW_BREAKPOINT_RW	0x83	/* trigger on memory read or write */
+
+#endif	/* _I386_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -58,6 +58,7 @@
 #include <asm/tlbflush.h>
 #include <asm/cpu.h>
 #include <asm/pda.h>
+#include <asm/debugreg.h>
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
@@ -359,9 +360,10 @@ EXPORT_SYMBOL(kernel_thread);
  */
 void exit_thread(void)
 {
+	struct task_struct *tsk = current;
+
 	/* The process may have allocated an io port bitmap... nuke it. */
 	if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
-		struct task_struct *tsk = current;
 		struct thread_struct *t = &tsk->thread;
 		int cpu = get_cpu();
 		struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -379,15 +381,17 @@ void exit_thread(void)
 		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
 		put_cpu();
 	}
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 }
 
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
 
-	memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
-	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));	
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 	/*
 	 * Forget coprocessor state..
 	 */
@@ -430,14 +434,21 @@ int copy_thread(int nr, unsigned long cl
 
 	savesegment(gs,p->thread.gs);
 
+	p->thread.hw_breakpoint_info = NULL;
+	p->thread.io_bitmap_ptr = NULL;
+
 	tsk = current;
+	err = -ENOMEM;
+	if (unlikely(tsk->thread.hw_breakpoint_info)) {
+		if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+			goto out;
+	}
+
 	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
 						IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
+		if (!p->thread.io_bitmap_ptr)
+			goto out;
 		set_tsk_thread_flag(p, TIF_IO_BITMAP);
 	}
 
@@ -467,7 +478,8 @@ int copy_thread(int nr, unsigned long cl
 
 	err = 0;
  out:
-	if (err && p->thread.io_bitmap_ptr) {
+	if (err) {
+		flush_thread_hw_breakpoint(p);
 		kfree(p->thread.io_bitmap_ptr);
 		p->thread.io_bitmap_max = 0;
 	}
@@ -479,18 +491,18 @@ int copy_thread(int nr, unsigned long cl
  */
 void dump_thread(struct pt_regs * regs, struct user * dump)
 {
-	int i;
+	struct task_struct *tsk = current;
 
 /* changed the size calculations - should hopefully work better. lbt */
 	dump->magic = CMAGIC;
 	dump->start_code = 0;
 	dump->start_stack = regs->esp & ~(PAGE_SIZE - 1);
-	dump->u_tsize = ((unsigned long) current->mm->end_code) >> PAGE_SHIFT;
-	dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
+	dump->u_tsize = ((unsigned long) tsk->mm->end_code) >> PAGE_SHIFT;
+	dump->u_dsize = ((unsigned long) (tsk->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
 	dump->u_dsize -= dump->u_tsize;
 	dump->u_ssize = 0;
-	for (i = 0; i < 8; i++)
-		dump->u_debugreg[i] = current->thread.debugreg[i];  
+
+	dump_thread_hw_breakpoint(tsk, dump->u_debugreg);
 
 	if (dump->start_stack < TASK_SIZE)
 		dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -540,16 +552,6 @@ static noinline void __switch_to_xtra(st
 
 	next = &next_p->thread;
 
-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		set_debugreg(next->debugreg[0], 0);
-		set_debugreg(next->debugreg[1], 1);
-		set_debugreg(next->debugreg[2], 2);
-		set_debugreg(next->debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(next->debugreg[6], 6);
-		set_debugreg(next->debugreg[7], 7);
-	}
-
 	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
 		/*
 		 * Disable the bitmap via an invalid offset. We still cache
@@ -682,7 +684,7 @@ struct task_struct fastcall * __switch_t
 		set_iopl_mask(next->iopl);
 
 	/*
-	 * Now maybe handle debug registers and/or IO bitmaps
+	 * Now maybe handle IO bitmaps
 	 */
 	if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
 	    || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
@@ -714,6 +716,13 @@ struct task_struct fastcall * __switch_t
 
 	write_pda(pcurrent, next_p);
 
+	/*
+	 * Handle debug registers.  This must be done _after_ current
+	 * is updated.
+	 */
+	if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+		switch_to_thread_hw_breakpoint(next_p);
+
 	return prev_p;
 }
 
Index: usb-2.6/arch/i386/kernel/signal.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/signal.c
+++ usb-2.6/arch/i386/kernel/signal.c
@@ -592,13 +592,6 @@ static void fastcall do_signal(struct pt
 
 	signr = get_signal_to_deliver(&info, &ka, regs, NULL);
 	if (signr > 0) {
-		/* Reenable any watchpoints before delivering the
-		 * signal to user space. The processor register will
-		 * have been cleared if the watchpoint triggered
-		 * inside the kernel.
-		 */
-		if (unlikely(current->thread.debugreg[7]))
-			set_debugreg(current->thread.debugreg[7], 7);
 
 		/* Whee!  Actually deliver the signal.  */
 		if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -807,62 +807,47 @@ fastcall void __kprobes do_int3(struct p
  */
 fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
 {
-	unsigned int condition;
 	struct task_struct *tsk = current;
+	struct die_args args;
 
-	get_debugreg(condition, 6);
-
-	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
-					SIGTRAP) == NOTIFY_STOP)
+	args.regs = regs;
+	args.str = "debug";
+	get_debugreg(args.err, 6);
+	set_debugreg(0, 6);	/* DR6 is never cleared by the CPU */
+	args.trapnr = error_code;
+	args.signr = SIGTRAP;
+	if (atomic_notifier_call_chain(&i386die_chain, DIE_DEBUG, &args) ==
+			NOTIFY_STOP)
 		return;
+
 	/* It's safe to allow irq's after DR6 has been saved */
 	if (regs->eflags & X86_EFLAGS_IF)
 		local_irq_enable();
 
-	/* Mask out spurious debug traps due to lazy DR7 setting */
-	if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
-		if (!tsk->thread.debugreg[7])
-			goto clear_dr7;
+	if (regs->eflags & VM_MASK) {
+		handle_vm86_trap((struct kernel_vm86_regs *) regs,
+				error_code, 1);
+		return;
 	}
 
-	if (regs->eflags & VM_MASK)
-		goto debug_vm86;
-
-	/* Save debug status register where ptrace can see it */
-	tsk->thread.debugreg[6] = condition;
-
 	/*
-	 * Single-stepping through TF: make sure we ignore any events in
-	 * kernel space (but re-enable TF when returning to user mode).
+	 * Single-stepping through system calls: ignore any exceptions in
+	 * kernel space, but re-enable TF when returning to user mode.
+	 *
+	 * We already checked v86 mode above, so we can check for kernel mode
+	 * by just checking the CPL of CS.
 	 */
-	if (condition & DR_STEP) {
-		/*
-		 * We already checked v86 mode above, so we can
-		 * check for kernel mode by just checking the CPL
-		 * of CS.
-		 */
-		if (!user_mode(regs))
-			goto clear_TF_reenable;
+	if ((args.err & DR_STEP) && !user_mode(regs)) {
+		args.err &= ~DR_STEP;
+		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+		regs->eflags &= ~TF_MASK;
 	}
 
-	/* Ok, finally something we can handle */
-	send_sigtrap(tsk, regs, error_code);
+	/* Store the virtualized DR6 value */
+	tsk->thread.vdr6 |= args.err;
 
-	/* Disable additional traps. They'll be re-enabled when
-	 * the signal is delivered.
-	 */
-clear_dr7:
-	set_debugreg(0, 7);
-	return;
-
-debug_vm86:
-	handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
-	return;
-
-clear_TF_reenable:
-	set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
-	regs->eflags &= ~TF_MASK;
-	return;
+	if (args.err & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+		send_sigtrap(tsk, regs, error_code);
 }
 
 /*
Index: usb-2.6/include/asm-i386/debugreg.h
===================================================================
--- usb-2.6.orig/include/asm-i386/debugreg.h
+++ usb-2.6/include/asm-i386/debugreg.h
@@ -48,6 +48,8 @@
 
 #define DR_LOCAL_ENABLE_SHIFT 0    /* Extra shift to the local enable bit */
 #define DR_GLOBAL_ENABLE_SHIFT 1   /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1)      /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2)     /* Global enable for reg 0 */
 #define DR_ENABLE_SIZE 2           /* 2 enable bits per register */
 
 #define DR_LOCAL_ENABLE_MASK (0x55)  /* Set  local bits for all 4 regs */
@@ -58,7 +60,31 @@
    gdt or the ldt if we want to.  I am not sure why this is an advantage */
 
 #define DR_CONTROL_RESERVED (0xFC00) /* Reserved by Intel */
-#define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
-#define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
+#define DR_LOCAL_EXACT (0x100)       /* Local slow the pipeline */
+#define DR_GLOBAL_EXACT (0x200)      /* Global slow the pipeline */
+
+
+/*
+ * HW breakpoint additions
+ */
+
+#include <asm/hw_breakpoint.h>
+#include <linux/spinlock.h>
+
+#define HB_NUM		4	/* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
 
 #endif
Index: usb-2.6/include/asm-i386/processor.h
===================================================================
--- usb-2.6.orig/include/asm-i386/processor.h
+++ usb-2.6/include/asm-i386/processor.h
@@ -402,8 +402,9 @@ struct thread_struct {
 	unsigned long	esp;
 	unsigned long	fs;
 	unsigned long	gs;
-/* Hardware debugging registers */
-	unsigned long	debugreg[8];  /* %%db0-7 debug registers */
+/* Hardware breakpoint info */
+	unsigned long	vdr6;
+	struct thread_hw_breakpoint	*hw_breakpoint_info;
 /* fault info */
 	unsigned long	cr2, trap_no, error_code;
 /* floating point info */
Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/arch/i386/kernel/hw_breakpoint.c
@@ -0,0 +1,1211 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+/* QUESTIONS
+
+	Error code in ptrace_triggered?
+
+	Set RF flag bit for execution faults?
+
+	TF flag bit for single-step exceptions in kernel space?
+
+	CPU hotplug, kexec, etc?
+*/
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm-generic/percpu.h>
+
+#include <asm/debugreg.h>
+#include <asm/kdebug.h>
+#include <asm/processor.h>
+
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+	/* utrace support */
+	struct list_head	node;		/* Entry in thread list */
+	struct list_head	thread_bps;	/* Thread's breakpoints */
+	struct hw_breakpoint	*bps[HB_NUM];	/* Highest-priority bps */
+	unsigned long		tdr[HB_NUM];	/*  and their addresses */
+	unsigned long		tdr7;		/* Thread's DR7 value */
+	int			last_num_kbps;	/* Value of num_kbps when
+						 *  the thread last ran */
+
+	/* ptrace support -- note that vdr6 is stored directly in the
+	 * thread_struct so that it is always available */
+	unsigned long		vdr7;			/* Virtualized DR7 */
+	struct hw_breakpoint	vdr_bps[HB_NUM];	/* Breakpoints
+			* representing virtualized debug registers 0 - 3 */
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+	struct hw_breakpoint	*bps[HB_NUM];	/* Loaded breakpoints */
+	int			num_kbps;	/* Number of kernel bps */
+	unsigned long		kdr7;		/* Current kernel DR7 value */
+	unsigned long		kdr7_mask;	/* Mask for kernel part */
+	unsigned long		dr7;		/* Current DR7 value */
+	struct task_struct	*bp_task;	/* The thread whose bps
+			are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Kernel-space breakpoint data */
+static LIST_HEAD(kernel_bps);			/* Kernel breakpoint list */
+static int			num_kbps;	/* Number of kernel bps */
+static unsigned long		kdr7;		/* Kernel DR7 value */
+
+static u8			tprio[HB_NUM];	/* Thread bp max priorities */
+static LIST_HEAD(thread_list);			/* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex);	/* Protects everything */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps.  Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1). */
+static const unsigned long	kdr7_masks[HB_NUM + 1] = {
+	0x00000000,
+	0x000f0203,	/* LEN0, R/W0, GE, G0, L0 */
+	0x00ff020f,	/* Same for 0,1 */
+	0x0fff023f,	/* Same for 0,1,2 */
+	0xffff02ff	/* Same for 0,1,2,3 */
+};
+
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct cpu_hw_breakpoint *chbi;
+	unsigned long flags;
+
+	/* Other CPUs might be making updates to the list of kernel
+	 * breakpoints at this same time, so we can't use the global
+	 * value stored in num_kbps.  Instead we'll use the per-CPU
+	 * value stored in cpu_info. */
+
+	/* Block kernel breakpoint updates from other CPUs */
+	local_irq_save(flags);
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = tsk;
+
+	/* Normally we can keep the same debug register settings as the
+	 * last time this task ran.  But if the number of registers
+	 * allocated to the kernel has changed or any user breakpoints
+	 * have been registered or unregistered, we need to send out
+	 * some notifications. */
+	if (unlikely(thbi->last_num_kbps != chbi->num_kbps)) {
+		struct hw_breakpoint *bp;
+		int i = HB_NUM;
+
+		thbi->last_num_kbps = chbi->num_kbps;
+
+		/* This code can be invoked while a debugger is actively
+		 * updating the thread's breakpoint list (for example, if
+		 * someone sends SIGKILL to the task).  We use RCU to
+		 * protect our access to the list pointers. */
+		rcu_read_lock();
+		list_for_each_entry_rcu(bp, &thbi->thread_bps, node) {
+
+			/* If this register is allocated for kernel bps,
+			 * don't install.  Otherwise do. */
+			if (--i < chbi->num_kbps) {
+				if (bp->status == HW_BREAKPOINT_INSTALLED) {
+					if (bp->uninstalled)
+						(bp->uninstalled)(bp);
+					bp->status = HW_BREAKPOINT_REGISTERED;
+				}
+			} else {
+				if (bp->status != HW_BREAKPOINT_INSTALLED) {
+					bp->status = HW_BREAKPOINT_INSTALLED;
+					if (bp->installed)
+						(bp->installed)(bp);
+				}
+			}
+		}
+		rcu_read_unlock();
+	}
+
+	/* Install the thread breakpoints.  Kernel breakpoints are stored
+	 * starting in DR0 and going up, and there are num_kbps of them.
+	 * Thread breakpoints are stored starting in DR3 and going down,
+	 * as many as we have room for. */
+	switch (chbi->num_kbps) {
+	case 0:
+		set_debugreg(thbi->tdr[0], 0);
+	case 1:
+		set_debugreg(thbi->tdr[1], 1);
+	case 2:
+		set_debugreg(thbi->tdr[2], 2);
+	case 3:
+		set_debugreg(thbi->tdr[3], 3);
+	}
+
+	/* Mask in the parts of DR7 that refer to the new thread */
+	chbi->dr7 = chbi->kdr7 | (~chbi->kdr7_mask & thbi->tdr7);
+	set_debugreg(chbi->dr7, 7);
+
+	put_cpu_no_resched();
+	local_irq_restore(flags);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void switch_to_none_hw_breakpoint(void)
+{
+	struct cpu_hw_breakpoint *chbi;
+	unsigned long flags;
+
+	/* Block kernel breakpoint updates from other CPUs */
+	local_irq_save(flags);
+	chbi = &per_cpu(cpu_info, get_cpu());
+
+	chbi->bp_task = NULL;
+	chbi->dr7 = chbi->kdr7;
+	set_debugreg(chbi->dr7, 7);
+
+	put_cpu_no_resched();
+	local_irq_restore(flags);
+}
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void switch_kernel_hw_breakpoint(struct cpu_hw_breakpoint *chbi)
+{
+	struct hw_breakpoint *bp;
+	int i;
+
+	/* Don't allow debug exceptions while we update the registers */
+	set_debugreg(0, 7);
+	chbi->num_kbps = num_kbps;
+
+	/* Kernel breakpoints are stored starting in DR0 and going up */
+	i = 0;
+	list_for_each_entry(bp, &kernel_bps, node) {
+		if (i >= chbi->num_kbps)
+			break;
+		chbi->bps[i] = bp;
+		switch (i) {
+		case 0:
+			set_debugreg(bp->address.va, 0);
+			break;
+		case 1:
+			set_debugreg(bp->address.va, 1);
+			break;
+		case 2:
+			set_debugreg(bp->address.va, 2);
+			break;
+		case 3:
+			set_debugreg(bp->address.va, 3);
+			break;
+		}
+		++i;
+	}
+
+	chbi->kdr7_mask = kdr7_masks[chbi->num_kbps];
+	chbi->kdr7 = kdr7 & chbi->kdr7_mask;
+	set_debugreg(chbi->kdr7, 7);
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+	struct cpu_hw_breakpoint *chbi;
+	struct task_struct *tsk = current;
+
+	/* Install both the kernel and the user breakpoints */
+	chbi = &per_cpu(cpu_info, get_cpu());
+
+	switch_kernel_hw_breakpoint(chbi);
+	if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+		switch_to_thread_hw_breakpoint(tsk);
+
+	put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void update_all_cpus(void)
+{
+	on_each_cpu(update_this_cpu, NULL, 0, 0);
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+	mutex_lock(&hw_breakpoint_mutex);
+	update_this_cpu(NULL);
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio.  Highest-priority entry is in tprio[3].
+ */
+static void accum_thread_tprio(struct thread_hw_breakpoint *thbi)
+{
+	int i;
+
+	for (i = HB_NUM - 1; i >= 0 && thbi->bps[i]; --i)
+		tprio[i] = max(tprio[i], thbi->bps[i]->priority);
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority.  We'll set tprio[HB_NUM - 1] to the maximum
+ * priority of the first (highest-priority) entries in all the lists,
+ * tprio[HB_NUM - 2] to the maximum priority of the second entries in
+ * all the lists, and so on.  In the end, we'll know that no thread
+ * requires breakpoints with priorities higher than the values in tprio.
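+ *
+ * As a hypothetical example with HB_NUM == 4: if one thread has
+ * breakpoints with priorities {75, 25} and another has {50}, then
+ * tprio becomes {0, 0, 25, 75} -- tprio[3] = max(75, 50) and
+ * tprio[2] = 25.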
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void recalc_tprio(void)
+{
+	struct thread_hw_breakpoint *thbi;
+
+	memset(tprio, 0, sizeof tprio);
+
+	/* Loop through all threads having registered breakpoints
+	 * and accumulate the maximum priority levels in tprio. */
+	list_for_each_entry(thbi, &thread_list, node)
+		accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[].  The 4 highest
+ * winners overall get to be installed in a debug register; num_kbps
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU.  If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+	int k, u;
+	int changed = 0;
+	struct hw_breakpoint *bp;
+
+	/* Determine how many debug registers are available for kernel
+	 * breakpoints as opposed to user breakpoints, based on the
+	 * priorities.  Ties are resolved in favor of user bps. */
+	k = 0;			/* Next kernel bp to allocate */
+	u = HB_NUM - 1;		/* Next user bp to allocate */
+	bp = list_entry(kernel_bps.next, struct hw_breakpoint, node);
+	while (k <= u) {
+		if (&bp->node == &kernel_bps || tprio[u] >= bp->priority)
+			--u;		/* User bps win a slot */
+		else {
+			++k;		/* Kernel bp wins a slot */
+			if (bp->status != HW_BREAKPOINT_INSTALLED)
+				changed = 1;
+			bp = list_entry(bp->node.next, struct hw_breakpoint,
+					node);
+		}
+	}
+	if (k != num_kbps) {
+		changed = 1;
+		num_kbps = k;
+	}
+
+	/* Notify the remaining kernel breakpoints that they are about
+	 * to be uninstalled. */
+	list_for_each_entry_from(bp, &kernel_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED) {
+			if (bp->uninstalled)
+				(bp->uninstalled)(bp);
+			bp->status = HW_BREAKPOINT_REGISTERED;
+			changed = 1;
+		}
+	}
+
+	if (changed) {
+
+		/* Tell all the CPUs to update their debug registers */
+		update_all_cpus();
+
+		/* Notify the breakpoints that just got installed */
+		k = 0;
+		list_for_each_entry(bp, &kernel_bps, node) {
+			if (k++ >= num_kbps)
+				break;
+			if (bp->status != HW_BREAKPOINT_INSTALLED) {
+				bp->status = HW_BREAKPOINT_INSTALLED;
+				if (bp->installed)
+					(bp->installed)(bp);
+			}
+		}
+	}
+}
+
+/*
+ * Return the pointer to a thread's hw_breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+		struct task_struct *tsk)
+{
+	if (!tsk->thread.hw_breakpoint_info && !(tsk->flags & PF_EXITING)) {
+		struct thread_hw_breakpoint *thbi;
+
+		thbi = kzalloc(sizeof(struct thread_hw_breakpoint),
+				GFP_KERNEL);
+		if (thbi) {
+			INIT_LIST_HEAD(&thbi->node);
+			INIT_LIST_HEAD(&thbi->thread_bps);
+			tsk->thread.hw_breakpoint_info = thbi;
+		}
+	}
+	return tsk->thread.hw_breakpoint_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct hw_breakpoint *bp;
+
+	if (!thbi)
+		return;
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Let the breakpoints know they are being uninstalled */
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+			(bp->uninstalled)(bp);
+		bp->status = 0;
+	}
+
+	/* Remove tsk from the list of all threads with registered bps */
+	list_del(&thbi->node);
+
+	/* The thread no longer has any breakpoints associated with it */
+	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	tsk->thread.hw_breakpoint_info = NULL;
+	kfree(thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities */
+	recalc_tprio();
+	balance_kernel_vs_user();
+
+	/* Actually uninstall the breakpoints if necessary */
+	if (tsk == current)
+		switch_to_none_hw_breakpoint();
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags)
+{
+	/* We will assume that breakpoint settings are not inherited
+	 * and the child starts out with no debug registers set.
+	 * But what about CLONE_PTRACE? */
+
+	clear_tsk_thread_flag(child, TIF_DEBUG);
+	return 0;
+}
+
+/*
+ * Copy out the debug register information for a core dump.
+ *
+ * tsk must be equal to current.
+ */
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	memset(u_debugreg, 0, 8 * sizeof u_debugreg[0]);
+	if (thbi) {
+		for (i = 0; i < HB_NUM; ++i)
+			u_debugreg[i] = (unsigned long)
+					thbi->vdr_bps[i].address.va;
+		u_debugreg[7] = thbi->vdr7;
+	}
+	u_debugreg[6] = tsk->thread.vdr6;
+}
+
+/*
+ * Validate the settings in a hw_breakpoint structure.
+ */
+static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
+{
+	int rc = -EINVAL;
+
+	switch (bp->type) {
+	case HW_BREAKPOINT_EXECUTE:
+		if (bp->len != HW_BREAKPOINT_LEN_EXECUTE)
+			return rc;
+		break;
+	case HW_BREAKPOINT_WRITE:
+	case HW_BREAKPOINT_RW:
+		break;
+	default:
+		return rc;
+	}
+
+	switch (bp->len) {
+	case 1:  case 2:  case 4:	/* 8 is also valid on x86_64 */
+		break;
+	default:
+		return rc;
+	}
+
+	/* Check that the low-order bits of the address are appropriate
+	 * for the alignment implied by len. */
+	if (bp->address.va & (bp->len - 1))
+		return rc;
+
+	/* Check that the address is in the proper range.  Note that tsk
+	 * is NULL for kernel bps and non-NULL for user bps.
+	 * With x86_64, use TASK_SIZE_OF(tsk) instead of TASK_SIZE. */
+	if ((tsk != NULL) != (bp->address.va < TASK_SIZE))
+		return rc;
+
+	if (bp->triggered)
+		rc = 0;
+	return rc;
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static inline unsigned long encode_dr7(int drnum, u8 len, u8 type, int local)
+{
+	unsigned long temp;
+
+	/* For x86_64:
+	 *
+	 * if (len == 8)
+	 *	len = 3;
+	 */
+	temp = ((len - 1) << 2) | (type & 0x7f);
+	temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+	if (local)
+		temp |= (DR_LOCAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_LOCAL_EXACT;
+	else
+		temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_GLOBAL_EXACT;
+	return temp;
+}
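+
+/*
+ * Worked example (illustrative; assumes the usual i386 values
+ * DR_CONTROL_SHIFT == 16 and DR_CONTROL_SIZE == 4):
+ * encode_dr7(0, 4, HW_BREAKPOINT_WRITE, 1) yields 0x000d0101, i.e.
+ * LEN0 = 0b11, R/W0 = 0b01, with the L0 and LE bits set.
+ */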
+
+/*
+ * Calculate the DR7 value for a list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct list_head *bp_list, int is_user)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	int drnum;
+	unsigned long dr7;
+
+	/* Kernel bps are assigned from DR0 on up, and user bps are assigned
+	 * from DR3 on down.  Accumulate all 4 bps; the kernel DR7 mask will
+	 * select the appropriate bits later. */
+	dr7 = 0;
+	i = 0;
+	list_for_each_entry(bp, bp_list, node) {
+
+		/* Get the debug register number and accumulate the bits */
+		drnum = (is_user ? HB_NUM - 1 - i : i);
+		dr7 |= encode_dr7(drnum, bp->len, bp->type, is_user);
+		if (++i >= HB_NUM)
+			break;
+	}
+	return dr7;
+}
+
+/*
+ * Update the DR7 value for a user thread.
+ */
+static void update_user_dr7(struct thread_hw_breakpoint *thbi)
+{
+	thbi->tdr7 = calculate_dr7(&thbi->thread_bps, 1);
+}
+
+/*
+ * Store the highest-priority thread breakpoint entries in an array.
+ */
+static void store_thread_bp_array(struct thread_hw_breakpoint *thbi)
+{
+	struct hw_breakpoint *bp;
+	int i;
+
+	i = HB_NUM - 1;
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		thbi->bps[i] = bp;
+		thbi->tdr[i] = bp->address.va;
+		if (--i < 0)
+			break;
+	}
+	while (i >= 0)
+		thbi->bps[i--] = NULL;
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set implies
+ *		tsk->thread.hw_breakpoint_info is not NULL.
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ *		iff thbi->node is on thread_list.
+ */
+static int insert_bp_in_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	struct list_head *head;
+	int pos;
+	struct hw_breakpoint *temp_bp;
+
+	/* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+	if (tsk)
+		head = &thbi->thread_bps;
+	else
+		head = &kernel_bps;
+
+	/* Equal-priority breakpoints get listed first-come-first-served */
+	pos = 0;
+	list_for_each_entry(temp_bp, head, node) {
+		if (bp->priority > temp_bp->priority)
+			break;
+		++pos;
+	}
+	list_add_tail_rcu(&bp->node, &temp_bp->node);
+	bp->status = HW_BREAKPOINT_REGISTERED;
+
+	if (tsk) {
+		store_thread_bp_array(thbi);
+
+		/* Is this the thread's first registered breakpoint? */
+		if (list_empty(&thbi->node)) {
+			set_tsk_thread_flag(tsk, TIF_DEBUG);
+			list_add(&thbi->node, &thread_list);
+		}
+		if (tsk != current)
+			synchronize_rcu();
+	}
+	return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ */
+static void remove_bp_from_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	/* Remove bp from the thread's/kernel's list.  If the list is now
+	 * empty we must clear the TIF_DEBUG flag.  But keep the
+	 * thread_hw_breakpoint structure, so that the virtualized debug
+	 * register values will remain valid. */
+	list_del_rcu(&bp->node);
+	if (tsk) {
+		store_thread_bp_array(thbi);
+
+		if (list_empty(&thbi->thread_bps)) {
+			list_del_init(&thbi->node);
+			clear_tsk_thread_flag(tsk, TIF_DEBUG);
+		}
+		if (tsk != current)
+			synchronize_rcu();
+	}
+
+	/* Tell the breakpoint it is being uninstalled */
+	if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+		(bp->uninstalled)(bp);
+	bp->status = 0;
+}
+
+/*
+ * Actual implementation of register_user_hw_breakpoint.
+ */
+int __register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+	struct thread_hw_breakpoint *thbi;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, tsk);
+	if (rc)
+		return rc;
+
+	thbi = alloc_thread_hw_breakpoint(tsk);
+	if (!thbi)
+		return -ENOMEM;
+
+	/* Insert bp in the thread's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, thbi, tsk);
+	update_user_dr7(thbi);
+
+	/* Force an update notification */
+	thbi->last_num_kbps = -1;
+
+	/* Update and rebalance the priorities.  We don't need to go through
+	 * the list of all threads; adding a breakpoint can only cause the
+	 * priorities for this thread to increase. */
+	accum_thread_tprio(thbi);
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register. */
+	if (pos < HB_NUM - num_kbps) {
+		rc = 1;
+
+		/* Does it need to be installed right now? */
+		if (tsk == current)
+			switch_to_thread_hw_breakpoint(tsk);
+		/* Otherwise it will get installed the next time tsk runs */
+	}
+	return rc;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running.  It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being debugged by the current process,
+ * but it may also be the current process.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	rc = __register_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Actual implementation of unregister_user_hw_breakpoint.
+ */
+void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+
+	if (!bp->status)
+		return;		/* Not registered */
+
+	/* Remove bp from the thread's list and update the DR7 value */
+	remove_bp_from_list(bp, thbi, tsk);
+	update_user_dr7(thbi);
+
+	/* Force an update notification */
+	thbi->last_num_kbps = -1;
+
+	/* Recalculate and rebalance the kernel-vs-user priorities,
+	 * and actually uninstall bp if necessary. */
+	recalc_tprio();
+	balance_kernel_vs_user();
+	if (tsk == current)
+		switch_to_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * unregister_user_hw_breakpoint - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	mutex_lock(&hw_breakpoint_mutex);
+	__unregister_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Actual implementation of modify_user_hw_breakpoint.
+ */
+int __modify_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp, const void __user *address,
+		u8 len, u8 type)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	if (!bp->status) {	/* Not registered, just store the values */
+		bp->address.user = address;
+		bp->len = len;
+		bp->type = type;
+		return 0;
+	}
+
+	/* Check the new values */
+	{
+		struct hw_breakpoint temp_bp = *bp;
+		int rc;
+
+		temp_bp.address.user = address;
+		temp_bp.len = len;
+		temp_bp.type = type;
+		rc = validate_settings(&temp_bp, tsk);
+		if (rc)
+			return rc;
+	}
+
+	/* Okay, update the breakpoint */
+	bp->address.user = address;
+	bp->len = len;
+	bp->type = type;
+	update_user_dr7(thbi);
+
+	for (i = 0; i < HB_NUM; ++i) {
+		if (thbi->bps[i] == bp)
+			thbi->tdr[i] = bp->address.va;
+	}
+
+	/* The priority hasn't changed so we don't need to rebalance
+	 * anything.  Just install the new breakpoint, if necessary. */
+	if (tsk == current)
+		switch_to_thread_hw_breakpoint(tsk);
+	return 0;
+}
+
+/**
+ * modify_user_hw_breakpoint - modify a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to modify
+ * @address: the new value for @bp->address
+ * @len: the new value for @bp->len
+ * @type: the new value for @bp->type
+ *
+ * @bp need not currently be registered.  If it isn't, the new values
+ * are simply stored in it and @tsk is ignored.  Otherwise the new values
+ * are validated first and then stored.  If @tsk is the current process
+ * and @bp is installed in a debug register, the register is updated.
+ *
+ * Returns 0 if the new values are acceptable, otherwise a negative error
+ * number.
+ */
+int modify_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp, const void __user *address,
+		u8 len, u8 type)
+{
+	int rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	rc = __modify_user_hw_breakpoint(tsk, bp, address, len, type);
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Update the DR7 value for the kernel.
+ */
+static void update_kernel_dr7(void)
+{
+	kdr7 = calculate_dr7(&kernel_bps, 0);
+}
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be active at all times.  It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	int rc;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, NULL);
+	if (rc)
+		return rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Insert bp in the kernel's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, NULL, NULL);
+	update_kernel_dr7();
+
+	/* Rebalance the priorities.  This will install bp if it
+	 * was allocated a debug register. */
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register. */
+	if (pos < num_kbps)
+		rc = 1;
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	if (!bp->status)
+		return;		/* Not registered */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Remove bp from the kernel's list and update the DR7 value */
+	remove_bp_from_list(bp, NULL, NULL);
+	update_kernel_dr7();
+
+	/* Rebalance the priorities.  This will uninstall bp if it
+	 * was allocated a debug register. */
+	balance_kernel_vs_user();
+
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+	struct task_struct *tsk = current;
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	/* Store in the virtual DR6 register the fact that breakpoint i
+	 * was hit, so the thread's debugger will see it, and send the
+	 * debugging signal. */
+	if (thbi) {
+		i = bp - thbi->vdr_bps;
+		tsk->thread.vdr6 |= (DR_TRAP0 << i);
+		send_sigtrap(tsk, regs, 0);
+	}
+}
+
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n)
+{
+	struct thread_hw_breakpoint *thbi;
+	unsigned long val = 0;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	thbi = tsk->thread.hw_breakpoint_info;
+	if (n < HB_NUM) {
+		if (thbi)
+			val = (unsigned long) thbi->vdr_bps[n].address.va;
+	} else if (n == 6)
+		val = tsk->thread.vdr6;
+	else if (n == 7) {
+		if (thbi)
+			val = thbi->vdr7;
+	}
+	mutex_unlock(&hw_breakpoint_mutex);
+	return val;
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7.  Return the "enabled" status.
+ */
+static inline int decode_dr7(unsigned long dr7, int bpnum, u8 *len, u8 *type)
+{
+	int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+	int tlen = 1 + ((temp >> 2) & 0x3);
+
+	/* For x86_64:
+	 *
+	 * if (tlen == 3)
+	 *	tlen = 8;
+	 */
+	*len = tlen;
+	*type = (temp & 0x3) | 0x80;
+	return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
+
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+		struct thread_hw_breakpoint *thbi, unsigned long data)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	int rc = 0;
+	unsigned long old_dr7 = thbi->vdr7;
+
+	data &= ~DR_CONTROL_RESERVED;
+
+	/* Loop through all the hardware breakpoints,
+	 * making the appropriate changes to each. */
+restore_settings:
+	thbi->vdr7 = data;
+	bp = &thbi->vdr_bps[0];
+	for (i = 0; i < HB_NUM; (++i, ++bp)) {
+		int enabled;
+		u8 len, type;
+
+		enabled = decode_dr7(data, i, &len, &type);
+
+		/* Unregister the breakpoint if it should now be disabled.
+		 * Do this first so that setting invalid values for len
+		 * or type won't cause an error. */
+		if (!enabled && bp->status)
+			__unregister_user_hw_breakpoint(tsk, bp);
+
+		/* Insert the breakpoint's settings.  If the bp is enabled,
+		 * an invalid entry will cause an error. */
+		if (__modify_user_hw_breakpoint(tsk, bp,
+				bp->address.user, len, type) < 0 && rc == 0)
+			break;
+
+		/* Now register the breakpoint if it should be enabled.
+		 * New invalid entries will cause an error here. */
+		if (enabled && !bp->status) {
+			bp->triggered = ptrace_triggered;
+			bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+			if (__register_user_hw_breakpoint(tsk, bp) < 0 &&
+					rc == 0)
+				break;
+		}
+	}
+
+	/* If anything above failed, restore the original settings */
+	if (i < HB_NUM) {
+		rc = -EIO;
+		data = old_dr7;
+		goto restore_settings;
+	}
+	return rc;
+}
+
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+	struct thread_hw_breakpoint *thbi;
+	int rc = -EIO;
+
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* There are no DR4 or DR5 registers */
+	if (n == 4 || n == 5)
+		;
+
+	/* Writes to DR6 modify the virtualized value */
+	else if (n == 6) {
+		tsk->thread.vdr6 = val;
+		rc = 0;
+	}
+
+	else if (!tsk->thread.hw_breakpoint_info && val == 0)
+		rc = 0;		/* Minor optimization */
+
+	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+		rc = -ENOMEM;
+
+	/* Writes to DR0 - DR3 change a breakpoint address */
+	else if (n < HB_NUM) {
+		struct hw_breakpoint *bp = &thbi->vdr_bps[n];
+
+		if (__modify_user_hw_breakpoint(tsk, bp, (void *) val,
+				bp->len, bp->type) >= 0)
+			rc = 0;
+	}
+
+	/* All that's left is DR7 */
+	else
+		rc = ptrace_write_dr7(tsk, thbi, val);
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_handler(struct die_args *data)
+{
+	struct cpu_hw_breakpoint *chbi;
+	int i;
+	struct hw_breakpoint *bp;
+	struct thread_hw_breakpoint *thbi;
+
+	/* The value of DR6 is stored in data->err */
+#define DR6	(data->err)
+
+	if (!(DR6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+		return NOTIFY_DONE;
+
+	/* Assert that local interrupts are disabled */
+
+	/* Are we a victim of lazy debug-register switching? */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	if (!chbi->bp_task)
+		thbi = NULL;
+	else if (chbi->bp_task != current) {
+
+		/* No user breakpoints are valid.  Perform the belated
+		 * debug-register switch. */
+		switch_to_none_hw_breakpoint();
+		thbi = NULL;
+	} else
+		thbi = chbi->bp_task->thread.hw_breakpoint_info;
+
+	/* Disable all breakpoints so that the callbacks can run without
+	 * triggering recursive debug exceptions. */
+	set_debugreg(0, 7);
+
+	/* Handle all the breakpoints that were triggered */
+	for (i = 0; i < HB_NUM; ++i) {
+		if (!(DR6 & (DR_TRAP0 << i)))
+			continue;
+
+		/* Find the corresponding hw_breakpoint structure and
+		 * invoke its triggered callback. */
+		if (i < chbi->num_kbps)
+			bp = chbi->bps[i];
+		else if (thbi)
+			bp = thbi->bps[i];
+		else		/* False alarm due to lazy DR switching */
+			continue;
+		if (bp)			/* Should always be non-NULL */
+			(bp->triggered)(bp, data->regs);
+	}
+
+	/* Re-enable the breakpoints */
+	set_debugreg(chbi->dr7, 7);
+	put_cpu_no_resched();
+
+	/* Mask away the bits we have handled */
+	DR6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+	/* Early exit from the notifier chain if everything has been handled */
+	if (data->err == 0)
+		return NOTIFY_STOP;
+	return NOTIFY_DONE;
+#undef DR6
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_exceptions_notify(
+		struct notifier_block *unused, unsigned long val, void *data)
+{
+	if (val != DIE_DEBUG)
+		return NOTIFY_DONE;
+	return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+	.notifier_call = hw_breakpoint_exceptions_notify
+};
+
+static int __init init_hw_breakpoint(void)
+{
+	return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
Index: usb-2.6/arch/i386/kernel/ptrace.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/ptrace.c
+++ usb-2.6/arch/i386/kernel/ptrace.c
@@ -383,11 +383,11 @@ long arch_ptrace(struct task_struct *chi
 		tmp = 0;  /* Default return condition */
 		if(addr < FRAME_SIZE*sizeof(long))
 			tmp = getreg(child, addr);
-		if(addr >= (long) &dummy->u_debugreg[0] &&
-		   addr <= (long) &dummy->u_debugreg[7]){
+		else if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
 			addr -= (long) &dummy->u_debugreg[0];
 			addr = addr >> 2;
-			tmp = child->thread.debugreg[addr];
+			tmp = thread_get_debugreg(child, addr);
 		}
 		ret = put_user(tmp, datap);
 		break;
@@ -417,59 +417,11 @@ long arch_ptrace(struct task_struct *chi
 		   have to be selective about what portions we allow someone
 		   to modify. */
 
-		  ret = -EIO;
-		  if(addr >= (long) &dummy->u_debugreg[0] &&
-		     addr <= (long) &dummy->u_debugreg[7]){
-
-			  if(addr == (long) &dummy->u_debugreg[4]) break;
-			  if(addr == (long) &dummy->u_debugreg[5]) break;
-			  if(addr < (long) &dummy->u_debugreg[4] &&
-			     ((unsigned long) data) >= TASK_SIZE-3) break;
-			  
-			  /* Sanity-check data. Take one half-byte at once with
-			   * check = (val >> (16 + 4*i)) & 0xf. It contains the
-			   * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
-			   * 2 and 3 are LENi. Given a list of invalid values,
-			   * we do mask |= 1 << invalid_value, so that
-			   * (mask >> check) & 1 is a correct test for invalid
-			   * values.
-			   *
-			   * R/Wi contains the type of the breakpoint /
-			   * watchpoint, LENi contains the length of the watched
-			   * data in the watchpoint case.
-			   *
-			   * The invalid values are:
-			   * - LENi == 0x10 (undefined), so mask |= 0x0f00.
-			   * - R/Wi == 0x10 (break on I/O reads or writes), so
-			   *   mask |= 0x4444.
-			   * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
-			   *   0x1110.
-			   *
-			   * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
-			   *
-			   * See the Intel Manual "System Programming Guide",
-			   * 15.2.4
-			   *
-			   * Note that LENi == 0x10 is defined on x86_64 in long
-			   * mode (i.e. even for 32-bit userspace software, but
-			   * 64-bit kernel), so the x86_64 mask value is 0x5454.
-			   * See the AMD manual no. 24593 (AMD64 System
-			   * Programming)*/
-
-			  if(addr == (long) &dummy->u_debugreg[7]) {
-				  data &= ~DR_CONTROL_RESERVED;
-				  for(i=0; i<4; i++)
-					  if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
-						  goto out_tsk;
-				  if (data)
-					  set_tsk_thread_flag(child, TIF_DEBUG);
-				  else
-					  clear_tsk_thread_flag(child, TIF_DEBUG);
-			  }
-			  addr -= (long) &dummy->u_debugreg;
-			  addr = addr >> 2;
-			  child->thread.debugreg[addr] = data;
-			  ret = 0;
+		if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
+			addr -= (long) &dummy->u_debugreg;
+			addr = addr >> 2;
+			ret = thread_set_debugreg(child, addr, data);
 		  }
 		  break;
 
@@ -625,7 +577,6 @@ long arch_ptrace(struct task_struct *chi
 		ret = ptrace_request(child, request, addr, data);
 		break;
 	}
- out_tsk:
 	return ret;
 }
 
Index: usb-2.6/arch/i386/kernel/Makefile
===================================================================
--- usb-2.6.orig/arch/i386/kernel/Makefile
+++ usb-2.6/arch/i386/kernel/Makefile
@@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld
 obj-y	:= process.o signal.o entry.o traps.o irq.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
-		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \
+		hw_breakpoint.o
 
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
Index: usb-2.6/arch/i386/power/cpu.c
===================================================================
--- usb-2.6.orig/arch/i386/power/cpu.c
+++ usb-2.6/arch/i386/power/cpu.c
@@ -11,6 +11,7 @@
 #include <linux/suspend.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/debugreg.h>
 
 static struct saved_context saved_context;
 
@@ -45,6 +46,11 @@ void __save_processor_state(struct saved
 	ctxt->cr2 = read_cr2();
 	ctxt->cr3 = read_cr3();
 	ctxt->cr4 = read_cr4();
+
+	/*
+	 * disable the debug registers
+	 */
+	set_debugreg(0, 7);
 }
 
 void save_processor_state(void)
@@ -69,20 +75,7 @@ static void fix_processor_context(void)
 
 	load_TR_desc();				/* This does ltr */
 	load_LDT(&current->active_mm->context);	/* This does lldt */
-
-	/*
-	 * Now maybe reload the debug registers
-	 */
-	if (current->thread.debugreg[7]){
-		set_debugreg(current->thread.debugreg[0], 0);
-		set_debugreg(current->thread.debugreg[1], 1);
-		set_debugreg(current->thread.debugreg[2], 2);
-		set_debugreg(current->thread.debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(current->thread.debugreg[6], 6);
-		set_debugreg(current->thread.debugreg[7], 7);
-	}
-
+	load_debug_registers();
 }
 
 void __restore_processor_state(struct saved_context *ctxt)
Index: usb-2.6/arch/i386/kernel/kprobes.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/kprobes.c
+++ usb-2.6/arch/i386/kernel/kprobes.c
@@ -670,8 +670,11 @@ int __kprobes kprobe_exceptions_notify(s
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_DEBUG:
-		if (post_kprobe_handler(args->regs))
-			ret = NOTIFY_STOP;
+		if ((args->err & DR_STEP) && post_kprobe_handler(args->regs)) {
+			args->err &= ~DR_STEP;
+			if (args->err == 0)
+				ret = NOTIFY_STOP;
+		}
 		break;
 	case DIE_GPF:
 	case DIE_PAGE_FAULT:
Index: usb-2.6/include/asm-generic/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-generic/hw_breakpoint.h
@@ -0,0 +1,200 @@
+#ifndef	_ASM_GENERIC_HW_BREAKPOINT_H
+#define	_ASM_GENERIC_HW_BREAKPOINT_H
+
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ *
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @triggered: callback invoked when the breakpoint is hit
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @address: location (virtual address) of the breakpoint
+ * @len: extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read/write, or execute)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints.  These can be either execute breakpoints
+ * (triggered on instruction execution) or data breakpoints (also known
+ * as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The @address field contains the breakpoint's address, as either a
+ * regular kernel pointer or an %__user pointer.  @len is the
+ * breakpoint's extent in bytes, which is subject to certain limitations.
+ * include/asm/hw_breakpoint.h contains macros defining the available
+ * lengths for a specific architecture.  Note that @len must be a power
+ * of 2, and @address must have the alignment specified by @len.  The
+ * breakpoint will catch accesses to any byte in the range from @address
+ * to @address + (@len - 1).
+ *
+ * @type indicates the type of access that will trigger the breakpoint.
+ * Possible values may include:
+ *
+ * 	%HW_BREAKPOINT_EXECUTE (triggered on instruction execution),
+ * 	%HW_BREAKPOINT_IO (triggered on I/O space access),
+ * 	%HW_BREAKPOINT_RW (triggered on read or write access),
+ * 	%HW_BREAKPOINT_WRITE (triggered on write access), and
+ * 	%HW_BREAKPOINT_READ (triggered on read access).
+ *
+ * Appropriate macros are defined in include/asm/hw_breakpoint.h.
+ * Execute breakpoints must have @len equal to the special value
+ * %HW_BREAKPOINT_LEN_EXECUTE.
+ *
+ * In register_user_hw_breakpoint() and modify_user_hw_breakpoint(),
+ * @address must refer to a location in user space (use @address.user).
+ * The breakpoint will be active only while the requested task is running.
+ * Conversely, in register_kernel_hw_breakpoint() @address must refer to a
+ * location in kernel space (use @address.kernel), and the breakpoint will
+ * be active on all CPUs regardless of the current task.
+ *
+ * When a breakpoint gets hit, the @triggered callback is invoked
+ * in_interrupt with a pointer to the %hw_breakpoint structure and the
+ * processor registers.  Execute-breakpoint traps occur before the
+ * breakpointed instruction runs; all other types of trap occur after the
+ * memory access has taken place.  All breakpoints are disabled while
+ * @triggered runs, to avoid recursive traps and allow unhindered access
+ * to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource.  Requests to register a
+ * breakpoint will always succeed (provided the parameters are valid),
+ * but the breakpoint may not be installed in a debug register right
+ * away.  Physical debug registers are allocated based on the priority
+ * level stored in @priority (higher values indicate higher priority).
+ * User-space breakpoints within a single thread compete with one
+ * another, and all user-space breakpoints compete with all kernel-space
+ * breakpoints; however user-space breakpoints in different threads do
+ * not compete.  %HW_BREAKPOINT_PRIO_PTRACE is the level used for ptrace
+ * requests; an unobtrusive kernel-space breakpoint will use
+ * %HW_BREAKPOINT_PRIO_NORMAL to avoid disturbing user programs.  A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HW_BREAKPOINT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered.  The
+ * @installed and @uninstalled callbacks are invoked in_atomic when these
+ * events occur.  It is legal for @installed or @uninstalled to be %NULL,
+ * however @triggered must not be.  Note that it is not possible to
+ * register or unregister a breakpoint from within a callback routine,
+ * since doing so requires a process context.  Note also that for user
+ * breakpoints, @installed and @uninstalled may be called during the
+ * middle of a context switch, at a time when it is not safe to call
+ * printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled.  As a result @triggered can
+ * be called when you may not expect it, but this way you will know that
+ * during the time interval from @installed to @uninstalled, all events
+ * are faithfully reported.  (It is not possible to do any better than
+ * this in general, because on SMP systems there is no way to set a debug
+ * register simultaneously on all CPUs.)  The same isn't always true with
+ * user-space breakpoints, but the differences should not be visible to a
+ * user process.
+ *
+ * The @address, @len, and @type fields in a user-space breakpoint can be
+ * changed by calling modify_user_hw_breakpoint().  Kernel-space
+ * breakpoints cannot be modified, nor can the @priority value in
+ * user-space breakpoints, after the breakpoint has been registered.  And
+ * of course all the fields in a %hw_breakpoint structure should be
+ * treated as read-only while the breakpoint is registered.
+ *
+ * @node and @status are intended for internal use.  However @status
+ * may be read to determine whether or not the breakpoint is currently
+ * installed.
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * static void triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * 	printk(KERN_DEBUG "Breakpoint triggered\n");
+ * 	dump_stack();
+ *  	.......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	my_bp.address.kernel = &pid_max;
+ * 	my_bp.type = HW_BREAKPOINT_WRITE;
+ * 	my_bp.len = HW_BREAKPOINT_LEN_4;
+ * 	my_bp.triggered = triggered;
+ * 	my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ * 	rc = register_kernel_hw_breakpoint(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	unregister_kernel_hw_breakpoint(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hw_breakpoint {
+	struct list_head	node;
+	void		(*triggered)(struct hw_breakpoint *, struct pt_regs *);
+	void		(*installed)(struct hw_breakpoint *);
+	void		(*uninstalled)(struct hw_breakpoint *);
+	union {
+		const void		*kernel;
+		const void __user	*user;
+		unsigned long		va;
+	}		address;
+	u8		len;
+	u8		type;
+	u8		priority;
+	u8		status;
+};
+
+/* len and type values are defined in include/asm/hw_breakpoint.h */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HW_BREAKPOINT_PRIO_NORMAL	25
+#define HW_BREAKPOINT_PRIO_PTRACE	50
+#define HW_BREAKPOINT_PRIO_HIGH		75
+
+/* HW breakpoint status values */
+#define HW_BREAKPOINT_REGISTERED	1
+#define HW_BREAKPOINT_INSTALLED		2
+
+/*
+ * The following three routines are meant to be called only from within
+ * the ptrace or utrace subsystems.  The tsk argument will usually be a
+ * process being debugged by the current task, although it is also legal
+ * for tsk to be the current task.  In any case it must be guaranteed
+ * that tsk will not start running in user mode while its breakpoints are
+ * being modified.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+int modify_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp, const void __user *address,
+		u8 len, u8 type);
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+#endif	/* _ASM_GENERIC_HW_BREAKPOINT_H */


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-14 19:11                                         ` Alan Stern
@ 2007-03-28 21:39                                           ` Roland McGrath
  2007-03-29 21:35                                             ` Alan Stern
                                                               ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Roland McGrath @ 2007-03-28 21:39 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

Sorry I've been slow in responding to your most recent version.
I fell into a large hole and couldn't get out until I fixed some bugs.

> No, I'm not confused and neither are you.  I realize there's no functional 
> difference between the two sets of enable bits, since Linux doesn't use 
> hardware task-switching.  I just like to keep things neatly separated, 
> that's all.

I don't really know what more to say.  I like things neatly separated too,
most especially between clearly meaningful and misleading yet apparently
meaningful.  Overloading unrelated hardware bits for no material purpose
will never make sense to me.

> > How would that happen?  This would mean that some user process has been
> > allowed to enable ioperm for some io port that kernel drivers also send to
> > from interrupt handlers.  Can that ever happen?
> 
> I haven't checked the ioperm code to be certain, but it seems like the 
> sort of thing somebody might want to do on occasion.

I checked the ioperm code.  As far as I can tell, if you have CAP_SYS_RAWIO
then you can use ioperm to enable any port you want, so this will always be
possible.  (That seems a bit nutty to me, hence my earlier reaction.)

> > > It gives drivers a way to tell whether or not the breakpoint is currently
> > > installed without having to do explicit tracking of installed() and 
> > > uninstalled() callbacks.
> > 
> > How could that ever be used that would not be racy and thus buggy?  A
> > registration call on another CPU could cause a change and callback just
> > after you fetched the value.
> 
> Not if you have interrupts disabled.  Debug register settings are 
> disseminated from one CPU to the others by means of an IPI.

But the callback is not per-CPU, it is per-registration.  When a new
registration displaces an old one, or a deregistration stops displacing
one, isn't the status change in the data structure and the callback made
right away, by the thread doing the registration/deregistration call?
Another CPU with interrupts disabled won't have its breakpoints actually
change, but the data structure will have changed.  Isn't that right?

> I implemented most of the changes we discussed.  Ignoring the length for 
> execute breakpoints turned out not to be a good idea because it would 
> affect the way ptrace works, so the code verifies it just like any other 
> kind of breakpoint.

I think that's perfectly fine.

> I also decided against adding a .bits member.  It doesn't really gain very 
> much; the savings in encoding the breakpoint values is trivial -- one line 
> of code on i386.  

I think this is a mistake.  On other machines, there is no need for more
than one field and .len will go unused.  There is no reason not to encode
ahead of time.  If it's useful for callers to be able to extract the type
and length from a struct hw_breakpoint, it's easy to define macros or
inlines in asm/hw_breakpoint.h that go the other way.
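
Something along these lines, say (just a sketch -- the .bits field and the
helper names are hypothetical, and the layout shown is the i386 DR7
nibble, LENi in bits 3:2 and R/Wi in bits 1:0):

	static inline u8 hw_breakpoint_type(const struct hw_breakpoint *bp)
	{
		return bp->bits & 0x3;			/* R/Wi */
	}

	static inline u8 hw_breakpoint_len(const struct hw_breakpoint *bp)
	{
		/* 1, 2, or 4 bytes (a result of 3 would be the
		 * x86_64 8-byte case) */
		return 1 + ((bp->bits >> 2) & 0x3);
	}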

> And it helps to have the original length and type values available for
> use by the ptrace routines.  

There's no reason __modify_user_hw_breakpoint can't use an encoded value.
modify_user_hw_breakpoint can do checking and encoding before taking the lock.

> In fact, I decided to add a superfluous bit to the type code.  That's to
> help disambiguate between length and type values; it's easy to mix the
> two of them up.

It doesn't hurt to add some constant bits to the API values that you just
mask out (after checking) as part of the encoding conversion.  In fact,
it's a good way to make sure people are using the right macros.

> Likewise, the length macros don't give the encoded values; that's so
> people can just specify the length directly instead of using the macro.

I object to this.  The purpose of having macros is to give a constrained
set of values that can be used at compile time.  Letting people use literal
numbers is just asking for people to write their calls unportably.  For
machines that support only LEN8, I'd planned to define it to some magic
bit pattern just so it could be checked and rule out cavalier users who
don't pay attention to the macros.

> There's a small question about the value of the error_code argument for 
> send_sigtrap().  The value passed into do_debug() isn't available in
> ptrace_triggered() -- but since it is always 0, that's what I'm using.  
> I'm not sure what it's supposed to mean anyway.

task->thread.trap_no and task->thread.error_code usually store the values
from the hardware trap frame when a trap is turned into a signal (e.g. see
do_trap).  This is the hardware trap number, which is 1 for do_debug.  The
error code is a word of the hardware trap frame whose meaning is specified
differently for each trap number.  For debug traps, it's always zero.
For other kinds of traps, it is sometimes useful to have the value.


My review comments below are about your patch of 3/22 (described as
"actually works with 2.6.21-rc4").

> -	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
> -					SIGTRAP) == NOTIFY_STOP)
> +	args.regs = regs;
> +	args.str = "debug";
> +	get_debugreg(args.err, 6);
> +	set_debugreg(0, 6);	/* DR6 is never cleared by the CPU */
> +	args.trapnr = error_code;
> +	args.signr = SIGTRAP;
> +	if (atomic_notifier_call_chain(&i386die_chain, DIE_DEBUG, &args) ==
> +			NOTIFY_STOP)

Put it back using notify_die, just pass dr6 instead of error_code.

> -#define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
> -#define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
> +#define DR_LOCAL_EXACT (0x100)       /* Local slow the pipeline */
> +#define DR_GLOBAL_EXACT (0x200)      /* Global slow the pipeline */

Don't mess with these names if there isn't a good reason.
It's unrelated to the work you're doing.

> +
> +/*
> + * HW breakpoint additions
> + */
> +
> +#include <asm/hw_breakpoint.h>
> +#include <linux/spinlock.h>

Put all this inside #ifdef __KERNEL__.

> + * Appropriate macros are defined in include/asm/hw_breakpoint.h.
> + * Execute breakpoints must have @len equal to the special value
> + * %HW_BREAKPOINT_LEN_EXECUTE.

This should be more explicit about it being machine-dependent what subset
of the macros will be defined.

> Execute-breakpoint traps occur before the
> + * breakpointed instruction runs; all other types of trap occur after the
> + * memory access has taken place.  

We need to say something about the restart behavior.  i.e., figure out the
situation with the x86 RF flag and what the story is on other machines that
have instruction breakpoint registers.

> + * Hardware breakpoints are implemented using the CPU's debug registers,
> + * which are a limited hardware resource.  Requests to register a
> + * breakpoint will always succeed (provided the parameters are valid),

I've said before and still maintain that there should be the option to have
a NULL installed callback and fail immediately if it can't be installed
right now.  The wording below about installed being NULL is not clear on
what this means.

> +/* QUESTIONS
> +
> +	Error code in ptrace_triggered?

Zero is right.

> +	Set RF flag bit for execution faults?

Yes, I don't think you'll ever progress if you haven't set it.  However,
setting it opens a can of worms.  Once you set it, the bit could leak into
a signal context, or be seen via ptrace, or stay set when ptrace changes
the pc, etc.  It requires some investigation.

> +	TF flag bit for single-step exceptions in kernel space?

What's the question?  I don't think hw_breakpoint has anything to do with
TF or single-step at all.

> +	CPU hotplug, kexec, etc?

Using register_cpu_notifier in an __init function seems to be what
everything else does, or you can hack __cpu_up directly.  For kexec,
clearing dr7 in machine_kexec seems like a good idea.
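
I.e., roughly this shape (the callback name is made up, and the actual
DR7 reload would still have to be arranged to run on the newly onlined
CPU, e.g. via an IPI):

	static int hwbp_cpu_callback(struct notifier_block *nb,
			unsigned long action, void *hcpu)
	{
		if (action == CPU_ONLINE) {
			/* arrange for load_debug_registers() to run
			 * on the CPU identified by (long) hcpu */
		}
		return NOTIFY_OK;
	}

	static struct notifier_block hwbp_cpu_notifier = {
		.notifier_call = hwbp_cpu_callback,
	};

	/* ... registered from init_hw_breakpoint():
	 *	register_cpu_notifier(&hwbp_cpu_notifier);
	 */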

> Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c

Though you've split the header file into generic and i386, I see this is
all still in the i386 file.  I'd like to see the shareable code (list
handling, priority stuff, multi-CPU coordination) in a common file.  For
x86_64, all the machine-specific code is all but identical (literally just
one bit different), so it might as well actually share the i386 source
file.  But I'm keen to do the powerpc64 machine-dependent bits as soon as
your generic code is making it easy enough for me. :-)

I think the ia64 version will be straightforward too, though I won't do
that one myself.  A notable difference on these processors (and maybe all
other ones that have breakpoint registers) is that execute breakpoints and
data breakpoints are separate disjoint resources.  There may be zero
execute breakpoint registers or a few, and may be one data breakpoint
register or a few, but there is no single HB_NUM of how many breakpoint
slots for any kind whatever.

> +#include <asm-generic/percpu.h>

Surely this should be <asm/percpu.h>.

> +	/* Block kernel breakpoint updates from other CPUs */
> +	local_irq_save(flags);

I have a feeling this is more costly than we want, though I don't really
know.  It seems to me that things in struct cpu_hw_breakpoint are not
really per-CPU, except for bp_task.  They are "current global state",
right?  So I think it fits well to have the per_cpu struct contain just
bp_task and a pointer to the current state struct, and use RCU to replace
that struct when registrations change.  Perhaps rather than actually
updating other CPU's pointers in the RCU fashion, that would just be done
by each CPU for itself from the IPI handler that changes the hardware.
Still, it just requires rcu_read_lock+rcu_dereference at context switch.
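
Concretely, something of this shape (a rough sketch only; the struct and
field names are invented):

	/* Shared "current kernel breakpoint state", replaced as a unit
	 * under RCU whenever kernel registrations change: */
	struct kernel_bp_state {
		struct hw_breakpoint	*bps[HB_NUM];
		int			num_kbps;
		unsigned long		kdr7;
	};

	struct cpu_hw_breakpoint {
		struct task_struct	*bp_task;	/* truly per-CPU */
		struct kernel_bp_state	*kstate;	/* RCU-protected */
	};

	/* and the context-switch fast path becomes roughly: */
	struct kernel_bp_state *ks;

	rcu_read_lock();
	ks = rcu_dereference(__get_cpu_var(cpu_info).kstate);
	set_debugreg(ks->kdr7 | thbi->tdr7, 7);
	rcu_read_unlock();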

> +	switch (chbi->num_kbps) {

Doesn't this unnecessarily do all four settings in a thread that only uses
one?  The old code did that too, but now it seems easy enough to cache the
number of active thread breakpoints and switch on that. 

> +	/* Mask in the parts of DR7 that refer to the new thread */
> +	chbi->dr7 = chbi->kdr7 | (~chbi->kdr7_mask & thbi->tdr7);
> +	set_debugreg(chbi->dr7, 7);

I'd say it's worth saving the loads here to recompute tdr7 with the
kernel-used bits masked out and cache it that way, so the fast path 
looks at only thbi->tdr7|chbi->kdr7.  

It does not seem necessary to cache the dr7 value.  Save the store.

	set_debugreg(chbi->kdr7 | thbi->tdr7, 7);

The only use of the saved value is in hw_breakpoint_handler,
which can just use chbi->kdr7 | thbi->tdr7.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-28 21:39                                           ` Roland McGrath
@ 2007-03-29 21:35                                             ` Alan Stern
  2007-04-13 21:09                                             ` Alan Stern
  2007-05-11 15:25                                             ` Alan Stern
  2 siblings, 0 replies; 70+ messages in thread
From: Alan Stern @ 2007-03-29 21:35 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Wed, 28 Mar 2007, Roland McGrath wrote:

> Sorry I've been slow in responding to your most recent version.
> I fell into a large hole and couldn't get out until I fixed some bugs.

That's okay; the same thing happens to everyone from time to time.


> > No, I'm not confused and neither are you.  I realize there's no functional 
> > difference between the two sets of enable bits, since Linux doesn't use 
> > hardware task-switching.  I just like to keep things neatly separated, 
> > that's all.
> 
> I don't really know what more to say.  I like things neatly separated too,
> most especially between clearly meaningful and misleading yet apparently
> meaningful.  Overloading unrelated hardware bits for no material purpose
> will never make sense to me.

Well, I can change it easily enough.


> > > How would that happen?  This would mean that some user process has been
> > > allowed to enable ioperm for some io port that kernel drivers also send to
> > > from interrupt handlers.  Can that ever happen?
> > 
> > I haven't checked the ioperm code to be certain, but it seems like the 
> > sort of thing somebody might want to do on occasion.
> 
> I checked the ioperm code.  As far as I can tell, if you have CAP_SYS_RAWIO
> then you can use ioperm to enable any port you want, so this will always be
> possible.  (That seems a bit nutty to me, hence my earlier reaction.)

At this stage it's a moot point anyway.


> > > > It gives drivers a way to tell whether or not the breakpoint is currently
> > > > installed without having to do explicit tracking of installed() and 
> > > > uninstalled() callbacks.
> > > 
> > > How could that ever be used that would not be racy and thus buggy?  A
> > > registration call on another CPU could cause a change and callback just
> > > after you fetched the value.
> > 
> > Not if you have interrupts disabled.  Debug register settings are 
> > disseminated from one CPU to the others by means of an IPI.
> 
> But the callback is not per-CPU, it is per-registration.  When a new
> registration displaces an old one, or a deregistration stops displacing
> one, isn't the status change in the data structure and the callback made
> right away, by the thread doing the registration/deregistration call?
> Another CPU with interrupts disabled won't have its breakpoints actually
> change, but the data structure will have changed.  Isn't that right?

Not quite right.  The status change in the data structure isn't made until
after all the IPIs have completed, which won't happen until all the CPUs
have responded.  A CPU with interrupts disabled will therefore delay the
status change as well as the debug-register update, and so the value of
bp->status would remain valid until interrupts were enabled.


> > I also decided against adding a .bits member.  It doesn't really gain very 
> > much; the savings in encoding the breakpoint values is trivial -- one line 
> > of code on i386.  
> 
> I think this is a mistake.  On other machines, there is no need for more
> than one field and .len will go unused.  There is no reason not to encode
> ahead of time.  If it's useful for callers to be able to extract the type
> and length from a struct hw_breakpoint, it's easy to define macros or
> inlines in asm/hw_breakpoint.h that go the other way.

I'm not entirely convinced.

Consider first that the computation involved in encoding .bits is just a
little more complicated than one would like to do at compile time.  (At
least, it is on x86_64.)  Callers would have to do it themselves
dynamically, or else pass len and type as arguments to the
register_hw_breakpoint routines (which would mean an unused argument on
those other machines).  In general it just makes things harder on callers.

The fact that .len would go unused on some architectures shouldn't matter.  
It's just a u8; .len and .type together take up no more space than .bits 
would.

However, if you insist I can still change things over.

> > And it helps to have the original length and type values available for
> > use by the ptrace routines.  
> 
> There's no reason __modify_user_hw_breakpoint can't use an encoded value.
> modify_user_hw_breakpoint can do checking and encoding before taking the lock.

Come to think of it, we don't really need modify_user_hw_breakpoint at
all.  It could be replaced by an {unregister(old); register(new);}
sequence.  Unless you think there's some pressing reason to keep it, my
inclination is to do away with it.


> My review comments below are about your patch of 3/22 (described as
> "actually works with 2.6.21-rc4").
> 
> > -	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
> > -					SIGTRAP) == NOTIFY_STOP)
> > +	args.regs = regs;
> > +	args.str = "debug";
> > +	get_debugreg(args.err, 6);
> > +	set_debugreg(0, 6);	/* DR6 is never cleared by the CPU */
> > +	args.trapnr = error_code;
> > +	args.signr = SIGTRAP;
> > +	if (atomic_notifier_call_chain(&i386die_chain, DIE_DEBUG, &args) ==
> > +			NOTIFY_STOP)
> 
> Put it back using notify_die, just pass dr6 instead of error_code.

I'd like to.  But with notify_die, the DR6 value handed to the notifier
chain (in args.err) is a copy local to the inlined notify_die() function.
Modifications to that copy made by the callout routines on the chain
would be lost when notify_die returns.

Hmm...  Maybe I could store a pointer to the DR6 value in args.err instead
of the value itself...
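
Something like this at both ends, say (the handler-side pointer variable
is just for illustration):

	/* In do_debug(): pass the address of the on-stack copy */
	notify_die(DIE_DEBUG, "debug", regs, (long) &dr6, error_code, SIGTRAP);

	/* In the DIE_DEBUG notifier (data is the struct die_args *):
	 * modify the caller's copy in place */
	unsigned long *dr6_p = (unsigned long *) data->err;

	if (!(*dr6_p & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
		return NOTIFY_DONE;
	/* ... handle the triggered breakpoints ... */
	*dr6_p &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);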


> > -#define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
> > -#define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
> > +#define DR_LOCAL_EXACT (0x100)       /* Local slow the pipeline */
> > +#define DR_GLOBAL_EXACT (0x200)      /* Global slow the pipeline */
> 
> Don't mess with these names if there isn't a good reason.
> It's unrelated to the work you're doing.

I was about to disagree, but a second look at the Intel documentation
shows that those bits don't have any real name (other than LE and GE,
which is why I changed the macros).

As I understand it, setting one of those bits is necessary on the 386 but
not necessary for later processors.  Should this be controlled by a
runtime (or compile time) check?  For that matter, do those bits have any
effect at all on a Pentium?


> > Execute-breakpoint traps occur before the
> > + * breakpointed instruction runs; all other types of trap occur after the
> > + * memory access has taken place.  
> 
> We need to say something about the restart behavior.  i.e., figure out the
> situation with the x86 RF flag and what the story is on other machines that
> have instruction breakpoint registers.

My Intel manual says that the CPU automatically sets the RF bit in the
EFLAGS image stored on the stack by the debug exception.  Hence the
handler doesn't have to worry about it.  That's why I removed it from the 
existing code.


> > + * Hardware breakpoints are implemented using the CPU's debug registers,
> > + * which are a limited hardware resource.  Requests to register a
> > + * breakpoint will always succeed (provided the parameters are valid),
> 
> I've said before and still maintain that there should be the option to have
> a NULL installed callback and fail immediately if it can't be installed
> right now.  The wording below about installed being NULL is not clear on
> what this means.

Setting a callback pointer to NULL generally means that you don't want or
care about callbacks.  Trying to make it mean something else will only
confuse people.

If callers want to give up when a kernel breakpoint isn't installed 
immediately, all they have to do is check the return value from 
register_kernel_hw_breakpoint and call unregister_kernel_hw_breakpoint.  
If you really want it, I could add an extra "fail if not installed" 
argument flag.

For user breakpoints, the whole notion is almost meaningless.  Even if the
breakpoint was allocated a debug register initially, it could get
displaced by the time the debuggee task next runs.


> > +	Set RF flag bit for execution faults?
> 
> Yes, I don't think you'll ever progress if you haven't set it.  However,
> setting it opens a can of worms.  Once you set it, the bit could leak into
> a signal context, or be seen via ptrace, or stay set when ptrace changes
> the pc, etc.  It requires some investigation.

I mentioned that question because I had removed the existing code, which
didn't seem to be necessary.  I wanted to make sure this was the correct
thing to do.


> > +	TF flag bit for single-step exceptions in kernel space?
> 
> What's the question?  I don't think hw_breakpoint has anything to do with
> TF or single-step at all.

Again, this was referring to existing code which I basically copied 
without fully understanding.  Does the new code in do_debug do the right 
thing with regard to TF?


> > Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c
> 
> Though you've split the header file into generic and i386, I see this is
> all still in the i386 file.  I'd like to see the shareable code (list
> handling, priority stuff, multi-CPU coordination) in a common file.  For
> x86_64, all the machine-specific code is all but identical (literally just
> one bit different), so it might as well actually share the i386 source
> file.  But I'm keen to do the powerpc64 machine-dependent bits as soon as
> your generic code is making it easy enough for me. :-)
> 
> I think the ia64 version will be straightforward too, though I won't do
> that one myself.  A notable difference on these processors (and maybe all
> other ones that have breakpoint registers) is that execute breakpoints and
> data breakpoints are separate disjoint resources.  There may be zero
> execute breakpoint registers or a few, and may be one data breakpoint
> register or a few, but there is no single HB_NUM of how many breakpoint
> slots for any kind whatever.

I'll go through the file and see which parts really can be shared.  It
might end up being less than you think.

Note that doing this would necessarily create a bunch of new public 
symbols.  Routines that I now have declared static wouldn't be able to 
remain that way.


> > +	/* Block kernel breakpoint updates from other CPUs */
> > +	local_irq_save(flags);
> 
> I have a feeling this is more costly than we want, though I don't really
> know.  It seems to me that things in struct cpu_hw_breakpoint are not
> really per-CPU, except for bp_task.  They are "current global state",
> right?

Not really, since changes to the debug registers on multiple CPUs cannot
be made simultaneously.  There will be short periods when different CPUs
have different debug register values.  What if a debug exception occurs
during one of those periods?  Or what if a task switch occurs?

>  So I think it fits well to have the per_cpu struct contain just
> bp_task and a pointer to the current state struct, and use RCU to replace
> that struct when registrations change.  Perhaps rather than actually
> updating other CPU's pointers in the RCU fashion, that would just be done
> by each CPU for itself from the IPI handler that changes the hardware.
> Still, it just requires rcu_read_lock+rcu_dereference at context switch.

See how you like the new implementation in the next version (when it's
ready).  The local_irq_save can be avoided by extending the size of the
RCU critical section.


> > +	/* Mask in the parts of DR7 that refer to the new thread */
> > +	chbi->dr7 = chbi->kdr7 | (~chbi->kdr7_mask & thbi->tdr7);
> > +	set_debugreg(chbi->dr7, 7);
> 
> I'd say it's worth saving the loads here to recompute tdr7 with the
> kernel-used bits masked out and cache it that way, so the fast path 
> looks at only thbi->tdr7|chbi->kdr7.  

Doing it that way would require all the thbi->tdr7 values in all tasks
being debugged to be updated whenever the kdr7_mask value changes.  It's
not impossible, but it could have a high cost in terms of cacheline
contention on SMP systems.  Maybe you don't think that's a big issue.

For that matter, as long as you're going to update thbi->tdr7 with the
kernel bits masked out, why not go ahead and mask in the kernel's kdr7
bits as well?  Then no computation would be needed during task switches 
at all.
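
That is, with a precomputed thread-plus-kernel value (call it tkdr7 for
the sake of argument -- the name is just illustrative), kept up to date
whenever either side's breakpoints change, the switch path would collapse
to:

	set_debugreg(thbi->tkdr7, 7);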

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-28 21:39                                           ` Roland McGrath
  2007-03-29 21:35                                             ` Alan Stern
@ 2007-04-13 21:09                                             ` Alan Stern
  2007-05-11 15:25                                             ` Alan Stern
  2 siblings, 0 replies; 70+ messages in thread
From: Alan Stern @ 2007-04-13 21:09 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

Roland:

Here's the latest take on the hw_breakpoint patch.  I adopted most of your
suggestions.  There still isn't a .bits member, but or'ing the .len and
.type members together will give you essentially the same thing; both of
those values are now completely encoded.

The hot path in switch_to_thread_hw_breakpoint() should now be very fast.  
There's a minimal amount of additional activity needed to deal with kernel
breakpoint updates that might arrive in the middle of a context switch.

I didn't try to split hw_breakpoint.c apart into sharable and non-sharable
pieces.  At this stage it's not entirely clear which routines would have
to go on each side.  For example, processors with separate sets of debug
registers for execute and data breakpoints would require a substantial
change to the existing code.  Probably all the lists and arrays would have
to be duplicated, with one copy for execute breakpoints and one for data
breakpoints.

If you eliminate all routines that refer to HB_NUM or dr7, that really 
doesn't leave much sharable code.  The routines which qualify tend to be 
relatively short; I think the largest one is flush_thread_hw_breakpoint().

It turns out that on some processors the CPU does reset DR6 sometimes.  
Intel's documentation is wonderfully vague: "Certain debug exceptions may
clear bits 0-3."  And it appears that gdb relies on this behavior; it
distinguishes correctly among multiple breakpoints on a vanilla kernel but
not under the previous version of hw_breakpoint.  I decided the safest
course was to have do_debug() clear tsk->thread.vdr6 whenever any of the
four breakpoint bits is set in the real DR6.  More sophisticated behavior 
would be possible at the cost of adding an extra flag to tsk->thread.

It also turns out that some CPUs don't automatically set the RF bit in 
the EFLAGS image on the stack.  Intel recommends that the OS always set 
that bit whenever a debug exception occurs, so that's what I did.

Finally, I put in a couple of #ifdef's to make the same source work under 
both i386 and x86_64, although I haven't tried building it.  You might 
want to check and make sure that part of validate_settings() is correct.

I trust we are moving closer to a final, usable form.

Alan Stern



Index: usb-2.6/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,19 @@
+#ifndef	_I386_HW_BREAKPOINT_H
+#define	_I386_HW_BREAKPOINT_H
+
+#ifdef	__KERNEL__
+#include <asm-generic/hw_breakpoint.h>
+
+/* Available HW breakpoint length encodings */
+#define HW_BREAKPOINT_LEN_1		0x40
+#define HW_BREAKPOINT_LEN_2		0x44
+#define HW_BREAKPOINT_LEN_4		0x4c
+#define HW_BREAKPOINT_LEN_EXECUTE	0x40
+
+/* Available HW breakpoint type encodings */
+#define HW_BREAKPOINT_EXECUTE	0x80	/* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE	0x81	/* trigger on memory write */
+#define HW_BREAKPOINT_RW	0x83	/* trigger on memory read or write */
+
+#endif	/* __KERNEL__ */
+#endif	/* _I386_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -58,6 +58,7 @@
 #include <asm/tlbflush.h>
 #include <asm/cpu.h>
 #include <asm/pda.h>
+#include <asm/debugreg.h>
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
@@ -359,9 +360,10 @@ EXPORT_SYMBOL(kernel_thread);
  */
 void exit_thread(void)
 {
+	struct task_struct *tsk = current;
+
 	/* The process may have allocated an io port bitmap... nuke it. */
 	if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
-		struct task_struct *tsk = current;
 		struct thread_struct *t = &tsk->thread;
 		int cpu = get_cpu();
 		struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -379,15 +381,17 @@ void exit_thread(void)
 		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
 		put_cpu();
 	}
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 }
 
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
 
-	memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
-	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));	
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 	/*
 	 * Forget coprocessor state..
 	 */
@@ -430,14 +434,21 @@ int copy_thread(int nr, unsigned long cl
 
 	savesegment(gs,p->thread.gs);
 
+	p->thread.hw_breakpoint_info = NULL;
+	p->thread.io_bitmap_ptr = NULL;
+
 	tsk = current;
+	err = -ENOMEM;
+	if (unlikely(tsk->thread.hw_breakpoint_info)) {
+		if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+			goto out;
+	}
+
 	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
 						IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
+		if (!p->thread.io_bitmap_ptr)
+			goto out;
 		set_tsk_thread_flag(p, TIF_IO_BITMAP);
 	}
 
@@ -467,7 +478,8 @@ int copy_thread(int nr, unsigned long cl
 
 	err = 0;
  out:
-	if (err && p->thread.io_bitmap_ptr) {
+	if (err) {
+		flush_thread_hw_breakpoint(p);
 		kfree(p->thread.io_bitmap_ptr);
 		p->thread.io_bitmap_max = 0;
 	}
@@ -479,18 +491,18 @@ int copy_thread(int nr, unsigned long cl
  */
 void dump_thread(struct pt_regs * regs, struct user * dump)
 {
-	int i;
+	struct task_struct *tsk = current;
 
 /* changed the size calculations - should hopefully work better. lbt */
 	dump->magic = CMAGIC;
 	dump->start_code = 0;
 	dump->start_stack = regs->esp & ~(PAGE_SIZE - 1);
-	dump->u_tsize = ((unsigned long) current->mm->end_code) >> PAGE_SHIFT;
-	dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
+	dump->u_tsize = ((unsigned long) tsk->mm->end_code) >> PAGE_SHIFT;
+	dump->u_dsize = ((unsigned long) (tsk->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
 	dump->u_dsize -= dump->u_tsize;
 	dump->u_ssize = 0;
-	for (i = 0; i < 8; i++)
-		dump->u_debugreg[i] = current->thread.debugreg[i];  
+
+	dump_thread_hw_breakpoint(tsk, dump->u_debugreg);
 
 	if (dump->start_stack < TASK_SIZE)
 		dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -540,16 +552,6 @@ static noinline void __switch_to_xtra(st
 
 	next = &next_p->thread;
 
-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		set_debugreg(next->debugreg[0], 0);
-		set_debugreg(next->debugreg[1], 1);
-		set_debugreg(next->debugreg[2], 2);
-		set_debugreg(next->debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(next->debugreg[6], 6);
-		set_debugreg(next->debugreg[7], 7);
-	}
-
 	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
 		/*
 		 * Disable the bitmap via an invalid offset. We still cache
@@ -682,7 +684,7 @@ struct task_struct fastcall * __switch_t
 		set_iopl_mask(next->iopl);
 
 	/*
-	 * Now maybe handle debug registers and/or IO bitmaps
+	 * Now maybe handle IO bitmaps
 	 */
 	if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
 	    || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
@@ -714,6 +716,13 @@ struct task_struct fastcall * __switch_t
 
 	write_pda(pcurrent, next_p);
 
+	/*
+	 * Handle debug registers.  This must be done _after_ current
+	 * is updated.
+	 */
+	if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+		switch_to_thread_hw_breakpoint(next_p);
+
 	return prev_p;
 }
 
Index: usb-2.6/arch/i386/kernel/signal.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/signal.c
+++ usb-2.6/arch/i386/kernel/signal.c
@@ -592,13 +592,6 @@ static void fastcall do_signal(struct pt
 
 	signr = get_signal_to_deliver(&info, &ka, regs, NULL);
 	if (signr > 0) {
-		/* Reenable any watchpoints before delivering the
-		 * signal to user space. The processor register will
-		 * have been cleared if the watchpoint triggered
-		 * inside the kernel.
-		 */
-		if (unlikely(current->thread.debugreg[7]))
-			set_debugreg(current->thread.debugreg[7], 7);
 
 		/* Whee!  Actually deliver the signal.  */
 		if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -807,62 +807,49 @@ fastcall void __kprobes do_int3(struct p
  */
 fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
 {
-	unsigned int condition;
 	struct task_struct *tsk = current;
+	unsigned long dr6;
 
-	get_debugreg(condition, 6);
+	get_debugreg(dr6, 6);
+	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
+	if (dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+		tsk->thread.vdr6 = 0;
 
-	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
-					SIGTRAP) == NOTIFY_STOP)
+	if (notify_die(DIE_DEBUG, "debug", regs, (long) &dr6, error_code,
+			SIGTRAP) == NOTIFY_STOP)
 		return;
+
 	/* It's safe to allow irq's after DR6 has been saved */
 	if (regs->eflags & X86_EFLAGS_IF)
 		local_irq_enable();
 
-	/* Mask out spurious debug traps due to lazy DR7 setting */
-	if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
-		if (!tsk->thread.debugreg[7])
-			goto clear_dr7;
+	if (regs->eflags & VM_MASK) {
+		handle_vm86_trap((struct kernel_vm86_regs *) regs,
+				error_code, 1);
+		return;
 	}
 
-	if (regs->eflags & VM_MASK)
-		goto debug_vm86;
-
-	/* Save debug status register where ptrace can see it */
-	tsk->thread.debugreg[6] = condition;
-
 	/*
-	 * Single-stepping through TF: make sure we ignore any events in
-	 * kernel space (but re-enable TF when returning to user mode).
+	 * Single-stepping through system calls: ignore any exceptions in
+	 * kernel space, but re-enable TF when returning to user mode.
+	 *
+	 * We already checked v86 mode above, so we can check for kernel mode
+	 * by just checking the CPL of CS.
 	 */
-	if (condition & DR_STEP) {
-		/*
-		 * We already checked v86 mode above, so we can
-		 * check for kernel mode by just checking the CPL
-		 * of CS.
-		 */
-		if (!user_mode(regs))
-			goto clear_TF_reenable;
+	if ((dr6 & DR_STEP) && !user_mode(regs)) {
+		dr6 &= ~DR_STEP;
+		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+		regs->eflags &= ~X86_EFLAGS_TF;
 	}
 
-	/* Ok, finally something we can handle */
-	send_sigtrap(tsk, regs, error_code);
+	/* Store the virtualized DR6 value */
+	tsk->thread.vdr6 |= dr6;
 
-	/* Disable additional traps. They'll be re-enabled when
-	 * the signal is delivered.
-	 */
-clear_dr7:
-	set_debugreg(0, 7);
-	return;
-
-debug_vm86:
-	handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
-	return;
-
-clear_TF_reenable:
-	set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
-	regs->eflags &= ~TF_MASK;
-	return;
+	if (dr6 & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+		send_sigtrap(tsk, regs, error_code);
+
+	/* Intel recommends always setting RF in the EFLAGS image */
+	regs->eflags |= X86_EFLAGS_RF;
 }
 
 /*
Index: usb-2.6/include/asm-i386/debugreg.h
===================================================================
--- usb-2.6.orig/include/asm-i386/debugreg.h
+++ usb-2.6/include/asm-i386/debugreg.h
@@ -48,6 +48,8 @@
 
 #define DR_LOCAL_ENABLE_SHIFT 0    /* Extra shift to the local enable bit */
 #define DR_GLOBAL_ENABLE_SHIFT 1   /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1)      /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2)     /* Global enable for reg 0 */
 #define DR_ENABLE_SIZE 2           /* 2 enable bits per register */
 
 #define DR_LOCAL_ENABLE_MASK (0x55)  /* Set  local bits for all 4 regs */
@@ -61,4 +63,29 @@
 #define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
 #define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
 
+
+/*
+ * HW breakpoint additions
+ */
+
+#define HB_NUM		4	/* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+static inline void disable_debug_registers(void)
+{
+	set_debugreg(0, 7);
+}
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
+
 #endif
Index: usb-2.6/include/asm-i386/processor.h
===================================================================
--- usb-2.6.orig/include/asm-i386/processor.h
+++ usb-2.6/include/asm-i386/processor.h
@@ -402,8 +402,9 @@ struct thread_struct {
 	unsigned long	esp;
 	unsigned long	fs;
 	unsigned long	gs;
-/* Hardware debugging registers */
-	unsigned long	debugreg[8];  /* %%db0-7 debug registers */
+/* Hardware breakpoint info */
+	unsigned long	vdr6;
+	struct thread_hw_breakpoint	*hw_breakpoint_info;
 /* fault info */
 	unsigned long	cr2, trap_no, error_code;
 /* floating point info */
Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/arch/i386/kernel/hw_breakpoint.c
@@ -0,0 +1,1201 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+/* QUESTIONS
+
+	TF flag bit for single-step exceptions in kernel space?
+
+	Checks against TASK_SIZE for both i386 & x86_64?
+*/
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>
+#include <asm/kdebug.h>
+#include <asm/percpu.h>
+#include <asm/processor.h>
+
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+	/* utrace support */
+	struct list_head	node;		/* Entry in thread list */
+	struct list_head	thread_bps;	/* Thread's breakpoints */
+	struct hw_breakpoint	*bps[HB_NUM];	/* Highest-priority bps */
+	unsigned long		tdr[HB_NUM];	/*  and their addresses */
+	unsigned long		tdr7;		/* Thread's DR7 value */
+	unsigned long		tkdr7;		/* Thread + kernel DR7 value */
+	int			num_installed;	/* Number of installed bps */
+	int			gennum;		/* tkdr7 generation number */
+
+	/* ptrace support -- Note that vdr6 is stored directly in the
+	 * thread_struct so that it is always available.
+	 */
+	unsigned long		vdr7;			/* Virtualized DR7 */
+	struct hw_breakpoint	vdr_bps[HB_NUM];	/* Breakpoints
+			representing virtualized debug registers 0 - 3 */
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+	struct hw_breakpoint	*bps[HB_NUM];	/* Loaded breakpoints */
+	int			num_kbps;	/* Number of kernel bps */
+	unsigned long		mkdr7;		/* Masked kernel DR7 value */
+	struct task_struct	*bp_task;	/* The thread whose bps
+			are currently loaded in the debug registers */
+	u8			switching;	/* Task switch in progress */
+	u8			restart;	/* Restart the task switch */
+	u8			gennum;		/* Generation number */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Kernel-space breakpoint data */
+static LIST_HEAD(kernel_bps);			/* Kernel breakpoint list */
+static int			num_kbps;	/* Number of kernel bps */
+static unsigned long		kdr7;		/* Kernel DR7 value */
+
+static u8			gennum;		/* Generation number */
+static u8			tprio[HB_NUM];	/* Thread bp max priorities */
+static LIST_HEAD(thread_list);			/* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex);	/* Protects everything */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps.  Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1).  The DR_GLOBAL_SLOWDOWN bit
+ * (GE) is handled specially.
+ */
+static const unsigned long	kdr7_masks[HB_NUM + 1] = {
+	0x00000000,
+	0x000f0003,	/* LEN0, R/W0, G0, L0 */
+	0x00ff000f,	/* Same for 0,1 */
+	0x0fff003f,	/* Same for 0,1,2 */
+	0xffff00ff	/* Same for 0,1,2,3 */
+};
+
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void switch_kernel_hw_breakpoint(struct cpu_hw_breakpoint *chbi)
+{
+	struct hw_breakpoint *bp;
+	int i;
+
+	/* Don't allow debug exceptions while we update the registers */
+	set_debugreg(0, 7);
+	chbi->num_kbps = num_kbps;
+	chbi->gennum = gennum;
+
+	/* Kernel breakpoints are stored starting in DR0 and going up */
+	i = 0;
+	list_for_each_entry_rcu(bp, &kernel_bps, node) {
+		if (i >= chbi->num_kbps)
+			break;
+		chbi->bps[i] = bp;
+		switch (i) {
+		case 0:
+			set_debugreg(bp->address.va, 0);
+			break;
+		case 1:
+			set_debugreg(bp->address.va, 1);
+			break;
+		case 2:
+			set_debugreg(bp->address.va, 2);
+			break;
+		case 3:
+			set_debugreg(bp->address.va, 3);
+			break;
+		}
+		++i;
+	}
+
+	chbi->mkdr7 = kdr7 & (kdr7_masks[chbi->num_kbps] | DR_GLOBAL_SLOWDOWN);
+	set_debugreg(chbi->mkdr7, 7);
+}
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct cpu_hw_breakpoint *chbi;
+
+	/* This routine is on the hot path; it gets called for every
+	 * context switch into a task with active breakpoints.  We
+	 * must make sure that the common case executes as quickly as
+	 * possible.
+	 */
+
+	/* Other CPUs might be making updates to the list of kernel
+	 * breakpoints at this time, so we can't use the global value
+	 * stored in num_kbps.  Instead we'll use the per-CPU value.
+	 */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = tsk;
+
+	/* Use RCU to synchronize with external updates */
+	rcu_read_lock();
+ restart:
+	chbi->switching = 1;
+
+	/* Normally we can keep the same debug register settings as the
+	 * last time this task ran.  But if the kernel breakpoints have
+	 * changed or any user breakpoints have been registered or
+	 * unregistered, we need to handle the updates and possibly
+	 * send out some notifications.
+	 */
+	if (unlikely(thbi->gennum != chbi->gennum)) {
+		struct hw_breakpoint *bp;
+		int i;
+
+		thbi->gennum = chbi->gennum;
+		thbi->tkdr7 = chbi->mkdr7 |
+				(thbi->tdr7 & ~kdr7_masks[chbi->num_kbps]);
+
+		/* This code can be invoked while a debugger is actively
+		 * updating the thread's breakpoint list (for example, if
+		 * someone sends SIGKILL to the task).  We use RCU to
+		 * protect our access to the list pointers. */
+		thbi->num_installed = 0;
+		i = HB_NUM;
+		list_for_each_entry_rcu(bp, &thbi->thread_bps, node) {
+
+			/* If this register is allocated for kernel bps,
+			 * don't install.  Otherwise do. */
+			if (--i < chbi->num_kbps) {
+				if (bp->status == HW_BREAKPOINT_INSTALLED) {
+					if (bp->uninstalled)
+						(bp->uninstalled)(bp);
+					bp->status = HW_BREAKPOINT_REGISTERED;
+				}
+			} else {
+				++thbi->num_installed;
+				if (bp->status != HW_BREAKPOINT_INSTALLED) {
+					bp->status = HW_BREAKPOINT_INSTALLED;
+					if (bp->installed)
+						(bp->installed)(bp);
+				}
+			}
+		}
+	}
+
+	/* Install the user breakpoints.  Kernel breakpoints are stored
+	 * starting in DR0 and going up; there are num_kbps of them.
+	 * User breakpoints are stored starting in DR3 and going down,
+	 * as many as we have room for.
+	 */
+	switch (thbi->num_installed) {
+	case 4:
+		set_debugreg(thbi->tdr[0], 0);
+	case 3:
+		set_debugreg(thbi->tdr[1], 1);
+	case 2:
+		set_debugreg(thbi->tdr[2], 2);
+	case 1:
+		set_debugreg(thbi->tdr[3], 3);
+	}
+	set_debugreg(thbi->tkdr7, 7);
+
+	/* Were there any kernel breakpoint changes while we were running? */
+	chbi->switching = 0;
+	if (unlikely(chbi->restart)) {
+		chbi->restart = 0;
+		switch_kernel_hw_breakpoint(chbi);
+		goto restart;
+	}
+
+	rcu_read_unlock();
+	put_cpu_no_resched();
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void switch_to_none_hw_breakpoint(void)
+{
+	struct cpu_hw_breakpoint *chbi;
+
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = NULL;
+
+	/* Use RCU to synchronize with external updates */
+	rcu_read_lock();
+ restart:
+	chbi->switching = 1;
+
+	set_debugreg(chbi->mkdr7, 7);
+
+	/* Were there any kernel breakpoint changes while we were running? */
+	chbi->switching = 0;
+	if (chbi->restart) {
+		chbi->restart = 0;
+		switch_kernel_hw_breakpoint(chbi);
+		goto restart;
+	}
+	rcu_read_unlock();
+	put_cpu_no_resched();
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+	struct cpu_hw_breakpoint *chbi;
+	struct task_struct *tsk = current;
+
+	chbi = &per_cpu(cpu_info, get_cpu());
+
+	/* If switch_to_thread_hw_breakpoint() is already running in a
+	 * higher stack frame (i.e., we interrupted it) then don't call it
+	 * recursively.  Just let it know that it has to update the kernel
+	 * breakpoints and restart.
+	 */
+	if (chbi->switching)
+		chbi->restart = 1;
+	else {
+		/* Install both the kernel and the user breakpoints */
+		switch_kernel_hw_breakpoint(chbi);
+		if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+			switch_to_thread_hw_breakpoint(tsk);
+	}
+	put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void update_all_cpus(void)
+{
+	++gennum;
+	on_each_cpu(update_this_cpu, NULL, 0, 0);
+	synchronize_rcu();
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	update_this_cpu(NULL);
+	local_irq_restore(flags);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio.  Highest-priority entry is in tprio[3].
+ */
+static void accum_thread_tprio(struct thread_hw_breakpoint *thbi)
+{
+	int i;
+
+	for (i = HB_NUM - 1; i >= 0 && thbi->bps[i]; --i)
+		tprio[i] = max(tprio[i], thbi->bps[i]->priority);
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority.  We'll set tprio[0] to the maximum priority of
+ * the first entries in all the lists, tprio[1] to the maximum priority
+ * of the second entries in all the lists, etc.  In the end, we'll know
+ * that no thread requires breakpoints with priorities higher than the
+ * values in tprio.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void recalc_tprio(void)
+{
+	struct thread_hw_breakpoint *thbi;
+
+	memset(tprio, 0, sizeof tprio);
+
+	/* Loop through all threads having registered breakpoints
+	 * and accumulate the maximum priority levels in tprio.
+	 */
+	list_for_each_entry(thbi, &thread_list, node)
+		accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[].  The 4 highest
+ * winners overall get to be installed in a debug register; num_kbps
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU.  If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+	int k, u;
+	int changed = 0;
+	struct hw_breakpoint *bp;
+
+	/* Determine how many debug registers are available for kernel
+	 * breakpoints as opposed to user breakpoints, based on the
+	 * priorities.  Ties are resolved in favor of user bps.
+	 */
+	k = 0;			/* Next kernel bp to allocate */
+	u = HB_NUM - 1;		/* Next user bp to allocate */
+	bp = list_entry(kernel_bps.next, struct hw_breakpoint, node);
+	while (k <= u) {
+		if (&bp->node == &kernel_bps || tprio[u] >= bp->priority)
+			--u;		/* User bps win a slot */
+		else {
+			++k;		/* Kernel bp wins a slot */
+			if (bp->status != HW_BREAKPOINT_INSTALLED)
+				changed = 1;
+			bp = list_entry(bp->node.next, struct hw_breakpoint,
+					node);
+		}
+	}
+	if (k != num_kbps) {
+		changed = 1;
+		num_kbps = k;
+	}
+
+	/* Notify the remaining kernel breakpoints that they are about
+	 * to be uninstalled.
+	 */
+	list_for_each_entry_from(bp, &kernel_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED) {
+			if (bp->uninstalled)
+				(bp->uninstalled)(bp);
+			bp->status = HW_BREAKPOINT_REGISTERED;
+			changed = 1;
+		}
+	}
+
+	if (changed) {
+
+		/* Tell all the CPUs to update their debug registers */
+		update_all_cpus();
+
+		/* Notify the breakpoints that just got installed */
+		k = 0;
+		list_for_each_entry(bp, &kernel_bps, node) {
+			if (k++ >= num_kbps)
+				break;
+			if (bp->status != HW_BREAKPOINT_INSTALLED) {
+				bp->status = HW_BREAKPOINT_INSTALLED;
+				if (bp->installed)
+					(bp->installed)(bp);
+			}
+		}
+	}
+}
+
+/*
+ * Return the pointer to a thread's hw_breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+		struct task_struct *tsk)
+{
+	if (!tsk->thread.hw_breakpoint_info && !(tsk->flags & PF_EXITING)) {
+		struct thread_hw_breakpoint *thbi;
+
+		thbi = kzalloc(sizeof(struct thread_hw_breakpoint),
+				GFP_KERNEL);
+		if (thbi) {
+			INIT_LIST_HEAD(&thbi->node);
+			INIT_LIST_HEAD(&thbi->thread_bps);
+			thbi->gennum = -1;
+			tsk->thread.hw_breakpoint_info = thbi;
+		}
+	}
+	return tsk->thread.hw_breakpoint_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct hw_breakpoint *bp;
+
+	if (!thbi)
+		return;
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Let the breakpoints know they are being uninstalled */
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+			(bp->uninstalled)(bp);
+		bp->status = 0;
+	}
+
+	/* Remove tsk from the list of all threads with registered bps */
+	list_del(&thbi->node);
+
+	/* The thread no longer has any breakpoints associated with it */
+	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	tsk->thread.hw_breakpoint_info = NULL;
+	kfree(thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities */
+	recalc_tprio();
+	balance_kernel_vs_user();
+
+	/* Actually uninstall the breakpoints if necessary */
+	if (tsk == current)
+		switch_to_none_hw_breakpoint();
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags)
+{
+	/* We will assume that breakpoint settings are not inherited
+	 * and the child starts out with no debug registers set.
+	 * But what about CLONE_PTRACE?
+	 */
+	clear_tsk_thread_flag(child, TIF_DEBUG);
+	return 0;
+}
+
+/*
+ * Copy out the debug register information for a core dump.
+ *
+ * tsk must be equal to current.
+ */
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	memset(u_debugreg, 0, 8 * sizeof u_debugreg[0]);
+	if (thbi) {
+		for (i = 0; i < HB_NUM; ++i)
+			u_debugreg[i] = thbi->vdr_bps[i].address.va;
+		u_debugreg[7] = thbi->vdr7;
+	}
+	u_debugreg[6] = tsk->thread.vdr6;
+}
+
+/*
+ * Validate the settings in a hw_breakpoint structure.
+ */
+static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
+{
+	int rc = -EINVAL;
+	unsigned long len;
+
+	switch (bp->type) {
+	case HW_BREAKPOINT_EXECUTE:
+		if (bp->len != HW_BREAKPOINT_LEN_EXECUTE)
+			return rc;
+		break;
+	case HW_BREAKPOINT_WRITE:
+	case HW_BREAKPOINT_RW:
+		break;
+	default:
+		return rc;
+	}
+
+	switch (bp->len) {
+	case HW_BREAKPOINT_LEN_1:
+		len = 1;
+		break;
+	case HW_BREAKPOINT_LEN_2:
+		len = 2;
+		break;
+	case HW_BREAKPOINT_LEN_4:
+		len = 4;
+		break;
+#ifdef	CONFIG_X86_64
+	case HW_BREAKPOINT_LEN_8:
+		len = 8;
+		break;
+#endif
+	default:
+		return rc;
+	}
+
+	/* Check that the low-order bits of the address are appropriate
+	 * for the alignment implied by len.
+	 */
+	if (bp->address.va & (len - 1))
+		return rc;
+
+	/* Check that the address is in the proper range */
+#ifndef	CONFIG_X86_64
+#define	TASK_SIZE_OF(t)		TASK_SIZE
+#define	TASK_SIZE64		TASK_SIZE
+#endif
+	if (tsk) {		/* User breakpoint */
+		if (bp->address.va >= TASK_SIZE_OF(tsk))
+			return rc;
+	} else {		/* Kernel breakpoint */
+		if (bp->address.va < TASK_SIZE64)
+			return rc;
+	}
+
+	if (bp->triggered)
+		rc = 0;
+	return rc;
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static inline unsigned long encode_dr7(int drnum, u8 len, u8 type)
+{
+	unsigned long temp;
+
+	temp = (len | type) & 0xf;
+	temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+	temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_GLOBAL_SLOWDOWN;
+	return temp;
+}
+
+/*
+ * Calculate the DR7 value for a list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct list_head *bp_list, int is_user)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	int drnum;
+	unsigned long dr7;
+
+	/* Kernel bps are assigned from DR0 on up, and user bps are assigned
+	 * from DR3 on down.  Accumulate all 4 bps; the kernel DR7 mask will
+	 * select the appropriate bits later.
+	 */
+	dr7 = 0;
+	i = 0;
+	list_for_each_entry(bp, bp_list, node) {
+
+		/* Get the debug register number and accumulate the bits */
+		drnum = (is_user ? HB_NUM - 1 - i : i);
+		dr7 |= encode_dr7(drnum, bp->len, bp->type);
+		if (++i >= HB_NUM)
+			break;
+	}
+	return dr7;
+}
+
+/*
+ * Store the highest-priority thread breakpoint entries in an array.
+ */
+static void store_thread_bp_array(struct thread_hw_breakpoint *thbi)
+{
+	struct hw_breakpoint *bp;
+	int i;
+
+	i = HB_NUM - 1;
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		thbi->bps[i] = bp;
+		thbi->tdr[i] = bp->address.va;
+		if (--i < 0)
+			break;
+	}
+	while (i >= 0)
+		thbi->bps[i--] = NULL;
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set implies
+ *		tsk->thread.hw_breakpoint_info is not NULL.
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ *		iff thbi->node is on thread_list.
+ */
+static int insert_bp_in_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	struct list_head *head;
+	int pos;
+	struct hw_breakpoint *temp_bp;
+
+	/* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+	if (tsk)
+		head = &thbi->thread_bps;
+	else
+		head = &kernel_bps;
+
+	/* Equal-priority breakpoints get listed first-come-first-served */
+	pos = 0;
+	list_for_each_entry(temp_bp, head, node) {
+		if (bp->priority > temp_bp->priority)
+			break;
+		++pos;
+	}
+	bp->status = HW_BREAKPOINT_REGISTERED;
+	list_add_tail_rcu(&bp->node, &temp_bp->node);
+
+	if (tsk) {
+		store_thread_bp_array(thbi);
+
+		/* Is this the thread's first registered breakpoint? */
+		if (list_empty(&thbi->node)) {
+			set_tsk_thread_flag(tsk, TIF_DEBUG);
+			list_add(&thbi->node, &thread_list);
+		}
+	}
+	synchronize_rcu();
+	return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ */
+static void remove_bp_from_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	/* Remove bp from the thread's/kernel's list.  If the list is now
+	 * empty we must clear the TIF_DEBUG flag.  But keep the
+	 * thread_hw_breakpoint structure, so that the virtualized debug
+	 * register values will remain valid.
+	 */
+	list_del_rcu(&bp->node);
+	if (tsk) {
+		store_thread_bp_array(thbi);
+
+		if (list_empty(&thbi->thread_bps)) {
+			list_del_init(&thbi->node);
+			clear_tsk_thread_flag(tsk, TIF_DEBUG);
+		}
+	}
+	synchronize_rcu();
+
+	/* Tell the breakpoint it is being uninstalled */
+	if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+		(bp->uninstalled)(bp);
+	bp->status = 0;
+}
+
+/*
+ * Actual implementation of register_user_hw_breakpoint.
+ */
+int __register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+	struct thread_hw_breakpoint *thbi;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, tsk);
+	if (rc)
+		return rc;
+
+	thbi = alloc_thread_hw_breakpoint(tsk);
+	if (!thbi)
+		return -ENOMEM;
+
+	/* Insert bp in the thread's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, thbi, tsk);
+	thbi->tdr7 = calculate_dr7(&thbi->thread_bps, 1);
+	thbi->gennum = -1;	/* Send notifications */
+
+	/* Update and rebalance the priorities.  We don't need to go through
+	 * the list of all threads; adding a breakpoint can only cause the
+	 * priorities for this thread to increase.
+	 */
+	accum_thread_tprio(thbi);
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register.
+	 */
+	if (pos < HB_NUM - num_kbps) {
+		rc = 1;
+
+		/* Does it need to be installed right now? */
+		if (tsk == current)
+			switch_to_thread_hw_breakpoint(tsk);
+		/* Otherwise it will get installed the next time tsk runs */
+	}
+	return rc;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running.  It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being debugged by the current process,
+ * but it may also be the current process.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	rc = __register_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Actual implementation of unregister_user_hw_breakpoint.
+ */
+void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+
+	if (!bp->status)
+		return;		/* Not registered */
+
+	/* Remove bp from the thread's list and update the DR7 value */
+	remove_bp_from_list(bp, thbi, tsk);
+	thbi->tdr7 = calculate_dr7(&thbi->thread_bps, 1);
+	thbi->gennum = -1;	/* Send notifications */
+
+	/* Recalculate and rebalance the kernel-vs-user priorities,
+	 * and actually uninstall bp if necessary.
+	 */
+	recalc_tprio();
+	balance_kernel_vs_user();
+	if (tsk == current)
+		switch_to_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * unregister_user_hw_breakpoint - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	mutex_lock(&hw_breakpoint_mutex);
+	__unregister_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Tell all threads being debugged to send update notifications.
+ */
+static void notify_all_threads(void)
+{
+	struct thread_hw_breakpoint *thbi;
+
+	list_for_each_entry(thbi, &thread_list, node)
+		thbi->gennum = -1;
+}
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be active at all times.  It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	int rc;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, NULL);
+	if (rc)
+		return rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Insert bp in the kernel's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, NULL, NULL);
+	kdr7 = calculate_dr7(&kernel_bps, 0);
+	notify_all_threads();
+
+	/* Rebalance the priorities.  This will install bp if it
+	 * was allocated a debug register.
+	 */
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register.
+	 */
+	if (pos < num_kbps)
+		rc = 1;
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	if (!bp->status)
+		return;		/* Not registered */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Remove bp from the kernel's list and update the DR7 value */
+	remove_bp_from_list(bp, NULL, NULL);
+	kdr7 = calculate_dr7(&kernel_bps, 0);
+	notify_all_threads();
+
+	/* Rebalance the priorities.  This will uninstall bp if it
+	 * was allocated a debug register.
+	 */
+	balance_kernel_vs_user();
+
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+	struct task_struct *tsk = current;
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	/* Store in the virtual DR6 register the fact that the breakpoint
+	 * was hit so the thread's debugger will see it, and send the
+	 * debugging signal.
+	 */
+	if (thbi) {
+		i = bp - thbi->vdr_bps;
+		tsk->thread.vdr6 |= (DR_TRAP0 << i);
+		send_sigtrap(tsk, regs, 0);
+	}
+}
+
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n)
+{
+	struct thread_hw_breakpoint *thbi;
+	unsigned long val = 0;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	thbi = tsk->thread.hw_breakpoint_info;
+	if (n < HB_NUM) {
+		if (thbi)
+			val = (unsigned long) thbi->vdr_bps[n].address.va;
+	} else if (n == 6)
+		val = tsk->thread.vdr6;
+	else if (n == 7) {
+		if (thbi)
+			val = thbi->vdr7;
+	}
+	mutex_unlock(&hw_breakpoint_mutex);
+	return val;
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7.  Return the "enabled" status.
+ */
+static inline int decode_dr7(unsigned long dr7, int bpnum, u8 *len, u8 *type)
+{
+	int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+
+	*len = (temp & 0xc) | 0x40;
+	*type = (temp & 0x3) | 0x80;
+	return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
+
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+		struct thread_hw_breakpoint *thbi, unsigned long data)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	int rc = 0;
+	unsigned long old_dr7 = thbi->vdr7;
+
+	data &= ~DR_CONTROL_RESERVED;
+
+	/* Loop through all the hardware breakpoints, making the
+	 * appropriate changes to each.
+	 */
+ restore_settings:
+	thbi->vdr7 = data;
+	bp = &thbi->vdr_bps[0];
+	for (i = 0; i < HB_NUM; (++i, ++bp)) {
+		int enabled;
+		u8 len, type;
+
+		enabled = decode_dr7(data, i, &len, &type);
+
+		/* Unregister the breakpoint before trying to change it */
+		if (bp->status)
+			__unregister_user_hw_breakpoint(tsk, bp);
+
+		/* Insert the breakpoint's new settings */
+		bp->len = len;
+		bp->type = type;
+
+		/* Now register the breakpoint if it should be enabled.
+		 * New invalid entries will raise an error here.
+		 */
+		if (enabled) {
+			bp->triggered = ptrace_triggered;
+			bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+			if (__register_user_hw_breakpoint(tsk, bp) < 0 &&
+					rc == 0)
+				break;
+		}
+	}
+
+	/* If anything above failed, restore the original settings */
+	if (i < HB_NUM) {
+		rc = -EIO;
+		data = old_dr7;
+		goto restore_settings;
+	}
+	return rc;
+}
+
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+	struct thread_hw_breakpoint *thbi;
+	int rc = -EIO;
+
+	/* We have to hold this lock the entire time, to prevent thbi
+	 * from being deallocated out from under us.
+	 */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* There are no DR4 or DR5 registers */
+	if (n == 4 || n == 5)
+		;
+
+	/* Writes to DR6 modify the virtualized value */
+	else if (n == 6) {
+		tsk->thread.vdr6 = val;
+		rc = 0;
+	}
+
+	else if (!tsk->thread.hw_breakpoint_info && val == 0)
+		rc = 0;		/* Minor optimization */
+
+	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+		rc = -ENOMEM;
+
+	/* Writes to DR0 - DR3 change a breakpoint address */
+	else if (n < HB_NUM) {
+		struct hw_breakpoint *bp = &thbi->vdr_bps[n];
+
+		/* If the breakpoint is registered then unregister it,
+		 * change it, and re-register it.  Revert to the original
+		 * address if an error occurs.
+		 */
+		if (bp->status) {
+			unsigned long old_addr = bp->address.va;
+
+			__unregister_user_hw_breakpoint(tsk, bp);
+			bp->address.va = val;
+			rc = __register_user_hw_breakpoint(tsk, bp);
+			if (rc < 0) {
+				bp->address.va = old_addr;
+				__register_user_hw_breakpoint(tsk, bp);
+			}
+		} else {
+			bp->address.va = val;
+			rc = 0;
+		}
+	}
+
+	/* All that's left is DR7 */
+	else
+		rc = ptrace_write_dr7(tsk, thbi, val);
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_handler(struct die_args *args)
+{
+	struct cpu_hw_breakpoint *chbi;
+	int i;
+	struct hw_breakpoint *bp;
+	struct thread_hw_breakpoint *thbi = NULL;
+
+	/* A pointer to the DR6 value is stored in args->err */
+#define DR6	(* (unsigned long *) (args->err))
+
+	if (!(DR6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+		return NOTIFY_DONE;
+
+	/* Assert that local interrupts are disabled */
+
+	/* Are we a victim of lazy debug-register switching? */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	if (!chbi->bp_task)
+		;
+	else if (chbi->bp_task != current) {
+
+		/* No user breakpoints are valid.  Perform the belated
+		 * debug-register switch.
+		 */
+		switch_to_none_hw_breakpoint();
+	} else
+		thbi = chbi->bp_task->thread.hw_breakpoint_info;
+
+	/* Disable all breakpoints so that the callbacks can run without
+	 * triggering recursive debug exceptions.
+	 */
+	set_debugreg(0, 7);
+
+	/* Handle all the breakpoints that were triggered */
+	for (i = 0; i < HB_NUM; ++i) {
+		if (likely(!(DR6 & (DR_TRAP0 << i))))
+			continue;
+
+		/* Find the corresponding hw_breakpoint structure and
+		 * invoke its triggered callback.
+		 */
+		if (i < chbi->num_kbps)
+			bp = chbi->bps[i];
+		else if (thbi)
+			bp = thbi->bps[i];
+		else		/* False alarm due to lazy DR switching */
+			continue;
+		if (bp)			/* Should always be non-NULL */
+			(bp->triggered)(bp, args->regs);
+	}
+
+	/* Re-enable the breakpoints */
+	set_debugreg(thbi ? thbi->tkdr7 : chbi->mkdr7, 7);
+	put_cpu_no_resched();
+
+	/* Mask away the bits we have handled */
+	DR6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+	/* Early exit from the notifier chain if everything has been handled */
+	if (DR6 == 0)
+		return NOTIFY_STOP;
+	return NOTIFY_DONE;
+#undef DR6
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_exceptions_notify(
+		struct notifier_block *unused, unsigned long val, void *data)
+{
+	if (val != DIE_DEBUG)
+		return NOTIFY_DONE;
+	return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+	.notifier_call = hw_breakpoint_exceptions_notify
+};
+
+static int __init init_hw_breakpoint(void)
+{
+	return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
Index: usb-2.6/arch/i386/kernel/ptrace.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/ptrace.c
+++ usb-2.6/arch/i386/kernel/ptrace.c
@@ -383,11 +383,11 @@ long arch_ptrace(struct task_struct *chi
 		tmp = 0;  /* Default return condition */
 		if(addr < FRAME_SIZE*sizeof(long))
 			tmp = getreg(child, addr);
-		if(addr >= (long) &dummy->u_debugreg[0] &&
-		   addr <= (long) &dummy->u_debugreg[7]){
+		else if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
 			addr -= (long) &dummy->u_debugreg[0];
 			addr = addr >> 2;
-			tmp = child->thread.debugreg[addr];
+			tmp = thread_get_debugreg(child, addr);
 		}
 		ret = put_user(tmp, datap);
 		break;
@@ -417,59 +417,11 @@ long arch_ptrace(struct task_struct *chi
 		   have to be selective about what portions we allow someone
 		   to modify. */
 
-		  ret = -EIO;
-		  if(addr >= (long) &dummy->u_debugreg[0] &&
-		     addr <= (long) &dummy->u_debugreg[7]){
-
-			  if(addr == (long) &dummy->u_debugreg[4]) break;
-			  if(addr == (long) &dummy->u_debugreg[5]) break;
-			  if(addr < (long) &dummy->u_debugreg[4] &&
-			     ((unsigned long) data) >= TASK_SIZE-3) break;
-			  
-			  /* Sanity-check data. Take one half-byte at once with
-			   * check = (val >> (16 + 4*i)) & 0xf. It contains the
-			   * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
-			   * 2 and 3 are LENi. Given a list of invalid values,
-			   * we do mask |= 1 << invalid_value, so that
-			   * (mask >> check) & 1 is a correct test for invalid
-			   * values.
-			   *
-			   * R/Wi contains the type of the breakpoint /
-			   * watchpoint, LENi contains the length of the watched
-			   * data in the watchpoint case.
-			   *
-			   * The invalid values are:
-			   * - LENi == 0x10 (undefined), so mask |= 0x0f00.
-			   * - R/Wi == 0x10 (break on I/O reads or writes), so
-			   *   mask |= 0x4444.
-			   * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
-			   *   0x1110.
-			   *
-			   * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
-			   *
-			   * See the Intel Manual "System Programming Guide",
-			   * 15.2.4
-			   *
-			   * Note that LENi == 0x10 is defined on x86_64 in long
-			   * mode (i.e. even for 32-bit userspace software, but
-			   * 64-bit kernel), so the x86_64 mask value is 0x5454.
-			   * See the AMD manual no. 24593 (AMD64 System
-			   * Programming)*/
-
-			  if(addr == (long) &dummy->u_debugreg[7]) {
-				  data &= ~DR_CONTROL_RESERVED;
-				  for(i=0; i<4; i++)
-					  if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
-						  goto out_tsk;
-				  if (data)
-					  set_tsk_thread_flag(child, TIF_DEBUG);
-				  else
-					  clear_tsk_thread_flag(child, TIF_DEBUG);
-			  }
-			  addr -= (long) &dummy->u_debugreg;
-			  addr = addr >> 2;
-			  child->thread.debugreg[addr] = data;
-			  ret = 0;
+		if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
+			addr -= (long) &dummy->u_debugreg;
+			addr = addr >> 2;
+			ret = thread_set_debugreg(child, addr, data);
 		  }
 		  break;
 
@@ -625,7 +577,6 @@ long arch_ptrace(struct task_struct *chi
 		ret = ptrace_request(child, request, addr, data);
 		break;
 	}
- out_tsk:
 	return ret;
 }
 
Index: usb-2.6/arch/i386/kernel/Makefile
===================================================================
--- usb-2.6.orig/arch/i386/kernel/Makefile
+++ usb-2.6/arch/i386/kernel/Makefile
@@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld
 obj-y	:= process.o signal.o entry.o traps.o irq.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
-		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \
+		hw_breakpoint.o
 
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
Index: usb-2.6/arch/i386/power/cpu.c
===================================================================
--- usb-2.6.orig/arch/i386/power/cpu.c
+++ usb-2.6/arch/i386/power/cpu.c
@@ -11,6 +11,7 @@
 #include <linux/suspend.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/debugreg.h>
 
 static struct saved_context saved_context;
 
@@ -45,6 +46,8 @@ void __save_processor_state(struct saved
 	ctxt->cr2 = read_cr2();
 	ctxt->cr3 = read_cr3();
 	ctxt->cr4 = read_cr4();
+
+	disable_debug_registers();
 }
 
 void save_processor_state(void)
@@ -69,20 +72,7 @@ static void fix_processor_context(void)
 
 	load_TR_desc();				/* This does ltr */
 	load_LDT(&current->active_mm->context);	/* This does lldt */
-
-	/*
-	 * Now maybe reload the debug registers
-	 */
-	if (current->thread.debugreg[7]){
-		set_debugreg(current->thread.debugreg[0], 0);
-		set_debugreg(current->thread.debugreg[1], 1);
-		set_debugreg(current->thread.debugreg[2], 2);
-		set_debugreg(current->thread.debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(current->thread.debugreg[6], 6);
-		set_debugreg(current->thread.debugreg[7], 7);
-	}
-
+	load_debug_registers();
 }
 
 void __restore_processor_state(struct saved_context *ctxt)
Index: usb-2.6/arch/i386/kernel/kprobes.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/kprobes.c
+++ usb-2.6/arch/i386/kernel/kprobes.c
@@ -670,9 +670,18 @@ int __kprobes kprobe_exceptions_notify(s
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_DEBUG:
-		if (post_kprobe_handler(args->regs))
-			ret = NOTIFY_STOP;
+
+	/* A pointer to the DR6 value is stored in args->err */
+#define DR6	(* (unsigned long *) (args->err))
+
+		if ((DR6 & DR_STEP) && post_kprobe_handler(args->regs)) {
+			DR6 &= ~DR_STEP;
+			if (DR6 == 0)
+				ret = NOTIFY_STOP;
+		}
 		break;
+#undef DR6
+
 	case DIE_GPF:
 	case DIE_PAGE_FAULT:
 		/* kprobe_running() needs smp_processor_id() */
Index: usb-2.6/include/asm-generic/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-generic/hw_breakpoint.h
@@ -0,0 +1,213 @@
+#ifndef	_ASM_GENERIC_HW_BREAKPOINT_H
+#define	_ASM_GENERIC_HW_BREAKPOINT_H
+
+#ifdef	__KERNEL__
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @triggered: callback invoked when the breakpoint is hit
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @address: location (virtual address) of the breakpoint
+ * @len: encoded extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read/write, or execute)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints.  These can be either execute breakpoints
+ * (triggered on instruction execution) or data breakpoints (also known
+ * as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The @address field contains the breakpoint's address, as either a
+ * regular kernel pointer or an %__user pointer.  @len encodes the
+ * breakpoint's extent in bytes, which is subject to certain limitations.
+ * include/asm/hw_breakpoint.h contains macros defining the available
+ * lengths for a specific architecture.  Note that @address must have the
+ * alignment specified by @len.  The breakpoint will catch accesses to
+ * any byte in the range from @address to @address + (N - 1), where N is
+ * the value encoded by @len.
+ *
+ * @type indicates the type of access that will trigger the breakpoint.
+ * Possible values may include:
+ *
+ * 	%HW_BREAKPOINT_EXECUTE (triggered on instruction execution),
+ * 	%HW_BREAKPOINT_RW (triggered on read or write access),
+ * 	%HW_BREAKPOINT_WRITE (triggered on write access), and
+ * 	%HW_BREAKPOINT_READ (triggered on read access).
+ *
+ * Appropriate macros are defined in include/asm/hw_breakpoint.h; not all
+ * possibilities are available on all architectures.  Execute breakpoints
+ * must have @len equal to the special value %HW_BREAKPOINT_LEN_EXECUTE.
+ *
+ * In register_user_hw_breakpoint(), @address must refer to a location in
+ * user space (set @address.user).  The breakpoint will be active only
+ * while the requested task is running.  Conversely in
+ * register_kernel_hw_breakpoint(), @address must refer to a location in
+ * kernel space (set @address.kernel), and the breakpoint will be active
+ * on all CPUs regardless of the current task.
+ *
+ * When a breakpoint gets hit, the @triggered callback is invoked
+ * in_interrupt with a pointer to the %hw_breakpoint structure and the
+ * processor registers.  Execute-breakpoint traps occur before the
+ * breakpointed instruction runs; when the callback returns the
+ * instruction is restarted (this time without a debug exception).  All
+ * other types of trap occur after the memory access has taken place.
+ * Breakpoints are disabled while @triggered runs, to avoid recursive
+ * traps and allow unhindered access to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource.  Requests to register a
+ * breakpoint will always succeed provided the parameters are valid,
+ * but the breakpoint may not be installed in a debug register right
+ * away.  Physical debug registers are allocated based on the priority
+ * level stored in @priority (higher values indicate higher priority).
+ * User-space breakpoints within a single thread compete with one
+ * another, and all user-space breakpoints compete with all kernel-space
+ * breakpoints; however user-space breakpoints in different threads do
+ * not compete.  %HW_BREAKPOINT_PRIO_PTRACE is the level used for ptrace
+ * requests; an unobtrusive kernel-space breakpoint will use
+ * %HW_BREAKPOINT_PRIO_NORMAL to avoid disturbing user programs.  A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HW_BREAKPOINT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered.  The
+ * @installed and @uninstalled callbacks are invoked in_atomic when these
+ * events occur.  It is legal for @installed or @uninstalled to be %NULL,
+ * however @triggered must not be.  Note that it is not possible to
+ * register or unregister a breakpoint from within a callback routine,
+ * since doing so requires a process context.  Note also that for user
+ * breakpoints, @installed and @uninstalled may be called during the
+ * middle of a context switch, at a time when it is not safe to call
+ * printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled.  As a result @triggered can
+ * be called when you may not expect it, but this way you will know that
+ * during the time interval from @installed to @uninstalled, all events
+ * are faithfully reported.  (It is not possible to do any better than
+ * this in general, because on SMP systems there is no way to set a debug
+ * register simultaneously on all CPUs.)  The same isn't always true with
+ * user-space breakpoints, but the differences should not be visible to a
+ * user process.
+ *
+ * If you need to know whether your kernel-space breakpoint was installed
+ * immediately upon registration, you can check the return value from
+ * register_kernel_hw_breakpoint().  If the value is not > 0, you can
+ * give up and unregister the breakpoint right away.
+ *
+ * @node and @status are intended for internal use.  However @status
+ * may be read to determine whether or not the breakpoint is currently
+ * installed.  (The value is not reliable unless local interrupts are
+ * disabled.)
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * static void triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * 	printk(KERN_DEBUG "Breakpoint triggered\n");
+ * 	dump_stack();
+ *  	.......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	my_bp.address.kernel = &pid_max;
+ * 	my_bp.type = HW_BREAKPOINT_WRITE;
+ * 	my_bp.len = HW_BREAKPOINT_LEN_4;
+ * 	my_bp.triggered = triggered;
+ * 	my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ * 	rc = register_kernel_hw_breakpoint(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	unregister_kernel_hw_breakpoint(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hw_breakpoint {
+	struct list_head	node;
+	void		(*triggered)(struct hw_breakpoint *, struct pt_regs *);
+	void		(*installed)(struct hw_breakpoint *);
+	void		(*uninstalled)(struct hw_breakpoint *);
+	union {
+		const void		*kernel;
+		const void __user	*user;
+		unsigned long		va;
+	}		address;
+	u8		len;
+	u8		type;
+	u8		priority;
+	u8		status;
+};
+
+/*
+ * len and type values are defined in include/asm/hw_breakpoint.h.
+ * Available values vary according to the architecture.  On i386 the
+ * possibilities are:
+ *
+ *	HW_BREAKPOINT_LEN_1
+ *	HW_BREAKPOINT_LEN_2
+ *	HW_BREAKPOINT_LEN_4
+ *	HW_BREAKPOINT_LEN_EXECUTE
+ *	HW_BREAKPOINT_RW
+ *	HW_BREAKPOINT_READ
+ *	HW_BREAKPOINT_EXECUTE
+ *
+ * On other architectures HW_BREAKPOINT_LEN_8 may be available, and the
+ * 1-, 2-, and 4-byte lengths may be unavailable.  You can use #ifdef
+ * to check at compile time.
+ */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HW_BREAKPOINT_PRIO_NORMAL	25
+#define HW_BREAKPOINT_PRIO_PTRACE	50
+#define HW_BREAKPOINT_PRIO_HIGH		75
+
+/* HW breakpoint status values (0 = not registered) */
+#define HW_BREAKPOINT_REGISTERED	1
+#define HW_BREAKPOINT_INSTALLED		2
+
+/*
+ * The following two routines are meant to be called only from within
+ * the ptrace or utrace subsystems.  The tsk argument will usually be a
+ * process being debugged by the current task, although it is also legal
+ * for tsk to be the current task.  In any case it must be guaranteed
+ * that tsk will not start running in user mode while its breakpoints are
+ * being modified.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+#endif	/* __KERNEL__ */
+#endif	/* _ASM_GENERIC_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/machine_kexec.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/machine_kexec.c
+++ usb-2.6/arch/i386/kernel/machine_kexec.c
@@ -19,6 +19,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/debugreg.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -108,6 +109,7 @@ NORET_TYPE void machine_kexec(struct kim
 
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
+	disable_debug_registers();
 
 	control_page = page_address(image->control_code_page);
 	memcpy(control_page, relocate_kernel, PAGE_SIZE);
Index: usb-2.6/arch/i386/kernel/smpboot.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/smpboot.c
+++ usb-2.6/arch/i386/kernel/smpboot.c
@@ -60,6 +60,7 @@
 #include <mach_wakecpu.h>
 #include <smpboot_hooks.h>
 #include <asm/vmi.h>
+#include <asm/debugreg.h>
 
 /* Set if we find a B stepping CPU */
 static int __devinitdata smp_b_stepping;
@@ -429,6 +430,7 @@ static void __cpuinit start_secondary(vo
 	local_irq_enable();
 
 	wmb();
+	load_debug_registers();
 	cpu_idle();
 }
 
@@ -1244,6 +1246,7 @@ int __cpu_disable(void)
 	fixup_irqs(map);
 	/* It's now safe to remove this processor from the online map */
 	cpu_clear(cpu, cpu_online_map);
+	disable_debug_registers();
 	return 0;
 }
 


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-03-28 21:39                                           ` Roland McGrath
  2007-03-29 21:35                                             ` Alan Stern
  2007-04-13 21:09                                             ` Alan Stern
@ 2007-05-11 15:25                                             ` Alan Stern
  2007-05-13 10:39                                               ` Roland McGrath
  2 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-05-11 15:25 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Wed, 28 Mar 2007, Roland McGrath wrote:

> Sorry I've been slow in responding to your most recent version.
> I fell into a large hole and couldn't get out until I fixed some bugs.

Has the same thing happened again?  There hasn't been any feedback on the 
most recent version of hw_breakpoint emailed on April 13:

	http://marc.info/?l=linux-kernel&m=117661223820357&w=2

I think there are probably still a few small things wrong with it.  For
instance, the RF setting isn't right; I misunderstood the Intel manual.  
It should get set only when the latest debug interrupt was for an
instruction breakpoint.
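
Roughly, instead of setting RF unconditionally at the end of do_debug(),
it would be something like this (just a sketch; exec_bp_triggered() is an
invented helper that checks whether the DR_TRAPn bit that fired belongs
to an execute breakpoint):

	if (exec_bp_triggered(dr6))
		regs->eflags |= X86_EFLAGS_RF;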

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-05-11 15:25                                             ` Alan Stern
@ 2007-05-13 10:39                                               ` Roland McGrath
  2007-05-14 15:42                                                 ` Alan Stern
  2007-05-17 20:39                                                 ` Alan Stern
  0 siblings, 2 replies; 70+ messages in thread
From: Roland McGrath @ 2007-05-13 10:39 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

Sorry again about the delay.  

> I trust we are moving closer to a final, usable form.

Indeed, I think it is getting there.

> I think there are probably still a few small things wrong with it.  For
> instance, the RF setting isn't right; I misunderstood the Intel manual.  
> It should get set only when the latest debug interrupt was for an
> instruction breakpoint.

This makes me think about RF a little more.  If you ever set it, there are
some places we need to clear it too.  That is, when the PC is being changed
before returning to user mode, which is in signals and in ptrace.  If the
PC is changing to other than the breakpoint location hit by the handler
that set RF, we need clear RF so that the first instruction at the changed
PC can be a breakpoint hit of its own and not get masked.  In fact, it may
also be necessary to clear RF when freshly setting a new instruction
breakpoint (when RF is set because the stop was not a debug exception at
all), so that it isn't skipped if the PC happens to be right there already.

> Come to think of it, we don't really need modify_user_hw_breakpoint at
> all.  It could be replaced by an {unregister(old); register(new);}
> sequence.  Unless you think there's some pressing reason to keep it, my
> inclination is to do away with it.

I sort of wondered from the beginning why it was there.  The rationale I
can see is to avoid flutter.  That is, when unregistering frees up a slot
for a lower-priority allocation waiting in the wings, and then the new
registration will just displace it again.  The priority list diddling is
wasted work to get back to just how it was before, but more importantly you
don't want to have those callbacks for a momentarily-available slot coming
and going.  I don't know if this can really come up with the current code.

> Hmm...  Maybe I could store a pointer to the DR6 value in args.err instead
> of the value itself...

Ugh.

> As I understand it, setting one of those bits is necessary on the 386 but
> not necessary for later processors.  Should this be controlled by a
> runtime (or compile time) check?  For that matter, do those bits have any
> effect at all on a Pentium?

I've never heard of anyone using them, but I don't know the full story.

> My Intel manual says that the CPU automatically sets the RF bit in the
> EFLAGS image stored on the stack by the debug exception.  Hence the
> handler doesn't have to worry about it.  That's why I removed it from the 
> existing code.

The documentation I have says that RF is set in the trap frame on the stack
(i.e. pt_regs.eflags) by every other kind of exception.  However, for a
debug exception that is due to an instruction breakpoint, RF=0 in the trap
frame and the manual explicitly says that the handler must set the bit so
that iret will resume and execute it rather than hit the breakpoint again.

[later:]
> It also turns out that some CPUs don't automatically set the RF bit in 
> the EFLAGS image on the stack.  Intel recommends that the OS always set 
> that bit whenever a debug exception occurs, so that's what I did.

Is this really "some CPUs"?  Or is it actually always as I described above
(i.e. RF set usually but cleared for an instruction breakpoint hit)?

> If callers want to give up when a kernel breakpoint isn't installed 
> immediately, all they have to do is check the return value from 
> register_kernel_hw_breakpoint and call unregister_kernel_hw_breakpoint.  
> If you really want it, I could add an extra "fail if not installed" 
> argument flag.

The important thing is that there aren't any difficult races (i.e. what you
get with callbacks).  If register with no callback followed by unregister
on seeing "registered but not installed" return value is simple and cheap,
that is fine.
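
I.e. the caller-side pattern, going by the documented return convention
(1 = installed, 0 = registered but not installed), is just:

	rc = register_kernel_hw_breakpoint(&my_bp);
	if (rc == 0) {
		/* No debug register was free -- give up immediately */
		unregister_kernel_hw_breakpoint(&my_bp);
		rc = -EBUSY;	/* or whatever the caller prefers */
	}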

> For user breakpoints, the whole notion is almost meaningless.  Even if the
> breakpoint was allocated a debug register initially, it could get
> displaced by the time the debuggee task next runs.

It's no less meaningful than for a kernel allocation.  In neither case is
there a guarantee you'll keep it forever.  What callers I had in mind want
is a quick answer when the answer is negative at the time of the call, so
they just punt on the complexity of dealing with a positive answer.

> Again, this was referring to existing code which I basically copied 
> without fully understanding.  Does the new code in do_debug do the right 
> thing with regard to TF?

It looks right to me.  That is, it preserves the existing behavior for
kernel-mode traps, and does not touch TF at all for user-mode traps.

> > > +	/* Block kernel breakpoint updates from other CPUs */
> > > +	local_irq_save(flags);
> > 
> > I have a feeling this is more costly than we want, though I don't really
> > know.  It seems to me that things in struct cpu_hw_breakpoint are not
> > really per-CPU, except for bp_task.  They are "current global state",
> > right?
> 
> Not really, since changes to the debug registers on multiple CPUs cannot
> be made simultaneously.  There will be short periods when different CPUs
> have different debug register values.  What if a debug exception occurs
> during one of those periods?

I think it's fine if a CPU getting an exception before it's processed the
IPI looks at changed global state and says "oh, mine was stale", and punts
the hit.  (Or perhaps it transmogrifies its apparent DR# based on the new
global state, if the CPU's old setting corresponds to one of the new
settings.  Probably the changing of settings can just preserve the old DR#
selection in such cases and simplify the situation for the handler doing
the catch-up to just if (old->dr[n] != new->dr[n]) ignore;.)

> Or what if a task switch occurs?

You mean a context switch before the IPI gets in?
switch_to_thread_hw_breakpoint can just install the latest global state.

> Here's the latest take on the hw_breakpoint patch.  I adopted most of your
> suggestions.  There still isn't a .bits member, but or'ing the .len and
> .type members together will give you essentially the same thing; both of
> those values are now completely encoded.

I'd still prefer to have a single machine-dependent field and not have .len.

> The hot path in switch_to_thread_hw_breakpoint() should now be very fast.  
> There's a minimal amount of additional activity needed to deal with kernel
> breakpoint updates that might arrive in the middle of a context switch.

It looks promising.  

I'm not entirely sanguine about an 8-bit gennum.  For the kernel
settings, it's going to be fine--there won't be 256 updates before all
the CPUs process their IPIs.  But for the thbi->gennum comparison, a
thread might very well not have run for days, while there have been
many more updates than that, and its gennum%256 matching the current
one or not is just luck.  

You may need some memory barriers around the switching/restart stuff.
In fact, I think it would be better not to delve into reinventing the
low-level bits there at all.  Instead use read_seqcount_retry there
(linux/seqlock.h).  Using that read_seqcount_begin's value as the
number to compare in thbi would also give a 32-bit sequence number.

I don't see why notify_all_threads ever needs to be used.  The sequence
number changed, so the next switch in will always update.  I guess
that's how you were avoiding the untrustworthy 8-bit sequence number
issue.  But I think it's better to do the whole thing with seqcount and
rely on 32-bit sequence numbers being good enough to let thread updates
be entirely lazy.
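
Roughly the shape I have in mind (just a sketch; kbp_seq and thbi->seq are
names I'm inventing here):

	static seqcount_t kbp_seq;	/* written only under hw_breakpoint_mutex */

	/* register/unregister side, with the mutex held: */
	write_seqcount_begin(&kbp_seq);
	/* ... update the global kernel breakpoint state ... */
	write_seqcount_end(&kbp_seq);

	/* switch-in side, in switch_to_thread_hw_breakpoint(): */
	unsigned seq;
	do {
		seq = read_seqcount_begin(&kbp_seq);
		if (seq == thbi->seq)
			break;		/* nothing changed: fast path */
		/* ... recompute this thread's debug register image ... */
	} while (read_seqcount_retry(&kbp_seq, seq));
	thbi->seq = seq;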

> I'll go through the file and see which parts really can be shared.  It
> might end up being less than you think.
> 
> Note that doing this would necessarily create a bunch of new public 
> symbols.  Routines that I now have declared static wouldn't be able to 
> remain that way.
[later:]
> I didn't try to split hw_breakpoint.c apart into sharable and non-sharable
> pieces.  At this stage it's not entirely clear which routines would have
> to go on each side.  For example, processors with separate sets of debug
> registers for execute and data breakpoints would require a substantial
> change to the existing code.  Probably all the lists and arrays would have
> to be duplicated, with one copy for execute breakpoints and one for data
> breakpoints.
>
> If you eliminate all routines that refer to HB_NUM or dr7, that really 
> doesn't leave much sharable code.  The routines which qualify tend to be 
> relatively short; I think the largest one is flush_thread_hw_breakpoint().

It looks to me like there is quite a lot to be shared.  Of course the
code can refer to constants like HB_NUM, they just have to be defined
per machine.  The dr7 stuff can all be a couple of simple arch_foo
hooks, which will be empty on other machines.  All of the list-managing
logic, the prio stuff, etc., would be bad to copy.

The two flavors could probably be accommodated cleanly with an
HB_TYPES_NUM macro that's 1 on x86 and 2 on ia64, and is used in loops
around some of the calls.  I'm not suggesting you try to figure out
that code structure ahead of time.  But I don't think it will be a big
barrier to code sharing.

> It turns out that on some processors the CPU does reset DR6 sometimes.  
> Intel's documentation is wonderfully vague: "Certain debug exceptions may
> clear bits 0-3."  And it appears that gdb relies on this behavior; it
> distinguishes correctly among multiple breakpoints on a vanilla kernel but
> not under the previous version of hw_breakpoint.

So it sounds like maybe the real behavior is that any dr[0-3]-induced
exception resets the DR_TRAP[0-3] bits to just the new hit, but not the
other bits (i.e. just DR_STEP in practice).  Is that part true on all CPUs?

> I decided the safest course was to have do_debug() clear tsk->thread.vdr6
> whenever any of the four breakpoint bits is set in the real DR6.  More
> sophisticated behavior would be possible at the cost of adding an extra
> flag to tsk->thread.

I'm not sure what you have in mind using a new thread flag.  To be
consistent with existing (and machine) behavior, shouldn't that clear all of
the low (DR_TRAP[0-3]) bits, and only those, when one of those bits is set?

> Finally, I put in a couple of #ifdef's to make the same source work under 
> both i386 and x86_64, although I haven't tried building it.  You might 
> want to check and make sure that part of validate_settings() is correct.

That looks fine.

I'd like to see this concretely working on x86_64 as well as i386.
That should be a simple matter of the new header file and the makefile
patches to share the code.  I can test on x86_64 if you can't.

Do you have some simple test cases prepared?  That is, some simple
modules using the generic kernel hw_breakpoint support to readily
report working or not working on basic functionality.  I'd like to have
something we can agree on as the baseline smoke test for trying the
patches, and for new machine ports.

I also want to get this machine-independent code sharing going for
real.  I'd like to have powerpc working as the non-x86 demonstration
before we declare things in good shape.  I don't expect you to write
any powerpc support, but I hope I can get you to do the arch code
separation to make the way for it.  If you'll take a crack at it, I'll
fill in and test the powerpc bits and I think we'll get something very
satisfactory ironed out pretty fast.  

So consider the powerpc64 situation and imagine how you would do the
implementation for it, and I think you'll find a lot of the code you've
written is naturally shared for it.  It's a bit of a degenerate case,
because HB_NUM is 1, but that needn't really matter.  There are only
data address breakpoints of length 8 with an aligned address, so the
only control info aside from the address is r/w bits.  There is no
separate control register.  The control bits are stored in the low bits
of the register whose high bits are the high bits of the aligned
address.  (I think other machines store their control bits the same
way.)  So in fact, not only is there no need for .len, but .type is
actually just bits that could be stored directly in address.va (if
no one expected to look at that for the address, or they used an
accessor that masks off the low bits).  But there are bits to spare
there next to .priority, so keeping them separate doesn't hurt.  What's
important is that the chbi->dabr and thbi->dabr fields are stored in
fully-encoded form for quick switching.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-05-13 10:39                                               ` Roland McGrath
@ 2007-05-14 15:42                                                 ` Alan Stern
  2007-05-14 21:25                                                   ` Roland McGrath
  2007-05-17 20:39                                                 ` Alan Stern
  1 sibling, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-05-14 15:42 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Sun, 13 May 2007, Roland McGrath wrote:

> This makes me think about RF a little more.  If you ever set it, there are
> some places we need to clear it too.  That is, when the PC is being changed
> before returning to user mode, which is in signals and in ptrace.  If the
> PC is changing to other than the breakpoint location hit by the handler
> that set RF, we need clear RF so that the first instruction at the changed
> PC can be a breakpoint hit of its own and not get masked.  In fact, it may
> also be necessary to clear RF when freshly setting a new instruction
> breakpoint (when RF is set because the stop was not a debug exception at
> all), so that it isn't skipped if the PC happens to be right there already.

It seems to me that signal handlers must run with a copy of the original
EFLAGS stored on the stack.  Otherwise, when the handler returned the
former context wouldn't be fully restored.  But I don't know enough about
the signal handling code to see how to turn off RF in the stored EFLAGS
image.

Also, what if the signal handler was entered as a result of encountering 
an instruction breakpoint?  In that case you would want to keep RF on to 
prevent an infinite loop.

You're right about wanting to clear RF when changing the PC via ptrace or
when setting a new execution breakpoint (provided the new breakpoint's
address is equal to the current PC value).

Do you know how gdb handles instruction breakpoints, and in particular, 
how it resumes execution after a breakpoint?


> > Come to think of it, we don't really need modify_user_hw_breakpoint at
> > all.  It could be replaced by an {unregister(old); register(new);}
> > sequence.  Unless you think there's some pressing reason to keep it, my
> > inclination is to do away with it.
> 
> I sort of wondered from the beginning why it was there.  The rationale I
> can see is to avoid flutter.  That is, when unregistering frees up a slot
> for a lower-priority allocation waiting in the wings, and then the new
> registration will just displace it again.  The priority list diddling is
> wasted work to get back to just how it was before, but more importantly you
> don't want to have those callbacks for a momentarily-available slot coming
> and going.  I don't know if this can really come up with the current code.

That may be what I originally had in mind; I no longer remember.

But it doesn't matter.  We're up against an API incompatibility here.  
gdb doesn't allow you to modify breakpoints; it forces you to delete the
old one and add a new one.  It's only an artifact of the x86 architecture
that gdb implements this by reusing debug registers.  So even if the 
modify_user_hw_breakpoint() routine were kept, gdb wouldn't really want to 
make use of it.

Under the circumstances I think we should just leave it out.


> > As I understand it, setting one of those bits is necessary on the 386 but
> > not necessary for later processors.  Should this be controlled by a
> > runtime (or compile time) check?  For that matter, do those bits have any
> > effect at all on a Pentium?
> 
> I've never heard of anyone using them, but I don't know the full story.

On the 386, either GE or LE had to be set for DR breakpoints to work 
properly.  Later on (I don't remember if it was in the 486 or the Pentium) 
this restriction was removed.  I don't know whether those bits do anything 
at all on modern CPUs.


> The documentation I have says that RF is set in the trap frame on the stack
> (i.e. pt_regs.eflags) by every other kind of exception.  However, for a
> debug exception that is due to an instruction breakpoint, RF=0 in the trap
> frame and the manual explicitly says that the handler must set the bit so
> that iret will resume and execute it rather than hit the breakpoint again.
> 
> [later:]
> > It also turns out that some CPUs don't automatically set the RF bit in 
> > the EFLAGS image on the stack.  Intel recommends that the OS always set 
> > that bit whenever a debug exception occurs, so that's what I did.
> 
> Is this really "some CPUs"?  Or is it actually always as I described above
> (i.e. RF set usually but cleared for an instruction breakpoint hit)?

My 80386 Programmer's Reference Manual says:

	... an instruction-address breakpoint exception is a fault.

And:

	When it detects a fault, the processor automatically sets
	RF in the flags image that it pushes onto the stack.

And:

	The processor automatically sets RF in the EFLAGS image
	on the stack before entry into any fault handler.  Upon
	entry into the fault handler for instruction address
	breakpoints, for example, RF is set in the EFLAGS image
	on the stack...

That seems to be pretty clear.  So the behavior can vary according to the 
processor type.


> > If callers want to give up when a kernel breakpoint isn't installed 
> > immediately, all they have to do is check the return value from 
> > register_kernel_hw_breakpoint and call unregister_kernel_hw_breakpoint.  
> > If you really want it, I could add an extra "fail if not installed" 
> > argument flag.
> 
> The important thing is that there aren't any difficult races (i.e. what you
> get with callbacks).  If register with no callback followed by unregister
> on seeing "registered but not installed" return value is simple and cheap,
> that is fine.

I suppose you might register a breakpoint and find that it isn't installed 
immediately, but then it could get installed and actually trigger before 
you managed to unregister it.  Does that count as a "difficult race"?  
Presumably the work done by the trigger callback would get ignored.


> > > > +	/* Block kernel breakpoint updates from other CPUs */
> > > > +	local_irq_save(flags);
> > > 
> > > I have a feeling this is more costly than we want, though I don't really
> > > know.  It seems to me that things in struct cpu_hw_breakpoint are not
> > > really per-CPU, except for bp_task.  They are "current global state",
> > > right?
> > 
> > Not really, since changes to the debug registers on multiple CPUs cannot
> > be made simultaneously.  There will be short periods when different CPUs
> > have different debug register values.  What if a debug exception occurs
> > during one of those periods?
> 
> I think it's fine if a CPU getting an exception before it's processed the
> IPI looks at changed global state and says "oh, mine was stale", and punts
> the hit.  (Or perhaps it transmogrifies its apparent DR# based on the new
> global state, if the CPU's old setting corresponds to one of the new
> settings.  Probably the changing of settings can just preserve the old DR#
> selection in such cases and simplify the situation for the handler doing
> the catch-up to just if (old->dr[n] != new->dr[n]) ignore;.)

Punting isn't acceptable, not if the bp in question was present both 
before and after the IPI.  I'd rather transmogrify it as you described, 
awkward though that may be.

Maybe it doesn't have to be so bad.  If there were _two_ global copies of
the kernel bp settings, one for the old pre-IPI state and one for the new,
then the handler could simply look up the DR# in the appropriate copy.  
This would remove the need to store the settings in the per-CPU area.
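
Just to show the shape of it, something like this (rough sketch, the names
are invented; the updater fills kbp_state[(gennum + 1) & 1] before it
publishes the new gennum):

	struct kernel_bp_state {
		unsigned int gennum;
		struct hw_breakpoint *bps[HB_NUM];	/* owner of each DR slot */
	};
	static struct kernel_bp_state kbp_state[2];	/* [gennum & 1] is current */

	/* in do_debug(), for a kernel-mode hit in debug register n: */
	struct kernel_bp_state *st = &kbp_state[chbi->gennum & 1];
	struct hw_breakpoint *bp = st->bps[n];

	if (bp && bp->triggered)
		bp->triggered(bp, regs);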


> > Here's the latest take on the hw_breakpoint patch.  I adopted most of your
> > suggestions.  There still isn't a .bits member, but or'ing the .len and
> > .type members together will give you essentially the same thing; both of
> > those values are now completely encoded.
> 
> I'd still prefer to have a single machine-dependent field and not have .len.

It's a relatively minor issue.  On machines with fixed-length breakpoints, 
the .len field can be ignored.  Conversely, leaving it out would require 
using bitmasks to extract the type and length values from a combined .bits 
field.  I don't see any advantage.


> I'm not entirely sanguine about an 8-bit gennum.  For the kernel
> settings, it's going to be fine--there won't be 256 updates before all
> the CPUs process their IPIs.  But for the thbi->gennum comparison, a
> thread might very well not have run for days, while there have been
> many more updates than that, and its gennum%256 matching the current
> one or not is just luck.  

Ah, you haven't understood the purpose of the gennum.  In fact 8 bits 
isn't too small -- far from it!  It's too _large_; a single bit would 
suffice.  I made it an 8-bit value just because that was easier.

Here's the idea.  thbi->gennum is at all times either equal to the current 
gennum value or is set to -1.  That's what notify_all_threads() does; it 
sets thbi->gennum to -1 in all tasks currently being debugged whenever a 
change to the kernel breakpoints occurs.  My assumption is that almost all 
of the time there will be very few debuggees.

The main use of gennum is with chbi->gennum, which is at all times equal
to the current gennum value or the previous one (if the CPU hasn't yet
received the update IPI).  Hence chbi->gennum needs to distinguish between
only two values: current or previous.

Note that CPUs can never lag behind by more than one update.  The 
hw_breakpoint_mutex doesn't get released until every CPU has acknowledged 
receipt of the IPI.


> You may need some memory barriers around the switching/restart stuff.
> In fact, I think it would be better not to delve into reinventing the
> low-level bits there at all.  Instead use read_seqcount_retry there
> (linux/seqlock.h).  Using that read_seqcount_begin's value as the
> number to compare in thbi would also give a 32-bit sequence number.
> 
> I don't see why notify_all_threads ever needs to be used.  The sequence
> number changed, so the next switch in will always update.  I guess
> that's how you were avoiding the untrustworthy 8-bit sequence number
> issue.  But I think it's better to do the whole thing with seqcount and
> rely on 32-bit sequence numbers being good enough to let thread updates
> be entirely lazy.

Yes, that was the idea.  However seqcounts may work better in conjunction
with this idea of keeping a global copy of both the old and the new kernel
breakpoints.  I'll look into it.


> It looks to me like there is quite a lot to be shared.  Of course the
> code can refer to constants like HB_NUM, they just have to be defined
> per machine.  The dr7 stuff can all be a couple of simple arch_foo
> hooks, which will be empty on other machines.  All of the list-managing
> logic, the prio stuff, etc., would be bad to copy.
> 
> The two flavors could probably be accommodated cleanly with an
> HB_TYPES_NUM macro that's 1 on x86 and 2 on ia64, and is used in loops
> around some of the calls.  I'm not suggesting you try to figure out
> that code structure ahead of time.  But I don't think it will be a big
> barrier to code sharing.

Hmmm, maybe.  Those loops would end up looking messy.


> > It turns out that on some processors the CPU does reset DR6 sometimes.  
> > Intel's documentation is wonderfully vague: "Certain debug exceptions may
> > clear bits 0-3."  And it appears that gdb relies on this behavior; it
> > distinguishes correctly among multiple breakpoints on a vanilla kernel but
> > not under the previous version of hw_breakpoint.
> 
> So it sounds like maybe the real behavior is that any dr[0-3]-induced
> exception resets the DR_TRAP[0-3] bits to just the new hit, but not the
> other bits (i.e. just DR_STEP in practice).  Is that part true on all CPUs?

No.  The 80386 manual says:

	Note that the bits of DR6 are never cleared by the processor.

It's important to bear in mind that not all x86 CPUs are made by Intel, 
and of those that are, not all are Pentium 4's.  This appears to be an 
area of high variability so we should be as conservative as possible.

> > I decided the safest course was to have do_debug() clear tsk->thread.vdr6
> > whenever any of the four breakpoint bits is set in the real DR6.  More
> > sophisticated behavior would be possible at the cost of adding an extra
> > flag to tsk->thread.
> 
> I'm not sure what you have in mind using a new thread flag.  To be
> consistent with existing (and machine) behavior, shouldn't that clear all of
> the low (DR_TRAP[0-3]) bits, and only those, when one of those bits is set?

I could do that.  I don't know what happens to DR_STEP; a quick test might 
be worthwhile.


> I'd like to see this concretely working on x86_64 as well as i386.
> That should be a simple matter of the new header file and the makefile
> patches to share the code.  I can test on x86_64 if you can't.
> 
> Do you have some simple test cases prepared?  That is, some simple
> modules using the generic kernel hw_breakpoint support to readily
> report working or not working on basic functionality.  I'd like to have
> something we can agree on as the baseline smoke test for trying the
> patches, and for new machine ports.

I'll put together a simple test module for kernel breakpoints.  It's 
already possible to test user breakpoints just by running gdb.

> I also want to get this machine-independent code sharing going for
> real.  I'd like to have powerpc working as the non-x86 demonstration
> before we declare things in good shape.  I don't expect you to write
> any powerpc support, but I hope I can get you to do the arch code
> separation to make the way for it.  If you'll take a crack at it, I'll
> fill in and test the powerpc bits and I think we'll get something very
> satisfactory ironed out pretty fast.  
> 
> So consider the powerpc64 situation and imagine how you would do the
> implementation for it, and I think you'll find a lot of the code you've
> written is naturally shared for it.  It's a bit of a degenerate case,
> because HB_NUM is 1, but that needn't really matter.  There are only
> data address breakpoints of length 8 with an aligned address, so the
> only control info aside from the address is r/w bits.  There is no
> separate control register.  The control bits are stored in the low bits
> of the register whose high bits are the high bits of the aligned
> address.  (I think other machines store their control bits the same
> way.)  So in fact, not only is there no need for .len, but .type is
> actually just bits that could be stored directly in address.va (if
> no one expected to look at that for the address, or they used an
> accessor that masks off the low bits).  But there are bits to spare
> there next to .priority, so keeping them separate doesn't hurt.  What's
> important is that the chbi->dabr and thbi->dabr fields are stored in
> fully-encoded form for quick switching.

I'll see what I can do.

In this situation you don't need to worry about how .type and .len are 
stored.  On powerpc64 we can have a special thbi->dabr field analogous to 
the thbi->tdr7 field on x86.  All precomputed and ready for quick 
switching.
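
For example, registration could precompute it along these lines (a sketch
only -- the DABR_* names are from memory and the .type values are the ones
from my x86 patch):

	unsigned long rw = 0;

	if (bp->type == HW_BREAKPOINT_WRITE || bp->type == HW_BREAKPOINT_RW)
		rw |= DABR_DATA_WRITE;
	if (bp->type == HW_BREAKPOINT_READ || bp->type == HW_BREAKPOINT_RW)
		rw |= DABR_DATA_READ;

	thbi->dabr = ((unsigned long) bp->address.kernel & ~7UL)
			| DABR_TRANSLATION | rw;

	/* the context switch then just loads thbi->dabr (or 0) into the DABR */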

Even if HB_NUM were larger than 1, we could still store two copies of the 
address value (the second copy with the low-order type bits set).

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-05-14 15:42                                                 ` Alan Stern
@ 2007-05-14 21:25                                                   ` Roland McGrath
  2007-05-16 19:03                                                     ` Alan Stern
  0 siblings, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-05-14 21:25 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> It seems to me that signal handlers must run with a copy of the original
> EFLAGS stored on the stack.

Of course.  I'm talking about how the registers get changed to set up the
signal handler to start running, not how the interrupted registers are
saved on the user stack.  There is no issue with the stored eflags image;
the "privileged" flags like RF are ignored by sigreturn anyway.

> Also, what if the signal handler was entered as a result of encountering 
> an instruction breakpoint?  

This does not happen in reality.  Breakpoints can only be set by the
debugger, not by the program itself.  The debugger should always eat the trap.

> You're right about wanting to clear RF when changing the PC via ptrace or
> when setting a new execution breakpoint (provided the new breakpoint's
> address is equal to the current PC value).

Starting a signal handler is "warping the PC" equivalent to changing it via
ptrace for purposes of this discussion.  In case the new PC is the site of
another breakpoint, RF must be clear.
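
In the i386 signal setup that ought to be a one-line addition next to the
existing TF clearing, roughly (from memory, so double-check the names; RF is
the 0x00010000 bit if there's no handy macro for it):

	regs->eip = (unsigned long) ka->sa.sa_handler;
	...
	regs->eflags &= ~TF_MASK;		/* already done today */
	regs->eflags &= ~X86_EFLAGS_RF;		/* new: the warped PC may itself
						   be a breakpoint site */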

> Do you know how gdb handles instruction breakpoints, and in particular, 
> how it resumes execution after a breakpoint?

AFAICT it never actually uses hardware instruction breakpoints, only data
watchpoints.  I wouldn't be surprised if no one has ever really used
instruction breakpoint settings in x86 hardware debug registers on Linux.
(Frankly, I don't much expect them to start either.  This level of detail
about instruction breakpoints is largely academic.  I am a stickler for
getting the details right if we're going to allow using them at all.  
But I think really everyone only cares about data watchpoints.)

> But it doesn't matter.  We're up against an API incompatibility here.  

That's a red herring.  gdb is the compatibility case, not the real API user.

> Under the circumstances I think we should just leave it out.

That is fine.  If the flutter issue comes up, we can address it later.

> On the 386, either GE or LE had to be set for DR breakpoints to work 
> properly.  Later on (I don't remember if it was in the 486 or the Pentium) 
> this restriction was removed.  I don't know whether those bits do anything 
> at all on modern CPUs.

I'm moderately sure they do nothing on modern CPUs.  Intel says they're
ignored as of Pentium, but recommends setting both bits if you care at all.
In practice, I don't think we'll ever hear about the inexactness on a
pre-Pentium processor from not setting the bits.  But I'd follow the Intel
manual and set both.

> My 80386 Programmer's Reference Manual says:

The earlier quote I gave was from an AMD64 manual.  A 1995 Intel manual I
have says, "All Intel Architecture processors manage the RF flag as follows,"
and proceeds to give the "all faults except instruction breakpoint" behavior
I quoted from the AMD manual earlier.  Hence I sincerely doubt that this
varies among Intel and AMD processors.  Someone else will have to help us
know about other makers' processors.  So far I have no reason to suspect that
any processor behaves differently (aside from generic cynicism ;-).

> I suppose you might register a breakpoint and find that it isn't installed 
> immediately, but then it could get installed and actually trigger before 
> you managed to unregister it.  Does that count as a "difficult race"?  

Yes, that is really the kind of thing I had in mind.  For user breakpoints it
shouldn't be an issue, since the thread shouldn't have been let run in between.

> Presumably the work done by the trigger callback would get ignored.

That is in the "difficult race" category to ensure.  I would not presume.

> Maybe it doesn't have to be so bad.  If there were _two_ global copies of
> the kernel bp settings, one for the old pre-IPI state and one for the new,
> then the handler could simply look up the DR# in the appropriate copy.  
> This would remove the need to store the settings in the per-CPU area.

I think that is what I suggested an iteration or two ago.  Installing new
state means making a fresh data structure and installing a pointer to it,
leaving the old (immutable) one to be freed by RCU.
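
i.e. the usual publish-then-free-via-RCU shape, roughly (kbp_info and
free_kbp_info are invented names):

	struct kbp_info {
		struct rcu_head rcu;
		/* ... immutable snapshot of the kernel breakpoint settings ... */
	};
	static struct kbp_info *kbp_cur;	/* readers use rcu_dereference() */

	/* update side, with hw_breakpoint_mutex held: */
	old = kbp_cur;
	rcu_assign_pointer(kbp_cur, new);	/* new was filled in beforehand */
	if (old)
		call_rcu(&old->rcu, free_kbp_info);

	/* exception/switch side: */
	rcu_read_lock();
	info = rcu_dereference(kbp_cur);
	/* ... consult *info, never modify it ... */
	rcu_read_unlock();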

> It's a relatively minor issue.  On machines with fixed-length breakpoints, 
> the .len field can be ignored.  Conversely, leaving it out would require 
> using bitmasks to extract the type and length values from a combined .bits 
> field.  I don't see any advantage.

I guess my main objection to having .type and .len is the false implied
documentation of their presence and names, leading to people thinking they
can look at those values.  In fact, they are machine-specific and
implementation-specific bits of no intrinsic use to anyone else.

> Ah, you haven't understood the purpose of the gennum.  In fact 8 bits 
> isn't too small -- far from it!  It's too _large_; a single bit would 
> suffice.  I made it an 8-bit value just because that was easier.

If it's actually a flag, then treating it any other way is just confusing.
I can't see how it's easier for anyone.

> Note that CPUs can never lag behind by more than one update.  The 
> hw_breakpoint_mutex doesn't get released until every CPU has acknowledged 
> receipt of the IPI.

Then it really is just a flag for all uses, and there's no reason at all to
call it a number.

> Yes, that was the idea.  However seqcounts may work better in conjunction
> with this idea of keeping a global copy of both the old and the new kernel
> breakpoints.  I'll look into it.

I think that is going to be the clean and sane approach.
Hand-rolling your low-level synchronization code is always questionable.

> > So it sounds like maybe the real behavior is that any dr[0-3]-induced
> > exception resets the DR_TRAP[0-3] bits to just the new hit, but not the
> > other bits (i.e. just DR_STEP in practice).  Is that part true on all CPUs?
> 
> No.  The 80386 manual says:
> 
> 	Note that the bits of DR6 are never cleared by the processor.
> 
> It's important to bear in mind that not all x86 CPUs are made by Intel, 
> and of those that are, not all are Pentium 4's.  This appears to be an 
> area of high variability so we should be as conservative as possible.

That line from the manual is what we were both going on originally, and
then you described the conflicting behavior.  I was trying to ascertain
whether chips really do vary, or if the manual was just inaccurate about
the single common way it actually behaves.  I take it you have in fact
observed different behaviors on different chips?

There are two possible kinds of "conservative" here.  To be conservative
with respect to the existing behavior on a given chip, whatever that may
be, we should never clear %dr6 completely, and instead should always
mirror its bits to vdr6, only mapping the low four bits around to present
the virtualized order.  The only bits we'd ever clear in hardware are
those DR_TRAPn bits corresponding to the registers allocated to non-ptrace
uses, and kprobes should clear DR_STEP.  And note that when vdr6 is
changed by ptrace, we should reset the hardware %dr6 accordingly, to match
existing kernel behavior should users change debugreg[6] via ptrace.

To be conservative in the sense of reliable user-level behavior despite
chip oddities would be a little different.  Firstly, I think we should
mirror all the "extra" bits from hardware to vdr7 blindly, i.e. everything
but DR_STEP and DR_TRAPn.  That way if any chip comes along that sets new
bits for new features or whatnot, users can at least see the new hardware
bits via ptrace before hw_breakpoint gets updated to support them more
directly.  For the low four bits, I think what users expect is that no
bits are ever implicitly cleared, so they accumulate to say which drN has
hit since the last time ptrace was used to clear vdr6.
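
In do_debug() terms what I'm describing comes out to roughly this (a sketch;
the mask macro and the remapping helper don't exist, they're just shorthand):

	#define DR_TRAP_BITS	(DR_TRAP0 | DR_TRAP1 | DR_TRAP2 | DR_TRAP3)

	get_debugreg(dr6, 6);

	/* pass any unknown/extra status bits straight through */
	tsk->thread.vdr6 |= dr6 & ~(DR_TRAP_BITS | DR_STEP);

	/* the low four bits accumulate, remapped to the virtual DR numbering,
	   until ptrace is used to clear vdr6 */
	tsk->thread.vdr6 |= map_traps_to_virtual(dr6 & DR_TRAP_BITS, tsk);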

> Even if HB_NUM were larger than 1, we could still store two copies of the 
> address value (the second copy with the low-order type bits set).

There's no reason to waste another word when you only need two bits and
already have spare space for a machine implementation field (i.e. where .type
is now).


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-05-14 21:25                                                   ` Roland McGrath
@ 2007-05-16 19:03                                                     ` Alan Stern
  2007-05-23  8:47                                                       ` Roland McGrath
  0 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-05-16 19:03 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Mon, 14 May 2007, Roland McGrath wrote:

> > It seems to me that signal handlers must run with a copy of the original
> > EFLAGS stored on the stack.
> 
> Of course.  I'm talking about how the registers get changed to set up the
> signal handler to start running, not how the interrupted registers are
> saved on the user stack.  There is no issue with the stored eflags image;
> the "privileged" flags like RF are ignored by sigreturn anyway.

Ah, okay.  Yes, clearly the new EFLAGS for the signal handler should 
have RF turned off.  This should always be true, regardless of 
debugging.

> > Also, what if the signal handler was entered as a result of encountering 
> > an instruction breakpoint?  
> 
> This does not happen in reality.  Breakpoints can only be set by the
> debugger, not by the program itself.  The debugger should always eat the trap.

Hmmm.  I put in a little extra code to account for the possibility that
a program might want to set hardware breakpoints in itself.  Should
this be removed?

> The earlier quote I gave was from an AMD64 manual.  A 1995 Intel manual I
> have says, "All Intel Architecture processors manage the RF flag as follows,"
> and proceeds to give the "all faults except instruction breakpoint" behavior
> I quoted from the AMD manual earlier.  Hence I sincerely doubt that this
> varies among Intel and AMD processors.  Someone else will have to help us
> know about other makers' processors.  So far I have no reason to suspect that
> any processor behaves differently (aside from generic cynicism ;-).

And I no longer have any 386 CPUs to test...

> > It's a relatively minor issue.  On machines with fixed-length breakpoints, 
> > the .len field can be ignored.  Conversely, leaving it out would require 
> > using bitmasks to extract the type and length values from a combined .bits 
> > field.  I don't see any advantage.
> 
> I guess my main objection to having .type and .len is the false implied
> documentation of their presence and names, leading to people thinking they
> can look at those values.  In fact, they are machine-specific and
> implementation-specific bits of no intrinsic use to anyone else.

The fact that they are machine-specific and implementation-specific 
doesn't necessarily make them of no use.  See the driver below.

> That line from the manual is what we were both going on originally, and
> then you described the conflicting behavior.  I was trying to ascertain
> whether chips really do vary, or if the manual was just inaccurate about
> the single common way it actually behaves.  I take it you have in fact
> observed different behaviors on different chips?

No; I have tested only a couple of systems and I don't have a wide
variety of machines available.

> There are two possible kinds of "conservative" here.  To be conservative
> with respect to the existing behavior on a given chip, whatever that may
> be, we should never clear %dr6 completely, and instead should always
> mirror its bits to vdr6, only mapping the low four bits around to present
> the virtualized order.  The only bits we'd ever clear in hardware are
> those DR_TRAPn bits corresponding to the registers allocated to non-ptrace
> uses, and kprobes should clear DR_STEP.  And note that when vdr6 is
> changed by ptrace, we should reset the hardware %dr6 accordingly, to match
> existing kernel behavior should users change debugreg[6] via ptrace.
> 
> To be conservative in the sense of reliable user-level behavior despite
> chip oddities would be a little different.  Firstly, I think we should
> mirror all the "extra" bits from hardware to vdr6 blindly, i.e. everything
> but DR_STEP and DR_TRAPn.  That way if any chip comes along that sets new
> bits for new features or whatnot, users can at least see the new hardware
> bits via ptrace before hw_breakpoint gets updated to support them more
> directly.  For the low four bits, I think what users expect is that no
> bits are ever implicitly cleared, so they accumulate to say which drN has
> hit since the last time ptrace was used to clear vdr6.

Allow me to rephrase: When a debug exception occurs, the real DR6 value
should be copied to vdr6, except that kprobes should adjust DR_STEP and
hw_breakpoint should adjust the DR_TRAPn bits appropriately.  There's
some question about what value the debug exception handler should write
back to DR6, if anything.  When switching to a new task, the DR_TRAPn
bits in vdr6 could be de-virtualized somehow and the result loaded into
DR6, but again, it might be safest to leave DR6 alone.

As for what users expect of the low four bits, you are definitely 
wrong.  My tests with gdb show that it relies on the CPU to clear those 
bits whenever a data breakpoint is hit; it doesn't clear them itself 
and it doesn't work properly if the kernel keeps virtualized versions 
of them set.  That's on a Pentium 4 and on an AMD Duron.

I did some testing to see how the CPU behaves when the debug handler
writes different values back to DR6.  The results were:

	Values written back to DR6 were retained in the register until 
	the next debug exception occurred.

	When the exception handler read DR6, the 0xffff0ff0 bits were
	set every time.  The 0x00001000 bit was never set, even if it
	had been turned on before the exception occurred.

	No matter what values were stored in the low four bits
	beforehand, when the exception occurred DR6 had only the
	bit for the debug register which was triggered.

	If the handler wrote back any of BS, BT, or BD to DR6, then
	the system misbehaved.  I don't know exactly what happened,
	but my shell process ended and the debug handler got called
	over and over again (as if stuck in a loop) for several
	seconds.

In light of these results, the best approach appears to be either to 
leave DR6 alone or to set it to 0.


Below is a patch containing a driver meant for testing kernel hardware 
breakpoints.  Instructions are in the comments at the top.  You can 
build the driver by typing "make M=bptest" at the top level.

The patch also adjusts the Alt-SysRq-P handler to print out the debug 
register values along with all the other stuff.

Alan Stern



Index: usb-2.6/bptest/Makefile
===================================================================
--- /dev/null
+++ usb-2.6/bptest/Makefile
@@ -0,0 +1 @@
+obj-m	+= bptest.o
Index: usb-2.6/bptest/bptest.c
===================================================================
--- /dev/null
+++ usb-2.6/bptest/bptest.c
@@ -0,0 +1,459 @@
+/*
+ * Test driver for hardware breakpoints.
+ *
+ * Copyright (C) 2007 Alan Stern <stern@rowland.harvard.edu>
+ */
+
+/*
+ * When this driver is loaded, it will create several attribute files
+ * under /sys/bus/platform/drivers/bptest:
+ *
+ *	 call, read, write, and bp0,..., bp3.
+ *
+ * It also allocates a 32-byte array (called "bytes") for testing data
+ * breakpoints, and it contains four do-nothing routines, r0(),..., r3(),
+ * for testing execution breakpoints.
+ *
+ * Writing to the "call" attribute causes the rN routines to be called;
+ * "echo >call N" will call rN(), where N is 0, 1, 2, or 3.  Similarly,
+ * "echo >call" will call all four routines.
+ *
+ * The byte array can be accessed through the "read" and "write"
+ * attributes.  "echo >read N" will read bytes[N], and "echo >write N V"
+ * will store V in bytes[N], where N is between 0 and 31.  There is
+ * no provision for multi-byte accesses; they shouldn't be needed for
+ * simple testing.
+ *
+ * The driver contains four hw_breakpoint structures, which can be
+ * accessed through the "bpN" attributes.  Reading the attribute file
+ * will yield the hw_breakpoint's current settings.  The settings can be
+ * altered by writing the attribute.  The format to use is:
+ *
+ *	echo >bpN priority type address [len]
+ *
+ * priority must be a number between 0 and 255.  type must be one of 'e'
+ * (execution), 'r' (read), 'w' (write), or 'b' (both read/write).
+ * address must be a number between 0 and 31; if type is 'e' then address
+ * must be between 0 and 3.  len must be 1, 2, 4, or 8, but if type is 'e'
+ * then len is optional and ignored.
+ *
+ * Execution breakpoints are set on the rN routine and data breakpoints
+ * are set on bytes[N], where N is the address value.  You can unregister
+ * a breakpoint by doing "echo >bpN u", where 'u' is any non-digit.
+ *
+ * (Note: On i386 certain values are not implemented.  len cannot be set
+ * to 8 and type cannot be set to 'r'.)
+ *
+ * The driver prints lots of information to the system log as it runs.
+ * To best see things as they happen, use a VT console and set the
+ * logging level high (I use Alt-SysRq-9).
+ */
+
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <asm/hw_breakpoint.h>
+
+MODULE_AUTHOR("Alan Stern <stern@rowland.harvard.edu>");
+MODULE_DESCRIPTION("Hardware Breakpoint test driver");
+MODULE_LICENSE("GPL");
+
+
+static struct hw_breakpoint bps[4];
+
+
+#define NUM_BYTES	32
+static unsigned char bytes[NUM_BYTES] __attribute__((aligned(8)));
+
+/* Write n to read bytes[n] */
+static ssize_t read_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	int n = -1;
+
+	if (sscanf(buf, "%d", &n) < 1 || n < 0 || n >= NUM_BYTES) {
+		printk(KERN_WARNING "bptest: read: invalid index %d\n", n);
+		return -EINVAL;
+	}
+	printk(KERN_INFO "bptest: read: bytes[%d] = %d\n", n, bytes[n]);
+	return count;
+}
+static DRIVER_ATTR(read, 0200, NULL, read_store);
+
+/* Write n v to set bytes[n] = v */
+static ssize_t write_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	int n = -1;
+	int v;
+
+	if (sscanf(buf, "%d %d", &n, &v) < 2 || n < 0 || n >= NUM_BYTES) {
+		printk(KERN_WARNING "bptest: write: invalid index %d\n", n);
+		return -EINVAL;
+	}
+	bytes[n] = v;
+	printk(KERN_INFO "bptest: write: bytes[%d] <- %d\n", n, v);
+	return count;
+}
+static DRIVER_ATTR(write, 0200, NULL, write_store);
+
+
+/* Dummy routines for testing instruction breakpoints */
+static void r0(void)
+{
+	printk(KERN_INFO "This is r%d\n", 0);
+}
+static void r1(void)
+{
+	printk(KERN_INFO "This is r%d\n", 1);
+}
+static void r2(void)
+{
+	printk(KERN_INFO "This is r%d\n", 2);
+}
+static void r3(void)
+{
+	printk(KERN_INFO "This is r%d\n", 3);
+}
+
+static void (*rtns[])(void) = {
+	r0, r1, r2, r3
+};
+
+
+/* Write n to call routine r##n, or a blank line to call them all */
+static ssize_t call_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	int n;
+
+	if (sscanf(buf, "%d", &n) == 0) {
+		printk(KERN_INFO "bptest: call all routines\n");
+		r0();
+		r1();
+		r2();
+		r3();
+	} else if (n >= 0 && n < 4) {
+		printk(KERN_INFO "bptest: call r%d\n", n);
+		rtns[n]();
+	} else {
+		printk(KERN_WARNING "bptest: call: invalid index: %d\n", n);
+		count = -EINVAL;
+	}
+	return count;
+}
+static DRIVER_ATTR(call, 0200, NULL, call_store);
+
+
+/* Breakpoint callbacks */
+static void bptest_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+	printk(KERN_INFO "Breakpoint %d triggered\n", (int) (bp - bps));
+}
+
+static void bptest_installed(struct hw_breakpoint *bp)
+{
+	printk(KERN_INFO "Breakpoint %d installed\n", (int) (bp - bps));
+}
+
+static void bptest_uninstalled(struct hw_breakpoint *bp)
+{
+	printk(KERN_INFO "Breakpoint %d uninstalled\n", (int) (bp - bps));
+}
+
+
+/* Breakpoint attribute files for testing */
+static ssize_t bp_show(int n, char *buf)
+{
+	struct hw_breakpoint *bp = &bps[n];
+	int a, len, type;
+
+	if (!bp->status)
+		return sprintf(buf, "bp%d: unregistered\n", n);
+
+	len = -1;
+	switch (bp->len) {
+#ifdef HW_BREAKPOINT_LEN_1
+	case HW_BREAKPOINT_LEN_1:	len = 1;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+	case HW_BREAKPOINT_LEN_2:	len = 2;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+	case HW_BREAKPOINT_LEN_4:	len = 4;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+	case HW_BREAKPOINT_LEN_8:	len = 8;	break;
+#endif
+	}
+
+	type = '?';
+	switch (bp->type) {
+#ifdef HW_BREAKPOINT_READ
+	case HW_BREAKPOINT_READ:	type = 'r';	break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+	case HW_BREAKPOINT_WRITE:	type = 'w';	break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+	case HW_BREAKPOINT_RW:		type = 'b';	break;
+#endif
+#ifdef HW_BREAKPOINT_EXECUTE
+	case HW_BREAKPOINT_EXECUTE:	type = 'e';	break;
+#endif
+	}
+
+	a = -1;
+	if (type == 'e') {
+		if (bp->address.kernel == r0)
+			a = 0;
+		else if (bp->address.kernel == r1)
+			a = 1;
+		else if (bp->address.kernel == r2)
+			a = 2;
+		else if (bp->address.kernel == r3)
+			a = 3;
+	} else {
+		const unsigned char *p = bp->address.kernel;
+
+		if (p >= bytes && p < bytes + NUM_BYTES)
+			a = p - bytes;
+	}
+
+	return sprintf(buf, "bp%d: %d %c %d %d [%sinstalled]\n",
+			n, bp->priority, type, a, len,
+			(bp->status < HW_BREAKPOINT_INSTALLED ? "not " : ""));
+}
+
+static ssize_t bp_store(int n, const char *buf, size_t count)
+{
+	struct hw_breakpoint *bp = &bps[n];
+	int prio, a, len;
+	char type;
+	int i;
+
+	if (count <= 1) {
+		printk(KERN_INFO "bptest: bp%d: format:  priority type "
+				"address len\n", n);
+		printk(KERN_INFO "  type = r, w, b, or e; address = 0 - 31; "
+				"len = 1, 2, 4, or 8\n");
+		printk(KERN_INFO "  Write any non-digit to unregister\n");
+		return count;
+	}
+
+	unregister_kernel_hw_breakpoint(bp);
+	printk(KERN_INFO "bptest: bp%d unregistered\n", n);
+
+	len = -1;
+	i = sscanf(buf, "%d %c %d %d", &prio, &type, &a, &len);
+	if (i == 0)
+		return count;
+	if (i < 3) {
+		printk(KERN_WARNING "bptest: bp%d: too few fields\n", n);
+		return -EINVAL;
+	}
+
+	bp->priority = prio;
+	switch (type) {
+#ifdef HW_BREAKPOINT_EXECUTE
+	case 'e':
+		bp->type = HW_BREAKPOINT_EXECUTE;
+		bp->len = HW_BREAKPOINT_LEN_EXECUTE;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_READ
+	case 'r':
+		bp->type = HW_BREAKPOINT_READ;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+	case 'w':
+		bp->type = HW_BREAKPOINT_WRITE;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+	case 'b':
+		bp->type = HW_BREAKPOINT_RW;
+		break;
+#endif
+	default:
+		printk(KERN_WARNING "bptest: bp%d: invalid type %c\n",
+				n, type);
+		return -EINVAL;
+	}
+
+	if (a < 0 || a >= NUM_BYTES || (a >= 4 && type == 'e')) {
+		printk(KERN_WARNING "bptest: bp%d: invalid address %d\n",
+				n, a);
+		return -EINVAL;
+	}
+	if (type == 'e')
+		bp->address.kernel = rtns[a];
+	else {
+		bp->address.kernel = &bytes[a];
+
+		switch (len) {
+#ifdef HW_BREAKPOINT_LEN_1
+		case 1:		bp->len = HW_BREAKPOINT_LEN_1;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+		case 2:		bp->len = HW_BREAKPOINT_LEN_2;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+		case 4:		bp->len = HW_BREAKPOINT_LEN_4;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+		case 8:		bp->len = HW_BREAKPOINT_LEN_8;	break;
+#endif
+		default:
+			printk(KERN_WARNING "bptest: bp%d: invalid len %d\n",
+					n, len);
+			return -EINVAL;
+			break;
+		}
+	}
+
+	bp->triggered = bptest_triggered;
+	bp->installed = bptest_installed;
+	bp->uninstalled = bptest_uninstalled;
+
+	i = register_kernel_hw_breakpoint(bp);
+	if (i < 0) {
+		printk(KERN_WARNING "bptest: bp%d: failed to register %d\n",
+				n, i);
+		count = i;
+	} else
+		printk(KERN_INFO "bptest: bp%d registered: %d\n", n, i);
+	return count;
+}
+
+
+static ssize_t bp0_show(struct device_driver *d, char *buf)
+{
+	return bp_show(0, buf);
+}
+static ssize_t bp0_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	return bp_store(0, buf, count);
+}
+static DRIVER_ATTR(bp0, 0600, bp0_show, bp0_store);
+
+static ssize_t bp1_show(struct device_driver *d, char *buf)
+{
+	return bp_show(1, buf);
+}
+static ssize_t bp1_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	return bp_store(1, buf, count);
+}
+static DRIVER_ATTR(bp1, 0600, bp1_show, bp1_store);
+
+static ssize_t bp2_show(struct device_driver *d, char *buf)
+{
+	return bp_show(2, buf);
+}
+static ssize_t bp2_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	return bp_store(2, buf, count);
+}
+static DRIVER_ATTR(bp2, 0600, bp2_show, bp2_store);
+
+static ssize_t bp3_show(struct device_driver *d, char *buf)
+{
+	return bp_show(3, buf);
+}
+static ssize_t bp3_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	return bp_store(3, buf, count);
+}
+static DRIVER_ATTR(bp3, 0600, bp3_show, bp3_store);
+
+
+static int bptest_probe(struct platform_device *pdev)
+{
+	return -ENODEV;
+}
+
+static int bptest_remove(struct platform_device *pdev)
+{
+	return 0;
+}
+
+static struct platform_driver bptest_driver = {
+	.probe = bptest_probe,
+	.remove = bptest_remove,
+	.driver = {
+		.name = "bptest",
+		.owner = THIS_MODULE,
+	}
+};
+
+
+static struct driver_attribute *(bptest_group[]) = {
+	&driver_attr_bp0,
+	&driver_attr_bp1,
+	&driver_attr_bp2,
+	&driver_attr_bp3,
+	&driver_attr_call,
+	&driver_attr_read,
+	&driver_attr_write,
+	NULL
+};
+
+static int add_files(void)
+{
+	int rc = 0;
+	struct driver_attribute **g;
+
+	for (g = bptest_group; *g; ++g) {
+		rc = driver_create_file(&bptest_driver.driver, *g);
+		if (rc)
+			break;
+	}
+	return rc;
+}
+
+static void remove_files(void)
+{
+	struct driver_attribute **g;
+
+	for (g = bptest_group; *g; ++g)
+		driver_remove_file(&bptest_driver.driver, *g);
+}
+
+static int __init bptest_init(void)
+{
+	int rc;
+
+	rc = platform_driver_register(&bptest_driver);
+	if (rc) {
+		printk(KERN_ERR "Failed to register bptest driver: %d\n", rc);
+		return rc;
+	}
+	rc = add_files();
+	if (rc) {
+		remove_files();
+		platform_driver_unregister(&bptest_driver);
+		return rc;
+	}
+	printk(KERN_INFO "bptest loaded\n");
+	return 0;
+}
+
+static void __exit bptest_exit(void)
+{
+	int n;
+
+	remove_files();
+	for (n = 0; n < 4; ++n)
+		unregister_kernel_hw_breakpoint(&bps[n]);
+	platform_driver_unregister(&bptest_driver);
+	printk(KERN_INFO "bptest unloaded\n");
+}
+
+module_init(bptest_init);
+module_exit(bptest_exit);
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -296,6 +296,7 @@ __setup("idle=", idle_setup);
 void show_regs(struct pt_regs * regs)
 {
 	unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
+	unsigned long d0, d1, d2, d3;
 
 	printk("\n");
 	printk("Pid: %d, comm: %20s\n", current->pid, current->comm);
@@ -320,6 +321,17 @@ void show_regs(struct pt_regs * regs)
 	cr3 = read_cr3();
 	cr4 = read_cr4_safe();
 	printk("CR0: %08lx CR2: %08lx CR3: %08lx CR4: %08lx\n", cr0, cr2, cr3, cr4);
+
+	get_debugreg(d0, 0);
+	get_debugreg(d1, 1);
+	get_debugreg(d2, 2);
+	get_debugreg(d3, 3);
+	printk("DR0: %08lx DR1: %08lx DR2: %08lx DR3: %08lx\n",
+			d0, d1, d2, d3);
+	get_debugreg(d2, 6);
+	get_debugreg(d3, 7);
+	printk("    DR6: %08lx DR7: %08lx\n", d2, d3);
+
 	show_trace(NULL, regs, &regs->esp);
 }
 


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-05-13 10:39                                               ` Roland McGrath
  2007-05-14 15:42                                                 ` Alan Stern
@ 2007-05-17 20:39                                                 ` Alan Stern
  1 sibling, 0 replies; 70+ messages in thread
From: Alan Stern @ 2007-05-17 20:39 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Sun, 13 May 2007, Roland McGrath wrote:

> You may need some memory barriers around the switching/restart stuff.
> In fact, I think it would be better not to delve into reinventing the
> low-level bits there at all.  Instead use read_seqcount_retry there
> (linux/seqlock.h).  Using that read_seqcount_begin's value as the
> number to compare in thbi would also give a 32-bit sequence number.

I took a look at seqlock.h.  It turns out not to be a good match for my 
requirements; the header file specifically says that it won't work with 
data that contains pointers.  But changing over to regular 32-bit 
sequence numbers was straightforward.

The "switching"/"restart" stuff doesn't need memory barriers because
all the communication is between two routines on the same CPU.  Nor are 
memory barriers needed in the rest of the code for the kernel 
breakpoint updates; the IPI mechanism already provides its own.

However there is one oddball case which does seem to require a memory
barrier: when a new CPU comes online (either for the first time or
during return from hibernation).  There's a hook to load the initial
debug register values, and it runs in an atomic context so I can't
grab the mutex.  The hook is called in two places:

	arch/i386/power/cpu.c: fix_processor_context(), and
	arch/i386/kernel/smpboot.c: start_secondary().

A memory barrier is necessary to avoid chaos if another CPU should
happen to update the kernel breakpoint settings at the same time.  If
you can suggest a way around it, please do.

> It looks to me like there is quite a lot to be shared.  Of course the
> code can refer to constants like HB_NUM, they just have to be defined
> per machine.  The dr7 stuff can all be a couple of simple arch_foo
> hooks, which will be empty on other machines.  All of the list-managing
> logic, the prio stuff, etc., would be bad to copy.
> 
> The two flavors could probably be accomodated cleanly with an
> HB_TYPES_NUM macro that's 1 on x86 and 2 on ia64, and is used in loops
> around some of the calls.  I'm not suggesting you try to figure out
> that code structure ahead of time.  But I don't think it will be a big
> barrier to code sharing.

Okay, if I don't worry about machines with two sets of code & data
debug registers (HB_TYPES_NUM = 2) then yes, quite a lot of the code is
sharable.  There will be a few arch-specific hooks to:

	Store the values into the debug registers;

	Take care of the DR7 calculations;

	Do address limit verification (see whether a pointer
	lies in user space or kernel space).

Nothing more seems to be needed.  Then there will be unsharable code, 
including:

	Dumping the debug registers while creating an aout-type
	core image;

	All the legacy ptrace stuff;

	The notify-handler itself.

Does all that sound about right?  

> I also want to get this machine-independent code sharing going for
> real.  I'd like to have powerpc working as the non-x86 demonstration
> before we declare things in good shape.  I don't expect you to write
> any powerpc support, but I hope I can get you to do the arch code
> separation to make the way for it.  If you'll take a crack at it, I'll
> fill in and test the powerpc bits and I think we'll get something very
> satisfactory ironed out pretty fast.  

How should this be arranged so that it can build okay on all platforms,
even ones where the low-level support code hasn't been written?  Maybe 
an arch-dependent CONFIG_HW_BREAKPOINT option?

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-05-16 19:03                                                     ` Alan Stern
@ 2007-05-23  8:47                                                       ` Roland McGrath
  2007-06-01 19:39                                                         ` Alan Stern
  0 siblings, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-05-23  8:47 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> > This does not happen in reality.  Breakpoints can only be set by the
> > debugger, not by the program itself.  The debugger should always eat the trap.
> 
> Hmmm.  I put in a little extra code to account for the possibility that
> a program might want to set hardware breakpoints in itself.  Should
> this be removed?

Do you just mean a register_hw_breakpoint call made on current?  That
certainly ought to work.  That's still "the debugger", i.e. in utracespeak
the tracing engine.  My point was that there will never be a facility
intended for a program to use hw_breakpoint to generate a signal that gets
delivered to a handler in the vanilla way.  There's always some "outside"
agent who asked for the breakpoint and who is responsible for responding to
the traps it causes, never the program itself in a way that would make it
sensible for the program to actually see the signal in the end.

> > I guess my main objection to having .type and .len is the false implied
> > documentation of their presence and names, leading to people thinking they
> > can look at those values.  In fact, they are machine-specific and
> > implementation-specific bits of no intrinsic use to anyone else.
> 
> The fact that they are machine-specific and implementation-specific 
> doesn't necessarily make them of no use.  See the driver below.

The code in bp_show is exactly the kind of wrong I want to prevent.  When I
say they are machine-specific and implementation-specific, I mean there is
no specified part of the interface to which you can presume they correspond
directly.  The powerpc implementation will not have any field that is set
to HW_BREAKPOINT_LEN_8 and may well have none set to the type macros
either.  If you want to have some machine-specific macros or inlines to
yield the HW_BREAKPOINT_* values for a struct hw_breakpoint, then fine.
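
e.g. in the x86 asm/hw_breakpoint.h header the accessors would be trivial,
something like (sketch):

	/* On x86 the implementation bits happen to be the HW_BREAKPOINT_*
	   values already; other machines would decode their own encoding. */
	static inline unsigned int hw_breakpoint_type(const struct hw_breakpoint *bp)
	{
		return bp->type;
	}

	static inline unsigned int hw_breakpoint_len(const struct hw_breakpoint *bp)
	{
		return bp->len;
	}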

> Allow me to rephrase: When a debug exception occurs, the real DR6 value
> should be copied to vdr6, except that kprobes should adjust DR_STEP and
> hw_breakpoint should adjust the DR_TRAPn bits appropriately.  There's
> some question about what value the debug exception handler should write
> back to DR6, if anything.  

Agreed.

> As for what users expect of the low four bits, you are definitely 
> wrong.  My tests with gdb show that it relies on the CPU to clear those 
> bits whenever a data breakpoint is hit; it doesn't clear them itself 
> and it doesn't work properly if the kernel keeps virtualized versions 
> of them set.  That's on a Pentium 4 and on an AMD Duron.

Ok.  We were both going on what the manual said and I was assuming that
some chip had actually behaved that way and thus that's what users expect.

> 	Values written back to DR6 were retained in the register until 
> 	the next debug exception occurred.

Ok.  This behavior is invisible anyway.

> 	When the exception handler read DR6, the 0xffff0ff0 bits were
> 	set every time.  The 0x00001000 bit was never set, even if it
> 	had been turned on before the exception occurred.

Ok.  That is not really surprising.

> 	No matter what values were stored in the low four bits
> 	beforehand, when the exception occurred DR6 had only the
> 	bit for the debug register which was triggered.

Ok.  That makes the users' expectations understandable.  Maybe we can get the
Intel and AMD people to change the manual not to be misleading about this
(it says something terse about "never clears" and without more details I
read it as "never clears any bit, ever").  

What about DR_STEP?  i.e., if DR_STEP was set from a single-step and then
there was a DR_TRAPn debug exception, is DR_STEP still set?  If DR_TRAPn
was set and then you single-step, is DR_TRAPn cleared?

> 	If the handler wrote back any of BS, BT, or BD to DR6, then
> 	the system misbehaved.  I don't know exactly what happened,
> 	but my shell process ended and the debug handler got called
> 	over and over again (as if stuck in a loop) for several
> 	seconds.

Yowza.  That is really surprising.

> In light of these results, the best approach appears to be either to 
> leave DR6 alone or to set it to 0.

Agreed.  I suspect clearing it to zero is the right thing (given what the
hardware manuals say), even if it appears that DR_STEP and DR_TRAPn do
reset each other on the chips we have on hand.

> Below is a patch containing a driver meant for testing kernel hardware 
> breakpoints.  Instructions are in the comments at the top.  You can 
> build the driver by typing "make M=bptest" at the top level.

Thanks.

> The patch also adjust the Alt-SysRq-P handler to print out the debug 
> register values along with all the other stuff.

I think you should post that little patch (and equivalent for x86_64) by
itself.  There's no reason that shouldn't go right in.

> I took a look at seqlock.h.  It turns out not to be a good match for my 
> requirements; the header file specifically says that it won't work with 
> data that contains pointers.

There is no black magic about that, it's just saying that seqlock/seqcount
does not do any implicit synchronization with your data structure
management.  If the pointers in question are protected by RCU, there is no
problem (if your read_seqcount_retry loop is inside rcu_read_lock).  Since
the caller supplies the pointers, not requiring them to be freed by RCU
would be simplest for callers.  So what seems natural to me is to have a
simple unsigned long kdr[4] array that's updated by register/unregister
calls (while they hold the mutex to exclude each other).
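
To make that concrete, here is a rough sketch of the pattern I mean; the
names (kdr_seq, read_kdr) are placeholders, not anything in your patch.
The read side nests the seqcount retry loop inside rcu_read_lock():

#include <linux/seqlock.h>
#include <linux/rcupdate.h>
#include <linux/string.h>

static seqcount_t kdr_seq = SEQCNT_ZERO;	/* guards in-place kdr[] updates */
static unsigned long kdr[4];			/* kernel breakpoint addresses */

/* Reader: take a consistent snapshot of kdr[] */
static void read_kdr(unsigned long copy[4])
{
	unsigned seq;

	rcu_read_lock();
	do {
		seq = read_seqcount_begin(&kdr_seq);
		memcpy(copy, kdr, sizeof(kdr));
	} while (read_seqcount_retry(&kdr_seq, seq));
	rcu_read_unlock();
}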

> The "switching"/"restart" stuff doesn't need memory barriers because
> all the communication is between two routines on the same CPU.  Nor are 
> memory barriers needed in the rest of the code for the kernel 
> breakpoint updates; the IPI mechanism already provides its own.

Ok.  I thought we were talking about using seqlock to safely read from a
single global data set that's updated in place.  I can't really see why
anything but bp_task actually needs to be per-cpu.

> A memory barrier is necessary to avoid chaos if another CPU should
> happen to update the kernel breakpoint settings at the same time.  If
> you can suggest a way around it, please do.

The natural thing to me would be to just use the same seqcount-based update
style from a global kdr[4] here.
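
I.e. (continuing the sketch above; update_kdr is likewise a made-up name),
the register/unregister paths would just bump the seqcount around the
in-place update while they hold hw_breakpoint_mutex:

/* Writer: caller holds hw_breakpoint_mutex */
static void update_kdr(int slot, unsigned long addr)
{
	write_seqcount_begin(&kdr_seq);
	kdr[slot] = addr;
	write_seqcount_end(&kdr_seq);
}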

> 	Take care of the DR7 calculations;

Call it a generic "make it go after setting each debug register".
For most other machines this will be a no-op.

> 	Do address limit verification (see whether a pointer
> 	lies in user space or kernel space).

This is probably always < TASK_SIZE_OF (or TASK_SIZE where TASK_SIZE_OF
isn't defined),
but it is probably right to make it an arch macro.

> 	Dumping the debug registers while creating an aout-type
> 	core image;

Ha.  Probably no one else has that arcane bit of compatibility to do, in fact.

> 	All the legacy ptrace stuff;

Right.  There is nothing in common about this except for needing something
(so maybe an arch-defined struct inside the struct thread_hw_breakpoint).

> Does all that sound about right?  

It does.

> How should this be arranged so that it can build okay on all platforms,
> even ones where the low-level support code hasn't been written?  Maybe 
> an arch-dependent CONFIG_HW_BREAKPOINT option?

I am no authority on kconfig, so seek other advice.  

What kprobes does is a separate "config KPROBES" in each arch/foo/Kconfig.
This means that the details and help text must be given separately for each
one.  This is a bug or a feature, depending on whether you dislike
repeating the same help text in several places and having it drift and not
stay uniformly maintained, or you want to include different arch-specific
details in the text.

The other option I see is one central:

config HW_BREAKPOINT
	depends on X86 || X86_64 || ...
	...

AIUI, with this, arch/foo/Kconfig can still have just the lines:

config HW_BREAKPOINT
	depends on !FOOBAR

when the FOOBAR submodel of the foo arch does not have the hardware
support, or for whatever reason an arch adds more constraints to the
generically-defined config option.

The latter one is what I would do, but I might get corrected if I did.


Thanks,
Roland



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-05-23  8:47                                                       ` Roland McGrath
@ 2007-06-01 19:39                                                         ` Alan Stern
  2007-06-14  6:48                                                           ` Roland McGrath
  0 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-06-01 19:39 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Wed, 23 May 2007, Roland McGrath wrote:

> > > I guess my main objection to having .type and .len is the false implied
> > > documentation of their presence and names, leading to people thinking they
> > > can look at those values.  In fact, they are machine-specific and
> > > implementation-specific bits of no intrinsic use to anyone else.
> > 
> > The fact that they are machine-specific and implementation-specific 
> > doesn't necessarily make them of no use.  See the driver below.
> 
> The code in bp_show is exactly the kind of wrong I want to prevent.  When I
> say they are machine-specific and implementation-specific, I mean there is
> no specified part of the interface to which you can presume they correspond
> directly.  The powerpc implementation will not have any field that is set
> to HW_BREAKPOINT_LEN_8 and may well have none set to the type macros
> either.  If you want to have some machine-specific macros or inlines to
> yield the HW_BREAKPOINT_* values for a struct hw_breakpoint, then fine.

I really don't understand your point here.  What's wrong with bp_show?  
Is it all the preprocessor conditionals?  I thought that was how we had 
agreed portable code should determine which types and lengths were 
supported on a particular architecture.
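
For instance, a purely illustrative fragment (nothing here beyond the macros
already in the asm headers; sample_bp and choose_len are names made up for
the example) could select a supported length at compile time:

#include <asm/hw_breakpoint.h>

static struct hw_breakpoint sample_bp;		/* illustrative only */

static void choose_len(void)
{
#ifdef HW_BREAKPOINT_LEN_8
	sample_bp.len = HW_BREAKPOINT_LEN_8;	/* arch has 8-byte watchpoints */
#else
	sample_bp.len = HW_BREAKPOINT_LEN_4;	/* i386-style fallback */
#endif
}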

Consider that the definition of struct hw_breakpoint is in
include/asm-generic/.  Hence .type and .len are guaranteed to be
present on all architectures; we can't just leave them out on some
while including them on others.  In particular, .len _will_ always be
equal to HW_BREAKPOINT_LEN_8 on PPC.  (Of course, you're always
free to define HW_BREAKPOINT_LEN_8 as 0 in the arch-specific header
file if you want, so this doesn't mean as much as it might seem.)

Consider also that .type and .len impose no overhead on architectures 
that don't care about them.  The space they use up would be wasted 
otherwise.  It seems that what you want would complicate the x86 
implementations significantly without offering any real benefit to 
others.

The one thing which makes sense to me is that some architectures might 
want to store type and/or length bits along with the address field.
So I added documentation explaining that there may be arch-specific 
changes to .address while a breakpoint is registered, and I added 
arch-specific accessors to fetch the true address value.  There are 
also arch-specific hooks where those bits can be set and removed.


> What about DR_STEP?  i.e., if DR_STEP was set from a single-step and then
> there was a DR_TRAPn debug exception, is DR_STEP still set?  If DR_TRAPn
> was set and then you single-step, is DR_TRAPn cleared?

I didn't experiment with using DR_STEP.  There wasn't any simple way to
cause a single-step exception.  Perhaps if I were more familiar with
kprobes...

> > 	If the handler wrote back any of BS, BT, or BD to DR6, then
> > 	the system misbehaved.  I don't know exactly what happened,
> > 	but my shell process ended and the debug handler got called
> > 	over and over again (as if stuck in a loop) for several
> > 	seconds.
> 
> Yowza.  That is really surprising.

Even more surprising was that it stopped and settled back down to 
normal after a little while!  I'm not accustomed to seeing infinite 
loops come to an end.  :-)

> > In light of these results, the best approach appears to be either to 
> > leave DR6 alone or to set it to 0.
> 
> Agreed.  I suspect clearing it to zero is the right thing (given what the
> hardware manuals say), even if it appears that DR_STEP and DR_TRAPn do
> reset each other on the chips we have on hand.

Yes.  The new version sets it to 0.


> > I took a look at seqlock.h.  It turns out not to be a good match for my 
> > requirements; the header file specifically says that it won't work with 
> > data that contains pointers.
> 
> There is no black magic about that, it's just saying that seqlock/seqcount
> does not do any implicit synchronization with your data structure
> management.  If the pointers in question are protected by RCU, there is no
> problem (if your read_seqcount_retry loop is inside rcu_read_lock).  Since
> the caller supplies the pointers, not requiring them to be freed by RCU
> would be simplest for callers.  So what seems natural to me is to have a
> simple unsigned long kdr[4] array that's updated by register/unregister
> calls (while they hold the mutex to exclude each other).

In fact, I don't need the seqcount stuff at all.  Just about everything
it provides is already covered by RCU.  One of the secrets is to move
the counter (gennum) into the RCU-protected structure.
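
Schematically (a sketch only -- the function name here is made up and error
handling is left out), the update side toggles between the two kbpdata[]
entries, bumps the generation number, and publishes the new entry so that
readers doing rcu_dereference(cur_kbpdata) see either the old settings or
the new ones, never a mixture:

static void publish_new_kbpdata(int num_kbps)
{
	struct kernel_bp_data *new_entry;

	/* Caller holds hw_breakpoint_mutex */
	cur_kbpindex ^= 1;
	new_entry = &kbpdata[cur_kbpindex];
	new_entry->gennum = cur_kbpdata->gennum + 1;
	new_entry->num_kbps = num_kbps;
	/* ... fill in new_entry->bps[] from the kernel_bps list ... */
	arch_new_kbpdata(new_entry);
	rcu_assign_pointer(cur_kbpdata, new_entry);
}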

> Ok.  I thought we were talking about using seqlock to safely read from a
> single global data set that's updated in place.  I can't really see why
> anything but bp_task actually needs to be per-cpu.

The other secret is to have shared access only to the global data in
the RCU-protected structure, which means storing an array of pointers
to the highest-priority kernel breakpoints there, as you suggest.  The
data which gets updated in place then doesn't need to be shared, so it
doesn't need seqlock.

And you're basically right about the per-cpu data.  Now it contains
only two values: bp_task and cur_kbpdata (a pointer to the most
recently used version of the RCU-protected data).


> > How should this be arranged so that it can build okay on all platforms,
> > even ones where the low-level support code hasn't been written?  Maybe 
> > an arch-dependent CONFIG_HW_BREAKPOINT option?
> 
> I am no authority on kconfig, so seek other advice.  

I decided on something simpler than messing around with Kconfig.  I put 
all the generic code in kernel/hw_breakpoint.c, together with an 
explanation that the file isn't meant to be compiled standalone but 
instead should be #include'd by the arch-specific file.  So things are 
nice and separate, and the new routines don't get built into the kernel 
unless the arch can use them.

It wasn't so easy to separate out the generic portions of the data
structure definitions, so I didn't bother to try.  There are comments
indicating the boundaries between the generic and arch-specific parts.

This is getting pretty close to a final form.  The patch below is for 
2.6.22-rc3.  See what you think...

Alan Stern



Index: usb-2.6/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,30 @@
+#ifndef	_I386_HW_BREAKPOINT_H
+#define	_I386_HW_BREAKPOINT_H
+
+#ifdef	__KERNEL__
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint address accessors */
+inline const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *bp)
+{
+	return bp->address.kernel;
+}
+
+inline const void __user *hw_breakpoint_get_uaddr(struct hw_breakpoint *bp)
+{
+	return bp->address.user;
+}
+
+/* Available HW breakpoint length encodings */
+#define HW_BREAKPOINT_LEN_1		0x40
+#define HW_BREAKPOINT_LEN_2		0x44
+#define HW_BREAKPOINT_LEN_4		0x4c
+#define HW_BREAKPOINT_LEN_EXECUTE	0x40
+
+/* Available HW breakpoint type encodings */
+#define HW_BREAKPOINT_EXECUTE	0x80	/* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE	0x81	/* trigger on memory write */
+#define HW_BREAKPOINT_RW	0x83	/* trigger on memory read or write */
+
+#endif	/* __KERNEL__ */
+#endif	/* _I386_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -57,6 +57,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/cpu.h>
+#include <asm/debugreg.h>
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
@@ -376,9 +377,10 @@ EXPORT_SYMBOL(kernel_thread);
  */
 void exit_thread(void)
 {
+	struct task_struct *tsk = current;
+
 	/* The process may have allocated an io port bitmap... nuke it. */
 	if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
-		struct task_struct *tsk = current;
 		struct thread_struct *t = &tsk->thread;
 		int cpu = get_cpu();
 		struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -396,15 +398,17 @@ void exit_thread(void)
 		tss->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
 		put_cpu();
 	}
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 }
 
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
 
-	memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
-	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));	
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 	/*
 	 * Forget coprocessor state..
 	 */
@@ -447,14 +451,21 @@ int copy_thread(int nr, unsigned long cl
 
 	savesegment(gs,p->thread.gs);
 
+	p->thread.hw_breakpoint_info = NULL;
+	p->thread.io_bitmap_ptr = NULL;
+
 	tsk = current;
+	err = -ENOMEM;
+	if (unlikely(tsk->thread.hw_breakpoint_info)) {
+		if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+			goto out;
+	}
+
 	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
 						IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
+		if (!p->thread.io_bitmap_ptr)
+			goto out;
 		set_tsk_thread_flag(p, TIF_IO_BITMAP);
 	}
 
@@ -484,7 +495,8 @@ int copy_thread(int nr, unsigned long cl
 
 	err = 0;
  out:
-	if (err && p->thread.io_bitmap_ptr) {
+	if (err) {
+		flush_thread_hw_breakpoint(p);
 		kfree(p->thread.io_bitmap_ptr);
 		p->thread.io_bitmap_max = 0;
 	}
@@ -496,18 +508,18 @@ int copy_thread(int nr, unsigned long cl
  */
 void dump_thread(struct pt_regs * regs, struct user * dump)
 {
-	int i;
+	struct task_struct *tsk = current;
 
 /* changed the size calculations - should hopefully work better. lbt */
 	dump->magic = CMAGIC;
 	dump->start_code = 0;
 	dump->start_stack = regs->esp & ~(PAGE_SIZE - 1);
-	dump->u_tsize = ((unsigned long) current->mm->end_code) >> PAGE_SHIFT;
-	dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
+	dump->u_tsize = ((unsigned long) tsk->mm->end_code) >> PAGE_SHIFT;
+	dump->u_dsize = ((unsigned long) (tsk->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
 	dump->u_dsize -= dump->u_tsize;
 	dump->u_ssize = 0;
-	for (i = 0; i < 8; i++)
-		dump->u_debugreg[i] = current->thread.debugreg[i];  
+
+	dump_thread_hw_breakpoint(tsk, dump->u_debugreg);
 
 	if (dump->start_stack < TASK_SIZE)
 		dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -557,16 +569,6 @@ static noinline void __switch_to_xtra(st
 
 	next = &next_p->thread;
 
-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		set_debugreg(next->debugreg[0], 0);
-		set_debugreg(next->debugreg[1], 1);
-		set_debugreg(next->debugreg[2], 2);
-		set_debugreg(next->debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(next->debugreg[6], 6);
-		set_debugreg(next->debugreg[7], 7);
-	}
-
 	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
 		/*
 		 * Disable the bitmap via an invalid offset. We still cache
@@ -699,7 +701,7 @@ struct task_struct fastcall * __switch_t
 		set_iopl_mask(next->iopl);
 
 	/*
-	 * Now maybe handle debug registers and/or IO bitmaps
+	 * Now maybe handle IO bitmaps
 	 */
 	if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
 	    || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
@@ -731,6 +733,13 @@ struct task_struct fastcall * __switch_t
 
 	x86_write_percpu(current_task, next_p);
 
+	/*
+	 * Handle debug registers.  This must be done _after_ current
+	 * is updated.
+	 */
+	if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+		switch_to_thread_hw_breakpoint(next_p);
+
 	return prev_p;
 }
 
Index: usb-2.6/arch/i386/kernel/signal.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/signal.c
+++ usb-2.6/arch/i386/kernel/signal.c
@@ -591,13 +591,6 @@ static void fastcall do_signal(struct pt
 
 	signr = get_signal_to_deliver(&info, &ka, regs, NULL);
 	if (signr > 0) {
-		/* Reenable any watchpoints before delivering the
-		 * signal to user space. The processor register will
-		 * have been cleared if the watchpoint triggered
-		 * inside the kernel.
-		 */
-		if (unlikely(current->thread.debugreg[7]))
-			set_debugreg(current->thread.debugreg[7], 7);
 
 		/* Whee!  Actually deliver the signal.  */
 		if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -804,62 +804,46 @@ fastcall void __kprobes do_int3(struct p
  */
 fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
 {
-	unsigned int condition;
 	struct task_struct *tsk = current;
+	unsigned long dr6;
 
-	get_debugreg(condition, 6);
+	get_debugreg(dr6, 6);
+	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
+	if (dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+		tsk->thread.vdr6 = 0;
 
-	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
-					SIGTRAP) == NOTIFY_STOP)
+	if (notify_die(DIE_DEBUG, "debug", regs, (long) &dr6, error_code,
+			SIGTRAP) == NOTIFY_STOP)
 		return;
+
 	/* It's safe to allow irq's after DR6 has been saved */
 	if (regs->eflags & X86_EFLAGS_IF)
 		local_irq_enable();
 
-	/* Mask out spurious debug traps due to lazy DR7 setting */
-	if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
-		if (!tsk->thread.debugreg[7])
-			goto clear_dr7;
+	if (regs->eflags & VM_MASK) {
+		handle_vm86_trap((struct kernel_vm86_regs *) regs,
+				error_code, 1);
+		return;
 	}
 
-	if (regs->eflags & VM_MASK)
-		goto debug_vm86;
-
-	/* Save debug status register where ptrace can see it */
-	tsk->thread.debugreg[6] = condition;
-
 	/*
-	 * Single-stepping through TF: make sure we ignore any events in
-	 * kernel space (but re-enable TF when returning to user mode).
+	 * Single-stepping through system calls: ignore any exceptions in
+	 * kernel space, but re-enable TF when returning to user mode.
+	 *
+	 * We already checked v86 mode above, so we can check for kernel mode
+	 * by just checking the CPL of CS.
 	 */
-	if (condition & DR_STEP) {
-		/*
-		 * We already checked v86 mode above, so we can
-		 * check for kernel mode by just checking the CPL
-		 * of CS.
-		 */
-		if (!user_mode(regs))
-			goto clear_TF_reenable;
+	if ((dr6 & DR_STEP) && !user_mode(regs)) {
+		dr6 &= ~DR_STEP;
+		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+		regs->eflags &= ~X86_EFLAGS_TF;
 	}
 
-	/* Ok, finally something we can handle */
-	send_sigtrap(tsk, regs, error_code);
+	/* Store the virtualized DR6 value */
+	tsk->thread.vdr6 = dr6;
 
-	/* Disable additional traps. They'll be re-enabled when
-	 * the signal is delivered.
-	 */
-clear_dr7:
-	set_debugreg(0, 7);
-	return;
-
-debug_vm86:
-	handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
-	return;
-
-clear_TF_reenable:
-	set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
-	regs->eflags &= ~TF_MASK;
-	return;
+	if (dr6 & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+		send_sigtrap(tsk, regs, error_code);
 }
 
 /*
Index: usb-2.6/include/asm-i386/debugreg.h
===================================================================
--- usb-2.6.orig/include/asm-i386/debugreg.h
+++ usb-2.6/include/asm-i386/debugreg.h
@@ -48,6 +48,8 @@
 
 #define DR_LOCAL_ENABLE_SHIFT 0    /* Extra shift to the local enable bit */
 #define DR_GLOBAL_ENABLE_SHIFT 1   /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1)      /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2)     /* Global enable for reg 0 */
 #define DR_ENABLE_SIZE 2           /* 2 enable bits per register */
 
 #define DR_LOCAL_ENABLE_MASK (0x55)  /* Set  local bits for all 4 regs */
@@ -61,4 +63,29 @@
 #define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
 #define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
 
+
+/*
+ * HW breakpoint additions
+ */
+
+#define HB_NUM		4	/* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+static inline void disable_debug_registers(void)
+{
+	set_debugreg(0, 7);
+}
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
+
 #endif
Index: usb-2.6/include/asm-i386/processor.h
===================================================================
--- usb-2.6.orig/include/asm-i386/processor.h
+++ usb-2.6/include/asm-i386/processor.h
@@ -354,8 +354,9 @@ struct thread_struct {
 	unsigned long	esp;
 	unsigned long	fs;
 	unsigned long	gs;
-/* Hardware debugging registers */
-	unsigned long	debugreg[8];  /* %%db0-7 debug registers */
+/* Hardware breakpoint info */
+	unsigned long	vdr6;
+	struct thread_hw_breakpoint	*hw_breakpoint_info;
 /* fault info */
 	unsigned long	cr2, trap_no, error_code;
 /* floating point info */
Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/arch/i386/kernel/hw_breakpoint.c
@@ -0,0 +1,631 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+/* QUESTIONS
+
+	How to know whether RF should be cleared when setting a user
+	execution breakpoint?
+
+*/
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>
+#include <asm/percpu.h>
+#include <asm/processor.h>
+
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+	/* utrace support */
+	struct list_head	node;		/* Entry in thread list */
+	struct list_head	thread_bps;	/* Thread's breakpoints */
+	struct hw_breakpoint	*bps[HB_NUM];	/* Highest-priority bps */
+	int			num_installed;	/* Number of installed bps */
+	unsigned		gennum;		/* update-generation number */
+
+	/* Only the portions below are arch-specific */
+
+	/* ptrace support -- Note that vdr6 is stored directly in the
+	 * thread_struct so that it is always available.
+	 */
+	unsigned long		vdr7;			/* Virtualized DR7 */
+	struct hw_breakpoint	vdr_bps[HB_NUM];	/* Breakpoints
+			representing virtualized debug registers 0 - 3 */
+	unsigned long		tdr[HB_NUM];	/*  and their addresses */
+	unsigned long		tdr7;		/* Thread's DR7 value */
+	unsigned long		tkdr7;		/* Thread + kernel DR7 value */
+};
+
+/* Kernel-space breakpoint data */
+struct kernel_bp_data {
+	unsigned		gennum;		/* Generation number */
+	int			num_kbps;	/* Number of kernel bps */
+	struct hw_breakpoint	*bps[HB_NUM];	/* Loaded breakpoints */
+
+	/* Only the portions below are arch-specific */
+	unsigned long		mkdr7;		/* Masked kernel DR7 value */
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+	struct kernel_bp_data	*cur_kbpdata;	/* Current kbpdata[] entry */
+	struct task_struct	*bp_task;	/* The thread whose bps
+			are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Global info */
+static struct kernel_bp_data	kbpdata[2];	/* Old and new settings */
+static int			cur_kbpindex;	/* Alternates 0, 1, ... */
+static struct kernel_bp_data	*cur_kbpdata = &kbpdata[0];
+			/* Always equal to &kbpdata[cur_kbpindex] */
+
+static u8			tprio[HB_NUM];	/* Thread bp max priorities */
+static LIST_HEAD(kernel_bps);			/* Kernel breakpoint list */
+static LIST_HEAD(thread_list);			/* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex);	/* Protects everything */
+
+/* Only the portions below are arch-specific */
+
+static unsigned long		kdr7;		/* Unmasked kernel DR7 value */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps.  Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1).  The DR_GLOBAL_SLOWDOWN bit
+ * (GE) is handled specially.
+ */
+static const unsigned long	kdr7_masks[HB_NUM + 1] = {
+	0x00000000,
+	0x000f0003,	/* LEN0, R/W0, G0, L0 */
+	0x00ff000f,	/* Same for 0,1 */
+	0x0fff003f,	/* Same for 0,1,2 */
+	0xffff00ff	/* Same for 0,1,2,3 */
+};
+
+
+/* Arch-specific hook routines */
+
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void arch_install_chbi(struct cpu_hw_breakpoint *chbi)
+{
+	struct hw_breakpoint **bps;
+
+	/* Don't allow debug exceptions while we update the registers */
+	set_debugreg(0, 7);
+	chbi->cur_kbpdata = rcu_dereference(cur_kbpdata);
+
+	/* Kernel breakpoints are stored starting in DR0 and going up */
+	bps = chbi->cur_kbpdata->bps;
+	switch (chbi->cur_kbpdata->num_kbps) {
+	case 4:
+		set_debugreg(bps[3]->address.va, 3);
+	case 3:
+		set_debugreg(bps[2]->address.va, 2);
+	case 2:
+		set_debugreg(bps[1]->address.va, 1);
+	case 1:
+		set_debugreg(bps[0]->address.va, 0);
+	}
+	/* No need to set DR6 */
+	set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Update an out-of-date thread hw_breakpoint info structure.
+ */
+static inline void arch_update_thbi(struct thread_hw_breakpoint *thbi,
+		struct kernel_bp_data *thr_kbpdata)
+{
+	int num = thr_kbpdata->num_kbps;
+
+	thbi->tkdr7 = thr_kbpdata->mkdr7 | (thbi->tdr7 & ~kdr7_masks[num]);
+}
+
+/*
+ * Install the thread breakpoints in their debug registers.
+ */
+static inline void arch_install_thbi(struct thread_hw_breakpoint *thbi)
+{
+	/* Install the user breakpoints.  Kernel breakpoints are stored
+	 * starting in DR0 and going up; there are num_kbps of them.
+	 * User breakpoints are stored starting in DR3 and going down,
+	 * as many as we have room for.
+	 */
+	switch (thbi->num_installed) {
+	case 4:
+		set_debugreg(thbi->tdr[0], 0);
+	case 3:
+		set_debugreg(thbi->tdr[1], 1);
+	case 2:
+		set_debugreg(thbi->tdr[2], 2);
+	case 1:
+		set_debugreg(thbi->tdr[3], 3);
+	}
+	/* No need to set DR6 */
+	set_debugreg(thbi->tkdr7, 7);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static inline void arch_install_none(struct cpu_hw_breakpoint *chbi)
+{
+	set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Create a new kbpdata entry.
+ */
+static inline void arch_new_kbpdata(struct kernel_bp_data *new_kbpdata)
+{
+	int num = new_kbpdata->num_kbps;
+
+	new_kbpdata->mkdr7 = kdr7 & (kdr7_masks[num] | DR_GLOBAL_SLOWDOWN);
+}
+
+/*
+ * Check for virtual address in user space.
+ */
+static inline int arch_check_va_in_userspace(unsigned long va,
+		struct task_struct *tsk)
+{
+#ifndef	CONFIG_X86_64
+#define	TASK_SIZE_OF(t)	TASK_SIZE
+#endif
+	return (va < TASK_SIZE_OF(tsk));
+}
+
+/*
+ * Check for virtual address in kernel space.
+ */
+static inline int arch_check_va_in_kernelspace(unsigned long va)
+{
+#ifndef	CONFIG_X86_64
+#define	TASK_SIZE64	TASK_SIZE
+#endif
+	return (va >= TASK_SIZE64);
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static inline unsigned long encode_dr7(int drnum, u8 len, u8 type)
+{
+	unsigned long temp;
+
+	temp = (len | type) & 0xf;
+	temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+	temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_GLOBAL_SLOWDOWN;
+	return temp;
+}
+
+/*
+ * Calculate the DR7 value for a list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct thread_hw_breakpoint *thbi)
+{
+	int is_user;
+	struct list_head *bp_list;
+	struct hw_breakpoint *bp;
+	int i;
+	int drnum;
+	unsigned long dr7;
+
+	if (thbi) {
+		is_user = 1;
+		bp_list = &thbi->thread_bps;
+		drnum = HB_NUM - 1;
+	} else {
+		is_user = 0;
+		bp_list = &kernel_bps;
+		drnum = 0;
+	}
+
+	/* Kernel bps are assigned from DR0 on up, and user bps are assigned
+	 * from DR3 on down.  Accumulate all 4 bps; the kernel DR7 mask will
+	 * select the appropriate bits later.
+	 */
+	dr7 = 0;
+	i = 0;
+	list_for_each_entry(bp, bp_list, node) {
+
+		/* Get the debug register number and accumulate the bits */
+		dr7 |= encode_dr7(drnum, bp->len, bp->type);
+		if (++i >= HB_NUM)
+			break;
+		if (is_user)
+			--drnum;
+		else
+			++drnum;
+	}
+	return dr7;
+}
+
+/*
+ * Register a new user breakpoint structure.
+ */
+static inline void arch_register_user_hw_breakpoint(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi)
+{
+	thbi->tdr7 = calculate_dr7(thbi);
+
+	/* If this is an execution breakpoint for the current PC address,
+	 * we should clear the task's RF so that the bp will be certain
+	 * to trigger.
+	 *
+	 * FIXME: It's not so easy to get hold of the task's PC as a linear
+	 * address!  ptrace.c does this already...
+	 */
+}
+
+/*
+ * Unregister a user breakpoint structure.
+ */
+static inline void arch_unregister_user_hw_breakpoint(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi)
+{
+	thbi->tdr7 = calculate_dr7(thbi);
+}
+
+/*
+ * Register a kernel breakpoint structure.
+ */
+static inline void arch_register_kernel_hw_breakpoint(
+		struct hw_breakpoint *bp)
+{
+	kdr7 = calculate_dr7(NULL);
+}
+
+/*
+ * Unregister a kernel breakpoint structure.
+ */
+static inline void arch_unregister_kernel_hw_breakpoint(
+		struct hw_breakpoint *bp)
+{
+	kdr7 = calculate_dr7(NULL);
+}
+
+
+/* End of arch-specific hook routines */
+
+
+/*
+ * Copy out the debug register information for a core dump.
+ *
+ * tsk must be equal to current.
+ */
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	memset(u_debugreg, 0, sizeof u_debugreg);
+	if (thbi) {
+		for (i = 0; i < HB_NUM; ++i)
+			u_debugreg[i] = thbi->vdr_bps[i].address.va;
+		u_debugreg[7] = thbi->vdr7;
+	}
+	u_debugreg[6] = tsk->thread.vdr6;
+}
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+		struct task_struct *tsk);
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+	struct task_struct *tsk = current;
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	/* Store in the virtual DR6 register the fact that the breakpoint
+	 * was hit so the thread's debugger will see it, and send the
+	 * debugging signal.
+	 */
+	if (thbi) {
+		i = bp - thbi->vdr_bps;
+		tsk->thread.vdr6 |= (DR_TRAP0 << i);
+		send_sigtrap(tsk, regs, 0);
+	}
+}
+
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n)
+{
+	struct thread_hw_breakpoint *thbi;
+	unsigned long val = 0;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	thbi = tsk->thread.hw_breakpoint_info;
+	if (n < HB_NUM) {
+		if (thbi)
+			val = (unsigned long) thbi->vdr_bps[n].address.va;
+	} else if (n == 6) {
+		val = tsk->thread.vdr6;
+	} else if (n == 7) {
+		if (thbi)
+			val = thbi->vdr7;
+	}
+	mutex_unlock(&hw_breakpoint_mutex);
+	return val;
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7.  Return the "enabled" status.
+ */
+static inline int decode_dr7(unsigned long dr7, int bpnum, u8 *len, u8 *type)
+{
+	int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+
+	*len = (temp & 0xc) | 0x40;
+	*type = (temp & 0x3) | 0x80;
+	return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
+
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+		struct thread_hw_breakpoint *thbi, unsigned long data)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	int rc = 0;
+	unsigned long old_dr7 = thbi->vdr7;
+
+	data &= ~DR_CONTROL_RESERVED;
+
+	/* Loop through all the hardware breakpoints, making the
+	 * appropriate changes to each.
+	 */
+ restore_settings:
+	thbi->vdr7 = data;
+	bp = &thbi->vdr_bps[0];
+	for (i = 0; i < HB_NUM; (++i, ++bp)) {
+		int enabled;
+		u8 len, type;
+
+		enabled = decode_dr7(data, i, &len, &type);
+
+		/* Unregister the breakpoint before trying to change it */
+		if (bp->status)
+			__unregister_user_hw_breakpoint(tsk, bp);
+
+		/* Insert the breakpoint's new settings */
+		bp->len = len;
+		bp->type = type;
+
+		/* Now register the breakpoint if it should be enabled.
+		 * New invalid entries will raise an error here.
+		 */
+		if (enabled) {
+			bp->triggered = ptrace_triggered;
+			bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+			if (__register_user_hw_breakpoint(tsk, bp) < 0 &&
+					rc == 0)
+				break;
+		}
+	}
+
+	/* If anything above failed, restore the original settings */
+	if (i < HB_NUM) {
+		rc = -EIO;
+		data = old_dr7;
+		goto restore_settings;
+	}
+	return rc;
+}
+
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+	struct thread_hw_breakpoint *thbi;
+	int rc = -EIO;
+
+	/* We have to hold this lock the entire time, to prevent thbi
+	 * from being deallocated out from under us.
+	 */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* There are no DR4 or DR5 registers */
+	if (n == 4 || n == 5)
+		;
+
+	/* Writes to DR6 modify the virtualized value */
+	else if (n == 6) {
+		tsk->thread.vdr6 = val;
+		rc = 0;
+	}
+
+	else if (!tsk->thread.hw_breakpoint_info && val == 0)
+		rc = 0;		/* Minor optimization */
+
+	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+		rc = -ENOMEM;
+
+	/* Writes to DR0 - DR3 change a breakpoint address */
+	else if (n < HB_NUM) {
+		struct hw_breakpoint *bp = &thbi->vdr_bps[n];
+
+		/* If the breakpoint is registered then unregister it,
+		 * change it, and re-register it.  Revert to the original
+		 * address if an error occurs.
+		 */
+		if (bp->status) {
+			unsigned long old_addr = bp->address.va;
+
+			__unregister_user_hw_breakpoint(tsk, bp);
+			bp->address.va = val;
+			rc = __register_user_hw_breakpoint(tsk, bp);
+			if (rc < 0) {
+				bp->address.va = old_addr;
+				__register_user_hw_breakpoint(tsk, bp);
+			}
+		} else {
+			bp->address.va = val;
+			rc = 0;
+		}
+	}
+
+	/* All that's left is DR7 */
+	else
+		rc = ptrace_write_dr7(tsk, thbi, val);
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+
+/*
+ * Handle debug exception notifications.
+ */
+
+static void switch_to_none_hw_breakpoint(void);
+
+static int __kprobes hw_breakpoint_handler(struct die_args *args)
+{
+	struct cpu_hw_breakpoint *chbi;
+	int i;
+	struct hw_breakpoint *bp;
+	struct thread_hw_breakpoint *thbi = NULL;
+
+	/* A pointer to the DR6 value is stored in args->err */
+#define DR6	(* (unsigned long *) (args->err))
+
+	if (!(DR6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+		return NOTIFY_DONE;
+
+	/* Assert that local interrupts are disabled */
+
+	/* Are we a victim of lazy debug-register switching? */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	if (!chbi->bp_task)
+		;
+	else if (chbi->bp_task != current) {
+
+		/* No user breakpoints are valid.  Perform the belated
+		 * debug-register switch.
+		 */
+		switch_to_none_hw_breakpoint();
+	} else
+		thbi = chbi->bp_task->thread.hw_breakpoint_info;
+
+	/* Disable all breakpoints so that the callbacks can run without
+	 * triggering recursive debug exceptions.
+	 */
+	set_debugreg(0, 7);
+
+	/* Handle all the breakpoints that were triggered */
+	for (i = 0; i < HB_NUM; ++i) {
+		if (likely(!(DR6 & (DR_TRAP0 << i))))
+			continue;
+
+		/* Find the corresponding hw_breakpoint structure and
+		 * invoke its triggered callback.
+		 */
+		if (i < chbi->cur_kbpdata->num_kbps)
+			bp = chbi->cur_kbpdata->bps[i];
+		else if (thbi)
+			bp = thbi->bps[i];
+		else		/* False alarm due to lazy DR switching */
+			continue;
+		if (bp) {		/* Should always be non-NULL */
+
+			/* Set RF at execution breakpoints */
+			if (bp->type == HW_BREAKPOINT_EXECUTE)
+				args->regs->eflags |= X86_EFLAGS_RF;
+			(bp->triggered)(bp, args->regs);
+		}
+	}
+
+	/* Re-enable the breakpoints */
+	set_debugreg(thbi ? thbi->tkdr7 : chbi->cur_kbpdata->mkdr7, 7);
+	put_cpu_no_resched();
+
+	/* Mask away the bits we have handled */
+	DR6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+	/* Early exit from the notifier chain if everything has been handled */
+	if (DR6 == 0)
+		return NOTIFY_STOP;
+	return NOTIFY_DONE;
+#undef DR6
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_exceptions_notify(
+		struct notifier_block *unused, unsigned long val, void *data)
+{
+	if (val != DIE_DEBUG)
+		return NOTIFY_DONE;
+	return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+	.notifier_call = hw_breakpoint_exceptions_notify
+};
+
+static int __init init_hw_breakpoint(void)
+{
+	return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
+
+
+/* Grab the arch-independent code */
+
+#include "../../../kernel/hw_breakpoint.c"
Index: usb-2.6/arch/i386/kernel/ptrace.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/ptrace.c
+++ usb-2.6/arch/i386/kernel/ptrace.c
@@ -382,11 +382,11 @@ long arch_ptrace(struct task_struct *chi
 		tmp = 0;  /* Default return condition */
 		if(addr < FRAME_SIZE*sizeof(long))
 			tmp = getreg(child, addr);
-		if(addr >= (long) &dummy->u_debugreg[0] &&
-		   addr <= (long) &dummy->u_debugreg[7]){
+		else if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
 			addr -= (long) &dummy->u_debugreg[0];
 			addr = addr >> 2;
-			tmp = child->thread.debugreg[addr];
+			tmp = thread_get_debugreg(child, addr);
 		}
 		ret = put_user(tmp, datap);
 		break;
@@ -416,59 +416,11 @@ long arch_ptrace(struct task_struct *chi
 		   have to be selective about what portions we allow someone
 		   to modify. */
 
-		  ret = -EIO;
-		  if(addr >= (long) &dummy->u_debugreg[0] &&
-		     addr <= (long) &dummy->u_debugreg[7]){
-
-			  if(addr == (long) &dummy->u_debugreg[4]) break;
-			  if(addr == (long) &dummy->u_debugreg[5]) break;
-			  if(addr < (long) &dummy->u_debugreg[4] &&
-			     ((unsigned long) data) >= TASK_SIZE-3) break;
-			  
-			  /* Sanity-check data. Take one half-byte at once with
-			   * check = (val >> (16 + 4*i)) & 0xf. It contains the
-			   * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
-			   * 2 and 3 are LENi. Given a list of invalid values,
-			   * we do mask |= 1 << invalid_value, so that
-			   * (mask >> check) & 1 is a correct test for invalid
-			   * values.
-			   *
-			   * R/Wi contains the type of the breakpoint /
-			   * watchpoint, LENi contains the length of the watched
-			   * data in the watchpoint case.
-			   *
-			   * The invalid values are:
-			   * - LENi == 0x10 (undefined), so mask |= 0x0f00.
-			   * - R/Wi == 0x10 (break on I/O reads or writes), so
-			   *   mask |= 0x4444.
-			   * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
-			   *   0x1110.
-			   *
-			   * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
-			   *
-			   * See the Intel Manual "System Programming Guide",
-			   * 15.2.4
-			   *
-			   * Note that LENi == 0x10 is defined on x86_64 in long
-			   * mode (i.e. even for 32-bit userspace software, but
-			   * 64-bit kernel), so the x86_64 mask value is 0x5454.
-			   * See the AMD manual no. 24593 (AMD64 System
-			   * Programming)*/
-
-			  if(addr == (long) &dummy->u_debugreg[7]) {
-				  data &= ~DR_CONTROL_RESERVED;
-				  for(i=0; i<4; i++)
-					  if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
-						  goto out_tsk;
-				  if (data)
-					  set_tsk_thread_flag(child, TIF_DEBUG);
-				  else
-					  clear_tsk_thread_flag(child, TIF_DEBUG);
-			  }
-			  addr -= (long) &dummy->u_debugreg;
-			  addr = addr >> 2;
-			  child->thread.debugreg[addr] = data;
-			  ret = 0;
+		if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
+			addr -= (long) &dummy->u_debugreg;
+			addr = addr >> 2;
+			ret = thread_set_debugreg(child, addr, data);
 		  }
 		  break;
 
@@ -624,7 +576,6 @@ long arch_ptrace(struct task_struct *chi
 		ret = ptrace_request(child, request, addr, data);
 		break;
 	}
- out_tsk:
 	return ret;
 }
 
Index: usb-2.6/arch/i386/kernel/Makefile
===================================================================
--- usb-2.6.orig/arch/i386/kernel/Makefile
+++ usb-2.6/arch/i386/kernel/Makefile
@@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld
 obj-y	:= process.o signal.o entry.o traps.o irq.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
-		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \
+		hw_breakpoint.o
 
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
Index: usb-2.6/arch/i386/power/cpu.c
===================================================================
--- usb-2.6.orig/arch/i386/power/cpu.c
+++ usb-2.6/arch/i386/power/cpu.c
@@ -11,6 +11,7 @@
 #include <linux/suspend.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/debugreg.h>
 
 static struct saved_context saved_context;
 
@@ -46,6 +47,8 @@ void __save_processor_state(struct saved
 	ctxt->cr2 = read_cr2();
 	ctxt->cr3 = read_cr3();
 	ctxt->cr4 = read_cr4();
+
+	disable_debug_registers();
 }
 
 void save_processor_state(void)
@@ -70,20 +73,7 @@ static void fix_processor_context(void)
 
 	load_TR_desc();				/* This does ltr */
 	load_LDT(&current->active_mm->context);	/* This does lldt */
-
-	/*
-	 * Now maybe reload the debug registers
-	 */
-	if (current->thread.debugreg[7]){
-		set_debugreg(current->thread.debugreg[0], 0);
-		set_debugreg(current->thread.debugreg[1], 1);
-		set_debugreg(current->thread.debugreg[2], 2);
-		set_debugreg(current->thread.debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(current->thread.debugreg[6], 6);
-		set_debugreg(current->thread.debugreg[7], 7);
-	}
-
+	load_debug_registers();
 }
 
 void __restore_processor_state(struct saved_context *ctxt)
Index: usb-2.6/arch/i386/kernel/kprobes.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/kprobes.c
+++ usb-2.6/arch/i386/kernel/kprobes.c
@@ -660,9 +660,18 @@ int __kprobes kprobe_exceptions_notify(s
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_DEBUG:
-		if (post_kprobe_handler(args->regs))
-			ret = NOTIFY_STOP;
+
+	/* A pointer to the DR6 value is stored in args->err */
+#define DR6	(* (unsigned long *) (args->err))
+
+		if ((DR6 & DR_STEP) && post_kprobe_handler(args->regs)) {
+			DR6 &= ~DR_STEP;
+			if (DR6 == 0)
+				ret = NOTIFY_STOP;
+		}
 		break;
+#undef DR6
+
 	case DIE_GPF:
 	case DIE_PAGE_FAULT:
 		/* kprobe_running() needs smp_processor_id() */
Index: usb-2.6/include/asm-generic/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-generic/hw_breakpoint.h
@@ -0,0 +1,224 @@
+#ifndef	_ASM_GENERIC_HW_BREAKPOINT_H
+#define	_ASM_GENERIC_HW_BREAKPOINT_H
+
+#ifdef	__KERNEL__
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @triggered: callback invoked when the breakpoint is hit
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @address: location (virtual address) of the breakpoint
+ * @len: encoded extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read/write, or execute)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints.  These can be either execute breakpoints
+ * (triggered on instruction execution) or data breakpoints (also known
+ * as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The @address field contains the breakpoint's address, as either a
+ * regular kernel pointer or an %__user pointer.  While a breakpoint
+ * is registered @address may be modified in an arch-specific manner;
+ * to retrieve its value during this period use the accessor routines
+ * hw_breakpoint_get_kaddr() or hw_breakpoint_get_uaddr().
+ *
+ * @len encodes the breakpoint's extent in bytes, which is subject to
+ * certain limitations.  include/asm/hw_breakpoint.h contains macros
+ * defining the available lengths for a specific architecture.  Note that
+ * @address must have the alignment specified by @len.  The breakpoint
+ * will catch accesses to any byte in the range from @address to @address
+ * + (N - 1), where N is the value encoded by @len.
+ *
+ * @type indicates the type of access that will trigger the breakpoint.
+ * Possible values may include:
+ *
+ * 	%HW_BREAKPOINT_EXECUTE (triggered on instruction execution),
+ * 	%HW_BREAKPOINT_RW (triggered on read or write access),
+ * 	%HW_BREAKPOINT_WRITE (triggered on write access), and
+ * 	%HW_BREAKPOINT_READ (triggered on read access).
+ *
+ * Appropriate macros are defined in include/asm/hw_breakpoint.h; not all
+ * possibilities are available on all architectures.  Execute breakpoints
+ * must have @len equal to the special value %HW_BREAKPOINT_LEN_EXECUTE.
+ *
+ * In register_user_hw_breakpoint(), @address must refer to a location in
+ * user space (set @address.user).  The breakpoint will be active only
+ * while the requested task is running.  Conversely in
+ * register_kernel_hw_breakpoint(), @address must refer to a location in
+ * kernel space (set @address.kernel), and the breakpoint will be active
+ * on all CPUs regardless of the current task.
+ *
+ * When a breakpoint gets hit, the @triggered callback is invoked
+ * in_interrupt with a pointer to the %hw_breakpoint structure and the
+ * processor registers.  Execute-breakpoint traps occur before the
+ * breakpointed instruction runs; when the callback returns the
+ * instruction is restarted (this time without a debug exception).  All
+ * other types of trap occur after the memory access has taken place.
+ * Breakpoints are disabled while @triggered runs, to avoid recursive
+ * traps and allow unhindered access to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource.  Requests to register a
+ * breakpoint will always succeed provided the parameters are valid,
+ * but the breakpoint may not be installed in a debug register right
+ * away.  Physical debug registers are allocated based on the priority
+ * level stored in @priority (higher values indicate higher priority).
+ * User-space breakpoints within a single thread compete with one
+ * another, and all user-space breakpoints compete with all kernel-space
+ * breakpoints; however user-space breakpoints in different threads do
+ * not compete.  %HW_BREAKPOINT_PRIO_PTRACE is the level used for ptrace
+ * requests; an unobtrusive kernel-space breakpoint will use
+ * %HW_BREAKPOINT_PRIO_NORMAL to avoid disturbing user programs.  A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HW_BREAKPOINT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered.  The
+ * @installed and @uninstalled callbacks are invoked in_atomic when these
+ * events occur.  It is legal for @installed or @uninstalled to be %NULL,
+ * however @triggered must not be.  Note that it is not possible to
+ * register or unregister a breakpoint from within a callback routine,
+ * since doing so requires a process context.  Note also that for user
+ * breakpoints, @installed and @uninstalled may be called during the
+ * middle of a context switch, at a time when it is not safe to call
+ * printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled.  As a result @triggered can
+ * be called when you may not expect it, but this way you will know that
+ * during the time interval from @installed to @uninstalled, all events
+ * are faithfully reported.  (It is not possible to do any better than
+ * this in general, because on SMP systems there is no way to set a debug
+ * register simultaneously on all CPUs.)  The same isn't always true with
+ * user-space breakpoints, but the differences should not be visible to a
+ * user process.
+ *
+ * If you need to know whether your kernel-space breakpoint was installed
+ * immediately upon registration, you can check the return value from
+ * register_kernel_hw_breakpoint().  If the value is not > 0, you can
+ * give up and unregister the breakpoint right away.
+ *
+ * @node and @status are intended for internal use.  However @status
+ * may be read to determine whether or not the breakpoint is currently
+ * installed.  (The value is not reliable unless local interrupts are
+ * disabled.)
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable.  Note that it is not portable
+ * as written, because not all architectures support HW_BREAKPOINT_LEN_4.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * static void triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * 	printk(KERN_DEBUG "Breakpoint triggered\n");
+ * 	dump_stack();
+ *  	.......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	my_bp.address.kernel = &pid_max;
+ * 	my_bp.type = HW_BREAKPOINT_WRITE;
+ * 	my_bp.len = HW_BREAKPOINT_LEN_4;
+ * 	my_bp.triggered = triggered;
+ * 	my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ * 	rc = register_kernel_hw_breakpoint(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ * 	..........<do anything>............
+ * 	unregister_kernel_hw_breakpoint(&my_bp);
+ * 	..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hw_breakpoint {
+	struct list_head	node;
+	void		(*triggered)(struct hw_breakpoint *, struct pt_regs *);
+	void		(*installed)(struct hw_breakpoint *);
+	void		(*uninstalled)(struct hw_breakpoint *);
+	union {
+		const void		*kernel;
+		const void __user	*user;
+		unsigned long		va;
+	}		address;
+	u8		len;
+	u8		type;
+	u8		priority;
+	u8		status;
+};
+
+/*
+ * Inline accessor routines to retrieve a breakpoint's address:
+ */
+extern const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *);
+extern const void __user *hw_breakpoint_get_uaddr(struct hw_breakpoint *);
+
+/*
+ * len and type values are defined in include/asm/hw_breakpoint.h.
+ * Available values vary according to the architecture.  On i386 the
+ * possibilities are:
+ *
+ *	HW_BREAKPOINT_LEN_1
+ *	HW_BREAKPOINT_LEN_2
+ *	HW_BREAKPOINT_LEN_4
+ *	HW_BREAKPOINT_LEN_EXECUTE
+ *	HW_BREAKPOINT_RW
+ *	HW_BREAKPOINT_READ
+ *	HW_BREAKPOINT_EXECUTE
+ *
+ * On other architectures HW_BREAKPOINT_LEN_8 may be available, and the
+ * 1-, 2-, and 4-byte lengths may be unavailable.  You can use #ifdef
+ * to check at compile time.
+ */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HW_BREAKPOINT_PRIO_NORMAL	25
+#define HW_BREAKPOINT_PRIO_PTRACE	50
+#define HW_BREAKPOINT_PRIO_HIGH		75
+
+/* HW breakpoint status values (0 = not registered) */
+#define HW_BREAKPOINT_REGISTERED	1
+#define HW_BREAKPOINT_INSTALLED		2
+
+/*
+ * The following two routines are meant to be called only from within
+ * the ptrace or utrace subsystems.  The tsk argument will usually be a
+ * process being debugged by the current task, although it is also legal
+ * for tsk to be the current task.  In any case it must be guaranteed
+ * that tsk will not start running in user mode while its breakpoints are
+ * being modified.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+#endif	/* __KERNEL__ */
+#endif	/* _ASM_GENERIC_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/machine_kexec.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/machine_kexec.c
+++ usb-2.6/arch/i386/kernel/machine_kexec.c
@@ -19,6 +19,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/debugreg.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -108,6 +109,7 @@ NORET_TYPE void machine_kexec(struct kim
 
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
+	disable_debug_registers();
 
 	control_page = page_address(image->control_code_page);
 	memcpy(control_page, relocate_kernel, PAGE_SIZE);
Index: usb-2.6/arch/i386/kernel/smpboot.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/smpboot.c
+++ usb-2.6/arch/i386/kernel/smpboot.c
@@ -58,6 +58,7 @@
 #include <smpboot_hooks.h>
 #include <asm/vmi.h>
 #include <asm/mtrr.h>
+#include <asm/debugreg.h>
 
 /* Set if we find a B stepping CPU */
 static int __devinitdata smp_b_stepping;
@@ -427,6 +428,7 @@ static void __cpuinit start_secondary(vo
 	local_irq_enable();
 
 	wmb();
+	load_debug_registers();
 	cpu_idle();
 }
 
@@ -1210,6 +1212,7 @@ int __cpu_disable(void)
 	fixup_irqs(map);
 	/* It's now safe to remove this processor from the online map */
 	cpu_clear(cpu, cpu_online_map);
+	disable_debug_registers();
 	return 0;
 }
 
Index: usb-2.6/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/kernel/hw_breakpoint.c
@@ -0,0 +1,759 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ *
+ * This file contains the arch-independent routines.  It is not meant
+ * to be compiled as a standalone source file; rather it should be
+ * #include'd by the arch-specific implementation.
+ */
+
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct cpu_hw_breakpoint *chbi;
+	struct kernel_bp_data *thr_kbpdata;
+
+	/* This routine is on the hot path; it gets called for every
+	 * context switch into a task with active breakpoints.  We
+	 * must make sure that the common case executes as quickly as
+	 * possible.
+	 */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = tsk;
+
+	/* Use RCU to synchronize with external updates */
+	rcu_read_lock();
+
+	/* Other CPUs might be making updates to the list of kernel
+	 * breakpoints at this time.  If they are, they will modify
+	 * the other entry in kbpdata[] -- the one not pointed to
+	 * by chbi->cur_kbpdata.  So the update itself won't affect
+	 * us directly.
+	 *
+	 * However when the update is finished, an IPI will arrive
+	 * telling this CPU to change chbi->cur_kbpdata.  We need
+	 * to use a single consistent kbpdata[] entry, the present one.
+	 * So we'll copy the pointer to a local variable, thr_kbpdata,
+	 * and we must prevent the compiler from aliasing the two
+	 * pointers.  Only a compiler barrier is required, not a full
+	 * memory barrier, because everything takes place on a single CPU.
+	 */
+ restart:
+	thr_kbpdata = chbi->cur_kbpdata;
+	barrier();
+
+	/* Normally we can keep the same debug register settings as the
+	 * last time this task ran.  But if the kernel breakpoints have
+	 * changed or any user breakpoints have been registered or
+	 * unregistered, we need to handle the updates and possibly
+	 * send out some notifications.
+	 */
+	if (unlikely(thbi->gennum != thr_kbpdata->gennum)) {
+		struct hw_breakpoint *bp;
+		int i;
+		int num;
+
+		thbi->gennum = thr_kbpdata->gennum;
+		arch_update_thbi(thbi, thr_kbpdata);
+		num = thr_kbpdata->num_kbps;
+
+		/* This code can be invoked while a debugger is actively
+		 * updating the thread's breakpoint list (for example, if
+		 * someone sends SIGKILL to the task).  We use RCU to
+		 * protect our access to the list pointers. */
+		thbi->num_installed = 0;
+		i = HB_NUM;
+		list_for_each_entry_rcu(bp, &thbi->thread_bps, node) {
+
+			/* If this register is allocated for kernel bps,
+			 * don't install.  Otherwise do. */
+			if (--i < num) {
+				if (bp->status == HW_BREAKPOINT_INSTALLED) {
+					if (bp->uninstalled)
+						(bp->uninstalled)(bp);
+					bp->status = HW_BREAKPOINT_REGISTERED;
+				}
+			} else {
+				++thbi->num_installed;
+				if (bp->status != HW_BREAKPOINT_INSTALLED) {
+					bp->status = HW_BREAKPOINT_INSTALLED;
+					if (bp->installed)
+						(bp->installed)(bp);
+				}
+			}
+		}
+	}
+
+	/* Set the debug register */
+	arch_install_thbi(thbi);
+
+	/* Were there any kernel breakpoint changes while we were running? */
+	if (unlikely(chbi->cur_kbpdata != thr_kbpdata)) {
+
+		/* DR0-3 might now be assigned to kernel bps and we might
+		 * have messed them up.  Reload all the kernel bps and
+		 * then reload the thread bps.
+		 */
+		arch_install_chbi(chbi);
+		goto restart;
+	}
+
+	rcu_read_unlock();
+	put_cpu_no_resched();
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void switch_to_none_hw_breakpoint(void)
+{
+	struct cpu_hw_breakpoint *chbi;
+
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = NULL;
+
+	/* This routine gets called from only two places.  In one
+	 * the caller holds the hw_breakpoint_mutex; in the other
+	 * interrupts are disabled.  In either case, no kernel
+	 * breakpoint updates can arrive while the routine runs.
+	 * So we don't need to use RCU.
+	 */
+	arch_install_none(chbi);
+	put_cpu_no_resched();
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+	struct cpu_hw_breakpoint *chbi;
+	struct task_struct *tsk = current;
+
+	chbi = &per_cpu(cpu_info, get_cpu());
+
+	/* Install both the kernel and the user breakpoints */
+	arch_install_chbi(chbi);
+	if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+		switch_to_thread_hw_breakpoint(tsk);
+
+	put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void update_all_cpus(void)
+{
+	/* We don't need to use any sort of memory barrier.  The IPI
+	 * carried out by on_each_cpu() includes its own barriers.
+	 */
+	on_each_cpu(update_this_cpu, NULL, 0, 0);
+	synchronize_rcu();
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+	unsigned long flags;
+
+	/* Prevent IPIs for new kernel breakpoint updates */
+	local_irq_save(flags);
+
+	rcu_read_lock();
+	update_this_cpu(NULL);
+	rcu_read_unlock();
+
+	local_irq_restore(flags);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio.  Highest-priority entry is in tprio[3].
+ */
+static void accum_thread_tprio(struct thread_hw_breakpoint *thbi)
+{
+	int i;
+
+	for (i = HB_NUM - 1; i >= 0 && thbi->bps[i]; --i)
+		tprio[i] = max(tprio[i], thbi->bps[i]->priority);
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority.  We'll set tprio[0] to the maximum priority of
+ * the first entries in all the lists, tprio[1] to the maximum priority
+ * of the second entries in all the lists, etc.  In the end, we'll know
+ * that no thread requires breakpoints with priorities higher than the
+ * values in tprio.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void recalc_tprio(void)
+{
+	struct thread_hw_breakpoint *thbi;
+
+	memset(tprio, 0, sizeof tprio);
+
+	/* Loop through all threads having registered breakpoints
+	 * and accumulate the maximum priority levels in tprio.
+	 */
+	list_for_each_entry(thbi, &thread_list, node)
+		accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[].  The 4 highest
+ * winners overall get to be installed in a debug register; num_kbps
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU.  If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+	int k, u;
+	int changed = 0;
+	struct hw_breakpoint *bp;
+	struct kernel_bp_data *new_kbpdata;
+
+	/* Determine how many debug registers are available for kernel
+	 * breakpoints as opposed to user breakpoints, based on the
+	 * priorities.  Ties are resolved in favor of user bps.
+	 */
+	k = 0;			/* Next kernel bp to allocate */
+	u = HB_NUM - 1;		/* Next user bp to allocate */
+	bp = list_entry(kernel_bps.next, struct hw_breakpoint, node);
+	while (k <= u) {
+		if (&bp->node == &kernel_bps || tprio[u] >= bp->priority)
+			--u;		/* User bps win a slot */
+		else {
+			++k;		/* Kernel bp wins a slot */
+			if (bp->status != HW_BREAKPOINT_INSTALLED)
+				changed = 1;
+			bp = list_entry(bp->node.next, struct hw_breakpoint,
+					node);
+		}
+	}
+	if (k != cur_kbpdata->num_kbps)
+		changed = 1;
+
+	/* Notify the remaining kernel breakpoints that they are about
+	 * to be uninstalled.
+	 */
+	list_for_each_entry_from(bp, &kernel_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED) {
+			if (bp->uninstalled)
+				(bp->uninstalled)(bp);
+			bp->status = HW_BREAKPOINT_REGISTERED;
+			changed = 1;
+		}
+	}
+
+	if (changed) {
+		cur_kbpindex ^= 1;
+		new_kbpdata = &kbpdata[cur_kbpindex];
+		new_kbpdata->gennum = cur_kbpdata->gennum + 1;
+		new_kbpdata->num_kbps = k;
+		arch_new_kbpdata(new_kbpdata);
+		u = 0;
+		list_for_each_entry(bp, &kernel_bps, node) {
+			if (u >= k)
+				break;
+			new_kbpdata->bps[u] = bp;
+			++u;
+		}
+		rcu_assign_pointer(cur_kbpdata, new_kbpdata);
+
+		/* Tell all the CPUs to update their debug registers */
+		update_all_cpus();
+
+		/* Notify the breakpoints that just got installed */
+		for (u = 0; u < k; ++u) {
+			bp = new_kbpdata->bps[u];
+			if (bp->status != HW_BREAKPOINT_INSTALLED) {
+				bp->status = HW_BREAKPOINT_INSTALLED;
+				if (bp->installed)
+					(bp->installed)(bp);
+			}
+		}
+	}
+}
+
+/*
+ * Return the pointer to a thread's hw_breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+		struct task_struct *tsk)
+{
+	if (!tsk->thread.hw_breakpoint_info && !(tsk->flags & PF_EXITING)) {
+		struct thread_hw_breakpoint *thbi;
+
+		thbi = kzalloc(sizeof(struct thread_hw_breakpoint),
+				GFP_KERNEL);
+		if (thbi) {
+			INIT_LIST_HEAD(&thbi->node);
+			INIT_LIST_HEAD(&thbi->thread_bps);
+
+			/* Force an update the next time tsk runs */
+			thbi->gennum = cur_kbpdata->gennum - 2;
+			tsk->thread.hw_breakpoint_info = thbi;
+		}
+	}
+	return tsk->thread.hw_breakpoint_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct hw_breakpoint *bp;
+
+	if (!thbi)
+		return;
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Let the breakpoints know they are being uninstalled */
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+			(bp->uninstalled)(bp);
+		bp->status = 0;
+	}
+
+	/* Remove tsk from the list of all threads with registered bps */
+	list_del(&thbi->node);
+
+	/* The thread no longer has any breakpoints associated with it */
+	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	tsk->thread.hw_breakpoint_info = NULL;
+	kfree(thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities */
+	recalc_tprio();
+	balance_kernel_vs_user();
+
+	/* Actually uninstall the breakpoints if necessary */
+	if (tsk == current)
+		switch_to_none_hw_breakpoint();
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags)
+{
+	/* We will assume that breakpoint settings are not inherited
+	 * and the child starts out with no debug registers set.
+	 * But what about CLONE_PTRACE?
+	 */
+	clear_tsk_thread_flag(child, TIF_DEBUG);
+	return 0;
+}
+
+/*
+ * Store the highest-priority thread breakpoint entries in an array.
+ */
+static void store_thread_bp_array(struct thread_hw_breakpoint *thbi)
+{
+	struct hw_breakpoint *bp;
+	int i;
+
+	i = HB_NUM - 1;
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		thbi->bps[i] = bp;
+		thbi->tdr[i] = bp->address.va;
+		if (--i < 0)
+			break;
+	}
+	while (i >= 0)
+		thbi->bps[i--] = NULL;
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set implies
+ *		tsk->thread.hw_breakpoint_info is not NULL.
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ *		iff thbi->node is on thread_list.
+ */
+static int insert_bp_in_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	struct list_head *head;
+	int pos;
+	struct hw_breakpoint *temp_bp;
+
+	/* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+	if (tsk)
+		head = &thbi->thread_bps;
+	else
+		head = &kernel_bps;
+
+	/* Equal-priority breakpoints get listed first-come-first-served */
+	pos = 0;
+	list_for_each_entry(temp_bp, head, node) {
+		if (bp->priority > temp_bp->priority)
+			break;
+		++pos;
+	}
+	bp->status = HW_BREAKPOINT_REGISTERED;
+	list_add_tail(&bp->node, &temp_bp->node);
+
+	if (tsk) {
+		store_thread_bp_array(thbi);
+
+		/* Is this the thread's first registered breakpoint? */
+		if (list_empty(&thbi->node)) {
+			set_tsk_thread_flag(tsk, TIF_DEBUG);
+			list_add(&thbi->node, &thread_list);
+		}
+	}
+	return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ */
+static void remove_bp_from_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	/* Remove bp from the thread's/kernel's list.  If the list is now
+	 * empty we must clear the TIF_DEBUG flag.  But keep the
+	 * thread_hw_breakpoint structure, so that the virtualized debug
+	 * register values will remain valid.
+	 */
+	list_del(&bp->node);
+	if (tsk) {
+		store_thread_bp_array(thbi);
+
+		if (list_empty(&thbi->thread_bps)) {
+			list_del_init(&thbi->node);
+			clear_tsk_thread_flag(tsk, TIF_DEBUG);
+		}
+	}
+
+	/* Tell the breakpoint it is being uninstalled */
+	if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+		(bp->uninstalled)(bp);
+	bp->status = 0;
+}
+
+/*
+ * Validate the settings in a hw_breakpoint structure.
+ */
+static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
+{
+	int rc = -EINVAL;
+	unsigned long len;
+
+	switch (bp->type) {
+#ifdef HW_BREAKPOINT_EXECUTE
+	case HW_BREAKPOINT_EXECUTE:
+		if (bp->len != HW_BREAKPOINT_LEN_EXECUTE)
+			return rc;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_READ
+	case HW_BREAKPOINT_READ:	break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+	case HW_BREAKPOINT_WRITE:	break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+	case HW_BREAKPOINT_RW:		break;
+#endif
+	default:
+		return rc;
+	}
+
+	switch (bp->len) {
+#ifdef HW_BREAKPOINT_LEN_1
+	case HW_BREAKPOINT_LEN_1:
+		len = 1;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+	case HW_BREAKPOINT_LEN_2:
+		len = 2;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+	case HW_BREAKPOINT_LEN_4:
+		len = 4;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+	case HW_BREAKPOINT_LEN_8:
+		len = 8;
+		break;
+#endif
+	default:
+		return rc;
+	}
+
+	/* Check that the low-order bits of the address are appropriate
+	 * for the alignment implied by len.
+	 */
+	if (bp->address.va & (len - 1))
+		return rc;
+
+	/* Check that the virtual address is in the proper range */
+	if (tsk) {
+		if (!arch_check_va_in_userspace(bp->address.va, tsk))
+			return rc;
+	} else {
+		if (!arch_check_va_in_kernelspace(bp->address.va))
+			return rc;
+	}
+
+	if (bp->triggered)
+		rc = 0;
+	return rc;
+}
+
+/*
+ * Actual implementation of register_user_hw_breakpoint.
+ */
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+	struct thread_hw_breakpoint *thbi;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, tsk);
+	if (rc)
+		return rc;
+
+	thbi = alloc_thread_hw_breakpoint(tsk);
+	if (!thbi)
+		return -ENOMEM;
+
+	/* Insert bp in the thread's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, thbi, tsk);
+	arch_register_user_hw_breakpoint(bp, thbi);
+
+	/* Update and rebalance the priorities.  We don't need to go through
+	 * the list of all threads; adding a breakpoint can only cause the
+	 * priorities for this thread to increase.
+	 */
+	accum_thread_tprio(thbi);
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register.
+	 */
+	if (pos < HB_NUM - cur_kbpdata->num_kbps) {
+		rc = 1;
+
+		/* Does it need to be installed right now? */
+		if (tsk == current)
+			switch_to_thread_hw_breakpoint(tsk);
+		/* Otherwise it will get installed the next time tsk runs */
+	}
+
+	return rc;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running.  It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being debugged by the current process,
+ * but it may also be the current process.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	rc = __register_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Actual implementation of unregister_user_hw_breakpoint.
+ */
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+
+	if (!bp->status)
+		return;		/* Not registered */
+
+	/* Remove bp from the thread's list and update the DR7 value */
+	remove_bp_from_list(bp, thbi, tsk);
+	arch_unregister_user_hw_breakpoint(bp, thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities,
+	 * and actually uninstall bp if necessary.
+	 */
+	recalc_tprio();
+	balance_kernel_vs_user();
+	if (tsk == current)
+		switch_to_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * unregister_user_hw_breakpoint - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	mutex_lock(&hw_breakpoint_mutex);
+	__unregister_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be active at all times.  It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	int rc;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, NULL);
+	if (rc)
+		return rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Insert bp in the kernel's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, NULL, NULL);
+	arch_register_kernel_hw_breakpoint(bp);
+
+	/* Rebalance the priorities.  This will install bp if it
+	 * was allocated a debug register.
+	 */
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register.
+	 */
+	if (pos < cur_kbpdata->num_kbps)
+		rc = 1;
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	if (!bp->status)
+		return;		/* Not registered */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Remove bp from the kernel's list and update the DR7 value */
+	remove_bp_from_list(bp, NULL, NULL);
+	arch_unregister_kernel_hw_breakpoint(bp);
+
+	/* Rebalance the priorities.  This will uninstall bp if it
+	 * was allocated a debug register.
+	 */
+	balance_kernel_vs_user();
+
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-01 19:39                                                         ` Alan Stern
@ 2007-06-14  6:48                                                           ` Roland McGrath
  2007-06-19 20:35                                                             ` Alan Stern
  0 siblings, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-06-14  6:48 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> I really don't understand your point here.  What's wrong with bp_show?  
> Is it all the preprocessor conditionals?  I thought that was how we had 
> agreed portable code should determine which types and lengths were 
> supported on a particular architecture.

That part is fine.  The problem is fetching the hw_breakpoint.len field
directly and expecting it to contain the API values.  In an implementation
done as I've been referring to, there is no need for any field to contain
the HW_BREAKPOINT_LEN_8 value, and it's a waste to store one.  If it were
hw_breakpoint_get_len(bp), that would be fine.
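
To make the contrast concrete (the surrounding lines are only a sketch,
not your actual bp_show code):

	/* Relies on bp->len holding the API encoding directly: */
	if (bp->len == HW_BREAKPOINT_LEN_4)
		len = 4;

	/* Works however the arch chooses to store the length: */
	if (hw_breakpoint_get_len(bp) == HW_BREAKPOINT_LEN_4)
		len = 4;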

> Consider that the definition of struct hw_breakpoint is in
> include/asm-generic/.  [...]
> The one thing which makes sense to me is that some architectures might 
> want to store type and/or length bits in along with the address field.  

Indeed, that is the natural thing (and all the bits needed) on several.
I hadn't raised this before since I was having so much trouble already
convincing you about storing things in machine-dependent fashion so that
users cannot just use the struct fields directly.

I really think it would be cleanest all around to use just:

	struct arch_hw_breakpoint info;

in place of address union, len, type in struct hw_breakpoint.  Then each
arch provides hw_breakpoint_get_{kaddr,uaddr,len,type} inlines.  For
storing, each arch can define hw_breakpoint_init(addr, len, type) (or
maybe k/u variants).  This can be used by callers directly if you want to
keep register_hw_breakpoint to one argument, or could just be internal if
register_hw_breakpoint takes the three more args.  If callers use it
directly, there can also be an INIT_ARCH_HW_BREAKPOINT(addr, len, type)
for use in struct hw_breakpoint initializers.

On x86 use:

	struct arch_hw_breakpoint_info {
		union {
			const void		*kernel;
			const void	__user	*user;
			unsigned long		va;
		}		address;
		u8		len;
		u8		type;
	} __attribute__((packed));

and the size of struct hw_breakpoint won't increase.
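
Spelled out a little further -- a sketch only; the "info" member name,
the kinit spelling, and the initializer layout are just one way to do it:

	static inline const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *bp)
	{
		return bp->info.address.kernel;
	}

	static inline unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp)
	{
		return bp->info.len;
	}

	static inline void hw_breakpoint_kinit(struct hw_breakpoint *bp,
			const void *addr, unsigned len, unsigned type)
	{
		bp->info.address.kernel = addr;
		bp->info.len = len;
		bp->info.type = type;
	}

	#define INIT_ARCH_HW_BREAKPOINT(addr, _len, _type)		\
		.info = {						\
			.address	= { .kernel = (addr) },		\
			.len		= (_len),			\
			.type		= (_type),			\
		}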

> > What about DR_STEP?  i.e., if DR_STEP was set from a single-step and then
> > there was a DR_TRAPn debug exception, is DR_STEP still set?  If DR_TRAPn
> > was set and then you single-step, is DR_TRAPn cleared?
> 
> I didn't experiment with using DR_STEP.  There wasn't any simple way to
> cause a single-step exception.  Perhaps if I were more familiar with
> kprobes...

It's easy for user mode with gdb.  kprobes is simple to use, and it
always does a single-step to execute (a copy of) the instruction that 
was overwritten with the breakpoint.  So, write a module that does:

	int testvar=0;
	asm(".globl testme; testme: movl $17,testvar; ret");
	void testme();
	testinit() {
		... register kprobe at &testme ...
		... register hw_breakpoint at &testvar ...
		testme()
	}

Your kprobe handlers don't have to actually do anything at all, if you
are just hacking the low-level code so see what %dr6 values you get at
each trap.
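
If it helps, here is a slightly fuller version of the same sketch.  The
module boilerplate, handler bodies, and testmod_* names are made up; the
hw_breakpoint fields follow the API in your current patch, and I haven't
actually built or run this:

	#include <linux/kernel.h>
	#include <linux/module.h>
	#include <linux/kprobes.h>
	#include <asm/hw_breakpoint.h>

	int testvar = 0;
	asm(".globl testme; testme: movl $17, testvar; ret");
	void testme(void);

	static int testmod_kp_pre(struct kprobe *p, struct pt_regs *regs)
	{
		return 0;	/* only the single-step matters here */
	}

	static void testmod_bp_triggered(struct hw_breakpoint *bp,
			struct pt_regs *regs)
	{
		printk(KERN_DEBUG "breakpoint on testvar hit\n");
	}

	static struct kprobe testmod_kp = {
		.addr		= (kprobe_opcode_t *) testme,
		.pre_handler	= testmod_kp_pre,
	};

	static struct hw_breakpoint testmod_bp = {
		.address.kernel	= &testvar,
		.len		= HW_BREAKPOINT_LEN_4,
		.type		= HW_BREAKPOINT_WRITE,
		.priority	= HW_BREAKPOINT_PRIO_NORMAL,
		.triggered	= testmod_bp_triggered,
	};

	static int __init testmod_init(void)
	{
		int rc;

		rc = register_kprobe(&testmod_kp);
		if (rc)
			return rc;
		rc = register_kernel_hw_breakpoint(&testmod_bp);
		if (rc < 0) {
			unregister_kprobe(&testmod_kp);
			return rc;
		}
		testme();	/* one call: kprobe single-step + data write */
		return 0;
	}

	static void __exit testmod_exit(void)
	{
		unregister_kernel_hw_breakpoint(&testmod_bp);
		unregister_kprobe(&testmod_kp);
	}

	module_init(testmod_init);
	module_exit(testmod_exit);
	MODULE_LICENSE("GPL");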

> I decided on something simpler than messing around with Kconfig.  

I still think it's the proper thing to make it conditional, not always
built in.  But it's a pedantic point.

> This is getting pretty close to a final form.  The patch below is for 
> 2.6.22-rc3.  See what you think...

Indeed I think we have come nearly as far as we will until a few arch
ports get done and the code sees some heavy use to find the rough edges.
Thanks very much for being so accommodating to all my criticism, which I
hope has been constructive.

> +inline const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *bp)

These need to be static inline.  Here you're defining a global function
in every .o file that uses the header.

> +	get_debugreg(dr6, 6);
> +	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
> +	if (dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
> +		tsk->thread.vdr6 = 0;

Some comment here about this conditional clearing, please.

> +
> +/*
> + * HW breakpoint additions
> + */
> +
> +#define HB_NUM		4	/* Number of hardware breakpoints */

Need #ifdef __KERNEL__ around all these additions to debugreg.h.

> +static inline void arch_update_thbi(struct thread_hw_breakpoint *thbi,

For local functions in a source file (not a header), it's standard form
now just to define them static, not static inline.  For these trivial
ones, the compiler will always inline them.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-14  6:48                                                           ` Roland McGrath
@ 2007-06-19 20:35                                                             ` Alan Stern
  2007-06-25 10:52                                                               ` Roland McGrath
  2007-06-25 11:32                                                               ` Roland McGrath
  0 siblings, 2 replies; 70+ messages in thread
From: Alan Stern @ 2007-06-19 20:35 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 76653 bytes --]

On Wed, 13 Jun 2007, Roland McGrath wrote:

> > I really don't understand your point here.  What's wrong with bp_show?  
> > Is it all the preprocessor conditionals?  I thought that was how we had 
> > agreed portable code should determine which types and lengths were 
> > supported on a particular architecture.
> 
> That part is fine.  The problem is fetching the hw_breakpoint.len field
> directly and expecting it to contain the API values.  In an implementation
> done as I've been referring to, there is no need for any field to contain
> the HW_BREAKPOINT_LEN_8 value, and it's a waste to store one.  If it were
> hw_breakpoint_get_len(bp), that would be fine.

"A waste to store one"?  Waste of what?  It isn't a waste of space; the 
space would otherwise be unused.  Waste of an instruction, perhaps.

> Indeed, that is the natural thing (and all the bits needed) on several.
> I hadn't raised this before since I was having so much trouble already
> convincing you about storing things in machine-dependent fashion so that
> users cannot just use the struct fields directly.

It is now possible for an implementation to store things in a 
machine-dependent fashion; I have added accessor routines as you 
suggested.  But I also left the fields as they were; the documentation 
mentions that they won't necessarily contain any particular values.

You might want to examine the check in validate_settings() for address 
alignment; it might not be valid if other values get stored in the 
low-order bits of the address.  This is a tricky point; it's not safe
to fold extra bits into the address unless you know the len and type
values are valid, but in validate_settings() you don't yet know that.
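
For instance, an arch that packs type/len bits into the low-order
address bits would need something like this ahead of the alignment
test (ARCH_HW_BP_ADDR_MASK is hypothetical, just to show the shape):

	/* Hypothetical: strip packed type/len bits before the check */
	unsigned long va = bp->address.va & ARCH_HW_BP_ADDR_MASK;

	if (va & (len - 1))
		return rc;

And even that is only meaningful once the packed len/type bits are
known to be valid, which is exactly what validate_settings() hasn't
established at that point.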

> On x86 use:
> 
> 	struct arch_hw_breakpoint_info {
> 		union {
> 			const void		*kernel;
> 			const void	__user	*user;
> 			unsigned long		va;
> 		}		address;
> 		u8		len;
> 		u8		type;
> 	} __attribute__((packed));
> 
> and the size of struct hw_breakpoint won't increase.

Maybe.  I don't see any reason for the unnecessary encapsulation, 
though.

> > > What about DR_STEP?  i.e., if DR_STEP was set from a single-step and then
> > > there was a DR_TRAPn debug exception, is DR_STEP still set?  If DR_TRAPn
> > > was set and then you single-step, is DR_TRAPn cleared?
> > 
> > I didn't experiment with using DR_STEP.  There wasn't any simple way to
> > cause a single-step exception.  Perhaps if I were more familiar with
> > kprobes...
> 
> It's easy for user mode with gdb.

Yes, of course.  I feel foolish for having forgotten.

Tests show that my CPU does not clear DR_STEP when a data breakpoint is
hit.  Conversely, the DR_TRAPn bits are cleared even when a single-step 
exception occurs.

The bizarre behavior from before is still present; the system gets stuck
in a long loop when the exception handler leaves any of the 0xe000 bits set
in DR6.  And it kills my shell process, probably by sending it a
SIGTRAP.  Oddly enough, this only happens when there's a kernel-space
debug exception -- faults in user-space continue to work normally.  
It's not clear what this means; the behavior indicates a software
problem but the dependency on the DR6 value indicates a hardware
contribution as well...

If you're interested, I can send you the code I used to do this testing
so you can try it on your machine.


> > I decided on something simpler than messing around with Kconfig.  
> 
> I still think it's the proper thing to make it conditional, not always
> built in.  But it's a pedantic point.

We have three things to consider: ptrace, utrace, and hw-breakpoint.  
Ultimately hw-breakpoint should become part of utrace; we might not
want to bother with a standalone version.

Furthermore, hw-breakpoint takes over ptrace's mechanism for
breakpoint handling.  If we want to allow a configuration where ptrace
is present and hw-breakpoint isn't, then I would have to add an
alternate implementation containing only support for the legacy
interface.

It doesn't have to be done now, but it is something to bear in mind 
while trying to decide what things should be conditional on which 
options.


> Indeed I think we have come nearly as far as we will until a few arch
> ports get done and the code sees some heavy use to find the rough edges.
> Thanks very much for being so accommodating to all my criticism, which I
> hope has been constructive.

There's no question that the code is much improved as a result of our 
interaction.


> > +inline const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *bp)
> 
> These need to be static inline.  Here you're defining a global function
> in every .o file that uses the header.

Whoops.  It's fixed now.

> > +	get_debugreg(dr6, 6);
> > +	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
> > +	if (dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
> > +		tsk->thread.vdr6 = 0;
> 
> Some comment here about this conditional clearing, please.

In fact I decided to change that whole thing around.  Now dr6 gets
stored in vdr6 immediately, with no conditional.  This is the right
thing to do when hw-breakpoint support is missing (aside from false
triggers caused by lazy debug register switching).  Then the
hw-breakpoint notifier routine clears the DR_TRAPn bits from vdr6, and
the ptrace "triggered" callback sets the appropriate virtualized bits
in vdr6.  Overall it's a lot simpler and easier to analyze.

I made a few other changes to do_debug.  For instance, it no longer 
checks whether notify_die() returns NOTIFY_STOP.  That check was a 
mistake to begin with; NOTIFY_STOP merely means to cut the notifier 
chain short -- it doesn't mean that the debug exception can be ignored.  
Also it sends the SIGTRAP when any of the DR_STEP or DR_TRAPn bits are 
set in vdr6; this is now the appropriate condition.

> > +
> > +/*
> > + * HW breakpoint additions
> > + */
> > +
> > +#define HB_NUM		4	/* Number of hardware breakpoints */
> 
> Need #ifdef __KERNEL__ around all these additions to debugreg.h.

Done.

> > +static inline void arch_update_thbi(struct thread_hw_breakpoint *thbi,
> 
> For local functions in a source file (not a header), it's standard form
> now just to define them static, not static inline.  For these trivial
> ones, the compiler will always inline them.

Okay.  Here's the latest form of the code, with the updated bptest 
patch as an attachment.

Alan Stern



Index: usb-2.6/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,70 @@
+#ifndef	_I386_HW_BREAKPOINT_H
+#define	_I386_HW_BREAKPOINT_H
+#define	__ARCH_HW_BREAKPOINT_H
+
+#ifdef	__KERNEL__
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint static initializers */
+#define HW_BREAKPOINT_KINIT(addr, _len, _type)		\
+		.address	= {.kernel = addr,},	\
+		.len		= _len,			\
+		.type		= _type
+
+#define HW_BREAKPOINT_UINIT(addr, _len, _type)		\
+		.address	= {.user = addr,},	\
+		.len		= _len,			\
+		.type		= _type
+
+/* HW breakpoint setter routines */
+static inline void hw_breakpoint_kinit(struct hw_breakpoint *bp,
+		const void *addr, unsigned len, unsigned type)
+{
+	bp->address.kernel = addr;
+	bp->len = len;
+	bp->type = type;
+}
+
+static inline void hw_breakpoint_uinit(struct hw_breakpoint *bp,
+		const void __user *addr, unsigned len, unsigned type)
+{
+	bp->address.user = addr;
+	bp->len = len;
+	bp->type = type;
+}
+
+/* HW breakpoint accessor routines */
+static inline const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *bp)
+{
+	return bp->address.kernel;
+}
+
+static inline const void __user *hw_breakpoint_get_uaddr(
+		struct hw_breakpoint *bp)
+{
+	return bp->address.user;
+}
+
+static inline unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp)
+{
+	return bp->len;
+}
+
+static inline unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp)
+{
+	return bp->type;
+}
+
+/* Available HW breakpoint length encodings */
+#define HW_BREAKPOINT_LEN_1		0x40
+#define HW_BREAKPOINT_LEN_2		0x44
+#define HW_BREAKPOINT_LEN_4		0x4c
+#define HW_BREAKPOINT_LEN_EXECUTE	0x40
+
+/* Available HW breakpoint type encodings */
+#define HW_BREAKPOINT_EXECUTE	0x80	/* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE	0x81	/* trigger on memory write */
+#define HW_BREAKPOINT_RW	0x83	/* trigger on memory read or write */
+
+#endif	/* __KERNEL__ */
+#endif	/* _I386_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -57,6 +57,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/cpu.h>
+#include <asm/debugreg.h>
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
@@ -376,9 +377,10 @@ EXPORT_SYMBOL(kernel_thread);
  */
 void exit_thread(void)
 {
+	struct task_struct *tsk = current;
+
 	/* The process may have allocated an io port bitmap... nuke it. */
 	if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
-		struct task_struct *tsk = current;
 		struct thread_struct *t = &tsk->thread;
 		int cpu = get_cpu();
 		struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -396,15 +398,17 @@ void exit_thread(void)
 		tss->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
 		put_cpu();
 	}
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 }
 
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
 
-	memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
-	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));	
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 	/*
 	 * Forget coprocessor state..
 	 */
@@ -447,14 +451,21 @@ int copy_thread(int nr, unsigned long cl
 
 	savesegment(gs,p->thread.gs);
 
+	p->thread.hw_breakpoint_info = NULL;
+	p->thread.io_bitmap_ptr = NULL;
+
 	tsk = current;
+	err = -ENOMEM;
+	if (unlikely(tsk->thread.hw_breakpoint_info)) {
+		if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+			goto out;
+	}
+
 	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
 						IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
+		if (!p->thread.io_bitmap_ptr)
+			goto out;
 		set_tsk_thread_flag(p, TIF_IO_BITMAP);
 	}
 
@@ -484,7 +495,8 @@ int copy_thread(int nr, unsigned long cl
 
 	err = 0;
  out:
-	if (err && p->thread.io_bitmap_ptr) {
+	if (err) {
+		flush_thread_hw_breakpoint(p);
 		kfree(p->thread.io_bitmap_ptr);
 		p->thread.io_bitmap_max = 0;
 	}
@@ -496,18 +508,18 @@ int copy_thread(int nr, unsigned long cl
  */
 void dump_thread(struct pt_regs * regs, struct user * dump)
 {
-	int i;
+	struct task_struct *tsk = current;
 
 /* changed the size calculations - should hopefully work better. lbt */
 	dump->magic = CMAGIC;
 	dump->start_code = 0;
 	dump->start_stack = regs->esp & ~(PAGE_SIZE - 1);
-	dump->u_tsize = ((unsigned long) current->mm->end_code) >> PAGE_SHIFT;
-	dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
+	dump->u_tsize = ((unsigned long) tsk->mm->end_code) >> PAGE_SHIFT;
+	dump->u_dsize = ((unsigned long) (tsk->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
 	dump->u_dsize -= dump->u_tsize;
 	dump->u_ssize = 0;
-	for (i = 0; i < 8; i++)
-		dump->u_debugreg[i] = current->thread.debugreg[i];  
+
+	dump_thread_hw_breakpoint(tsk, dump->u_debugreg);
 
 	if (dump->start_stack < TASK_SIZE)
 		dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -557,16 +569,6 @@ static noinline void __switch_to_xtra(st
 
 	next = &next_p->thread;
 
-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		set_debugreg(next->debugreg[0], 0);
-		set_debugreg(next->debugreg[1], 1);
-		set_debugreg(next->debugreg[2], 2);
-		set_debugreg(next->debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(next->debugreg[6], 6);
-		set_debugreg(next->debugreg[7], 7);
-	}
-
 	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
 		/*
 		 * Disable the bitmap via an invalid offset. We still cache
@@ -699,7 +701,7 @@ struct task_struct fastcall * __switch_t
 		set_iopl_mask(next->iopl);
 
 	/*
-	 * Now maybe handle debug registers and/or IO bitmaps
+	 * Now maybe handle IO bitmaps
 	 */
 	if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
 	    || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
@@ -731,6 +733,13 @@ struct task_struct fastcall * __switch_t
 
 	x86_write_percpu(current_task, next_p);
 
+	/*
+	 * Handle debug registers.  This must be done _after_ current
+	 * is updated.
+	 */
+	if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+		switch_to_thread_hw_breakpoint(next_p);
+
 	return prev_p;
 }
 
Index: usb-2.6/arch/i386/kernel/signal.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/signal.c
+++ usb-2.6/arch/i386/kernel/signal.c
@@ -591,13 +591,6 @@ static void fastcall do_signal(struct pt
 
 	signr = get_signal_to_deliver(&info, &ka, regs, NULL);
 	if (signr > 0) {
-		/* Reenable any watchpoints before delivering the
-		 * signal to user space. The processor register will
-		 * have been cleared if the watchpoint triggered
-		 * inside the kernel.
-		 */
-		if (unlikely(current->thread.debugreg[7]))
-			set_debugreg(current->thread.debugreg[7], 7);
 
 		/* Whee!  Actually deliver the signal.  */
 		if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -804,62 +804,42 @@ fastcall void __kprobes do_int3(struct p
  */
 fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
 {
-	unsigned int condition;
 	struct task_struct *tsk = current;
+	unsigned long dr6;
 
-	get_debugreg(condition, 6);
+	get_debugreg(dr6, 6);
+	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
+
+	/* Store the virtualized DR6 value */
+	tsk->thread.vdr6 = dr6;
+
+	notify_die(DIE_DEBUG, "debug", regs, dr6, error_code, SIGTRAP);
 
-	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
-					SIGTRAP) == NOTIFY_STOP)
-		return;
 	/* It's safe to allow irq's after DR6 has been saved */
 	if (regs->eflags & X86_EFLAGS_IF)
 		local_irq_enable();
 
-	/* Mask out spurious debug traps due to lazy DR7 setting */
-	if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
-		if (!tsk->thread.debugreg[7])
-			goto clear_dr7;
+	if (regs->eflags & VM_MASK) {
+		handle_vm86_trap((struct kernel_vm86_regs *) regs,
+				error_code, 1);
+		return;
 	}
 
-	if (regs->eflags & VM_MASK)
-		goto debug_vm86;
-
-	/* Save debug status register where ptrace can see it */
-	tsk->thread.debugreg[6] = condition;
-
 	/*
-	 * Single-stepping through TF: make sure we ignore any events in
-	 * kernel space (but re-enable TF when returning to user mode).
+	 * Single-stepping through system calls: ignore any exceptions in
+	 * kernel space, but re-enable TF when returning to user mode.
+	 *
+	 * We already checked v86 mode above, so we can check for kernel mode
+	 * by just checking the CPL of CS.
 	 */
-	if (condition & DR_STEP) {
-		/*
-		 * We already checked v86 mode above, so we can
-		 * check for kernel mode by just checking the CPL
-		 * of CS.
-		 */
-		if (!user_mode(regs))
-			goto clear_TF_reenable;
+	if ((dr6 & DR_STEP) && !user_mode(regs)) {
+		tsk->thread.vdr6 &= ~DR_STEP;
+		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+		regs->eflags &= ~X86_EFLAGS_TF;
 	}
 
-	/* Ok, finally something we can handle */
-	send_sigtrap(tsk, regs, error_code);
-
-	/* Disable additional traps. They'll be re-enabled when
-	 * the signal is delivered.
-	 */
-clear_dr7:
-	set_debugreg(0, 7);
-	return;
-
-debug_vm86:
-	handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
-	return;
-
-clear_TF_reenable:
-	set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
-	regs->eflags &= ~TF_MASK;
-	return;
+	if (tsk->thread.vdr6 & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+		send_sigtrap(tsk, regs, error_code);
 }
 
 /*
Index: usb-2.6/include/asm-i386/debugreg.h
===================================================================
--- usb-2.6.orig/include/asm-i386/debugreg.h
+++ usb-2.6/include/asm-i386/debugreg.h
@@ -48,6 +48,8 @@
 
 #define DR_LOCAL_ENABLE_SHIFT 0    /* Extra shift to the local enable bit */
 #define DR_GLOBAL_ENABLE_SHIFT 1   /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1)      /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2)     /* Global enable for reg 0 */
 #define DR_ENABLE_SIZE 2           /* 2 enable bits per register */
 
 #define DR_LOCAL_ENABLE_MASK (0x55)  /* Set  local bits for all 4 regs */
@@ -61,4 +63,32 @@
 #define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
 #define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
 
+
+/*
+ * HW breakpoint additions
+ */
+#ifdef __KERNEL__
+
+#define HB_NUM		4	/* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+static inline void disable_debug_registers(void)
+{
+	set_debugreg(0, 7);
+}
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
+
+#endif	/* __KERNEL__ */
+
 #endif
Index: usb-2.6/include/asm-i386/processor.h
===================================================================
--- usb-2.6.orig/include/asm-i386/processor.h
+++ usb-2.6/include/asm-i386/processor.h
@@ -354,8 +354,9 @@ struct thread_struct {
 	unsigned long	esp;
 	unsigned long	fs;
 	unsigned long	gs;
-/* Hardware debugging registers */
-	unsigned long	debugreg[8];  /* %%db0-7 debug registers */
+/* Hardware breakpoint info */
+	unsigned long	vdr6;
+	struct thread_hw_breakpoint	*hw_breakpoint_info;
 /* fault info */
 	unsigned long	cr2, trap_no, error_code;
 /* floating point info */
Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/arch/i386/kernel/hw_breakpoint.c
@@ -0,0 +1,633 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+/* QUESTIONS
+
+	How to know whether RF should be cleared when setting a user
+	execution breakpoint?
+
+*/
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>
+#include <asm/percpu.h>
+#include <asm/processor.h>
+
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+	/* utrace support */
+	struct list_head	node;		/* Entry in thread list */
+	struct list_head	thread_bps;	/* Thread's breakpoints */
+	struct hw_breakpoint	*bps[HB_NUM];	/* Highest-priority bps */
+	int			num_installed;	/* Number of installed bps */
+	unsigned		gennum;		/* update-generation number */
+
+	/* Only the portions below are arch-specific */
+
+	/* ptrace support -- Note that vdr6 is stored directly in the
+	 * thread_struct so that it is always available.
+	 */
+	unsigned long		vdr7;			/* Virtualized DR7 */
+	struct hw_breakpoint	vdr_bps[HB_NUM];	/* Breakpoints
+			representing virtualized debug registers 0 - 3 */
+	unsigned long		tdr[HB_NUM];	/*  and their addresses */
+	unsigned long		tdr7;		/* Thread's DR7 value */
+	unsigned long		tkdr7;		/* Thread + kernel DR7 value */
+};
+
+/* Kernel-space breakpoint data */
+struct kernel_bp_data {
+	unsigned		gennum;		/* Generation number */
+	int			num_kbps;	/* Number of kernel bps */
+	struct hw_breakpoint	*bps[HB_NUM];	/* Loaded breakpoints */
+
+	/* Only the portions below are arch-specific */
+	unsigned long		mkdr7;		/* Masked kernel DR7 value */
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+	struct kernel_bp_data	*cur_kbpdata;	/* Current kbpdata[] entry */
+	struct task_struct	*bp_task;	/* The thread whose bps
+			are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Global info */
+static struct kernel_bp_data	kbpdata[2];	/* Old and new settings */
+static int			cur_kbpindex;	/* Alternates 0, 1, ... */
+static struct kernel_bp_data	*cur_kbpdata = &kbpdata[0];
+			/* Always equal to &kbpdata[cur_kbpindex] */
+
+static u8			tprio[HB_NUM];	/* Thread bp max priorities */
+static LIST_HEAD(kernel_bps);			/* Kernel breakpoint list */
+static LIST_HEAD(thread_list);			/* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex);	/* Protects everything */
+
+/* Only the portions below are arch-specific */
+
+static unsigned long		kdr7;		/* Unmasked kernel DR7 value */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps.  Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1).  The DR_GLOBAL_SLOWDOWN bit
+ * (GE) is handled specially.
+ */
+static const unsigned long	kdr7_masks[HB_NUM + 1] = {
+	0x00000000,
+	0x000f0003,	/* LEN0, R/W0, G0, L0 */
+	0x00ff000f,	/* Same for 0,1 */
+	0x0fff003f,	/* Same for 0,1,2 */
+	0xffff00ff	/* Same for 0,1,2,3 */
+};
+
+
+/* Arch-specific hook routines */
+
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void arch_install_chbi(struct cpu_hw_breakpoint *chbi)
+{
+	struct hw_breakpoint **bps;
+
+	/* Don't allow debug exceptions while we update the registers */
+	set_debugreg(0, 7);
+	chbi->cur_kbpdata = rcu_dereference(cur_kbpdata);
+
+	/* Kernel breakpoints are stored starting in DR0 and going up */
+	bps = chbi->cur_kbpdata->bps;
+	switch (chbi->cur_kbpdata->num_kbps) {
+	case 4:
+		set_debugreg(bps[3]->address.va, 3);
+	case 3:
+		set_debugreg(bps[2]->address.va, 2);
+	case 2:
+		set_debugreg(bps[1]->address.va, 1);
+	case 1:
+		set_debugreg(bps[0]->address.va, 0);
+	}
+	/* No need to set DR6 */
+	set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Update an out-of-date thread hw_breakpoint info structure.
+ */
+static void arch_update_thbi(struct thread_hw_breakpoint *thbi,
+		struct kernel_bp_data *thr_kbpdata)
+{
+	int num = thr_kbpdata->num_kbps;
+
+	thbi->tkdr7 = thr_kbpdata->mkdr7 | (thbi->tdr7 & ~kdr7_masks[num]);
+}
+
+/*
+ * Install the thread breakpoints in their debug registers.
+ */
+static void arch_install_thbi(struct thread_hw_breakpoint *thbi)
+{
+	/* Install the user breakpoints.  Kernel breakpoints are stored
+	 * starting in DR0 and going up; there are num_kbps of them.
+	 * User breakpoints are stored starting in DR3 and going down,
+	 * as many as we have room for.
+	 */
+	switch (thbi->num_installed) {
+	case 4:
+		set_debugreg(thbi->tdr[0], 0);
+	case 3:
+		set_debugreg(thbi->tdr[1], 1);
+	case 2:
+		set_debugreg(thbi->tdr[2], 2);
+	case 1:
+		set_debugreg(thbi->tdr[3], 3);
+	}
+	/* No need to set DR6 */
+	set_debugreg(thbi->tkdr7, 7);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void arch_install_none(struct cpu_hw_breakpoint *chbi)
+{
+	set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Create a new kbpdata entry.
+ */
+static void arch_new_kbpdata(struct kernel_bp_data *new_kbpdata)
+{
+	int num = new_kbpdata->num_kbps;
+
+	new_kbpdata->mkdr7 = kdr7 & (kdr7_masks[num] | DR_GLOBAL_SLOWDOWN);
+}
+
+/*
+ * Check for virtual address in user space.
+ */
+static int arch_check_va_in_userspace(unsigned long va,
+		struct task_struct *tsk)
+{
+#ifndef	CONFIG_X86_64
+#define	TASK_SIZE_OF(t)	TASK_SIZE
+#endif
+	return (va < TASK_SIZE_OF(tsk));
+}
+
+/*
+ * Check for virtual address in kernel space.
+ */
+static int arch_check_va_in_kernelspace(unsigned long va)
+{
+#ifndef	CONFIG_X86_64
+#define	TASK_SIZE64	TASK_SIZE
+#endif
+	return (va >= TASK_SIZE64);
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static unsigned long encode_dr7(int drnum, u8 len, u8 type)
+{
+	unsigned long temp;
+
+	temp = (len | type) & 0xf;
+	temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+	temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_GLOBAL_SLOWDOWN;
+	return temp;
+}
+
+/*
+ * Calculate the DR7 value for a list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct thread_hw_breakpoint *thbi)
+{
+	int is_user;
+	struct list_head *bp_list;
+	struct hw_breakpoint *bp;
+	int i;
+	int drnum;
+	unsigned long dr7;
+
+	if (thbi) {
+		is_user = 1;
+		bp_list = &thbi->thread_bps;
+		drnum = HB_NUM - 1;
+	} else {
+		is_user = 0;
+		bp_list = &kernel_bps;
+		drnum = 0;
+	}
+
+	/* Kernel bps are assigned from DR0 on up, and user bps are assigned
+	 * from DR3 on down.  Accumulate all 4 bps; the kernel DR7 mask will
+	 * select the appropriate bits later.
+	 */
+	dr7 = 0;
+	i = 0;
+	list_for_each_entry(bp, bp_list, node) {
+
+		/* Get the debug register number and accumulate the bits */
+		dr7 |= encode_dr7(drnum, bp->len, bp->type);
+		if (++i >= HB_NUM)
+			break;
+		if (is_user)
+			--drnum;
+		else
+			++drnum;
+	}
+	return dr7;
+}
+
+/*
+ * Register a new user breakpoint structure.
+ */
+static void arch_register_user_hw_breakpoint(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi)
+{
+	thbi->tdr7 = calculate_dr7(thbi);
+
+	/* If this is an execution breakpoint for the current PC address,
+	 * we should clear the task's RF so that the bp will be certain
+	 * to trigger.
+	 *
+	 * FIXME: It's not so easy to get hold of the task's PC as a linear
+	 * address!  ptrace.c does this already...
+	 */
+}
+
+/*
+ * Unregister a user breakpoint structure.
+ */
+static void arch_unregister_user_hw_breakpoint(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi)
+{
+	thbi->tdr7 = calculate_dr7(thbi);
+}
+
+/*
+ * Register a kernel breakpoint structure.
+ */
+static void arch_register_kernel_hw_breakpoint(
+		struct hw_breakpoint *bp)
+{
+	kdr7 = calculate_dr7(NULL);
+}
+
+/*
+ * Unregister a kernel breakpoint structure.
+ */
+static void arch_unregister_kernel_hw_breakpoint(
+		struct hw_breakpoint *bp)
+{
+	kdr7 = calculate_dr7(NULL);
+}
+
+
+/* End of arch-specific hook routines */
+
+
+/*
+ * Copy out the debug register information for a core dump.
+ *
+ * tsk must be equal to current.
+ */
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	/* Note: u_debugreg decays to a pointer here, so spell out the size */
+	memset(u_debugreg, 0, 8 * sizeof(u_debugreg[0]));
+	if (thbi) {
+		for (i = 0; i < HB_NUM; ++i)
+			u_debugreg[i] = thbi->vdr_bps[i].address.va;
+		u_debugreg[7] = thbi->vdr7;
+	}
+	u_debugreg[6] = tsk->thread.vdr6;
+}
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+		struct task_struct *tsk);
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+	struct task_struct *tsk = current;
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	/* Store in the virtual DR6 register the fact that the breakpoint
+	 * was hit so the thread's debugger will see it.
+	 */
+	if (thbi) {
+		i = bp - thbi->vdr_bps;
+		tsk->thread.vdr6 |= (DR_TRAP0 << i);
+	}
+}
+
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n)
+{
+	struct thread_hw_breakpoint *thbi;
+	unsigned long val = 0;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	thbi = tsk->thread.hw_breakpoint_info;
+	if (n < HB_NUM) {
+		if (thbi)
+			val = (unsigned long) thbi->vdr_bps[n].address.va;
+	} else if (n == 6) {
+		val = tsk->thread.vdr6;
+	} else if (n == 7) {
+		if (thbi)
+			val = thbi->vdr7;
+	}
+	mutex_unlock(&hw_breakpoint_mutex);
+	return val;
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7.  Return the "enabled" status.
+ */
+static int decode_dr7(unsigned long dr7, int bpnum, u8 *len, u8 *type)
+{
+	int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+
+	*len = (temp & 0xc) | 0x40;
+	*type = (temp & 0x3) | 0x80;
+	return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
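+
+/*
+ * Continuing the encode_dr7() example above: decode_dr7(0x00d00208, 1,
+ * &len, &type) stores 0x4c in len and 0x81 in type and returns 2
+ * (only the global-enable bit for DR1 is set).
+ */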
+
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+		struct thread_hw_breakpoint *thbi, unsigned long data)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	int rc = 0;
+	unsigned long old_dr7 = thbi->vdr7;
+
+	data &= ~DR_CONTROL_RESERVED;
+
+	/* Loop through all the hardware breakpoints, making the
+	 * appropriate changes to each.
+	 */
+ restore_settings:
+	thbi->vdr7 = data;
+	bp = &thbi->vdr_bps[0];
+	for (i = 0; i < HB_NUM; (++i, ++bp)) {
+		int enabled;
+		u8 len, type;
+
+		enabled = decode_dr7(data, i, &len, &type);
+
+		/* Unregister the breakpoint before trying to change it */
+		if (bp->status)
+			__unregister_user_hw_breakpoint(tsk, bp);
+
+		/* Insert the breakpoint's new settings */
+		bp->len = len;
+		bp->type = type;
+
+		/* Now register the breakpoint if it should be enabled.
+		 * New invalid entries will raise an error here.
+		 */
+		if (enabled) {
+			bp->triggered = ptrace_triggered;
+			bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+			if (__register_user_hw_breakpoint(tsk, bp) < 0 &&
+					rc == 0)
+				break;
+		}
+	}
+
+	/* If anything above failed, restore the original settings */
+	if (i < HB_NUM) {
+		rc = -EIO;
+		data = old_dr7;
+		goto restore_settings;
+	}
+	return rc;
+}
+
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+	struct thread_hw_breakpoint *thbi;
+	int rc = -EIO;
+
+	/* We have to hold this lock the entire time, to prevent thbi
+	 * from being deallocated out from under us.
+	 */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* There are no DR4 or DR5 registers */
+	if (n == 4 || n == 5)
+		;
+
+	/* Writes to DR6 modify the virtualized value */
+	else if (n == 6) {
+		tsk->thread.vdr6 = val;
+		rc = 0;
+	}
+
+	else if (!tsk->thread.hw_breakpoint_info && val == 0)
+		rc = 0;		/* Minor optimization */
+
+	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+		rc = -ENOMEM;
+
+	/* Writes to DR0 - DR3 change a breakpoint address */
+	else if (n < HB_NUM) {
+		struct hw_breakpoint *bp = &thbi->vdr_bps[n];
+
+		/* If the breakpoint is registered then unregister it,
+		 * change it, and re-register it.  Revert to the original
+		 * address if an error occurs.
+		 */
+		if (bp->status) {
+			unsigned long old_addr = bp->address.va;
+
+			__unregister_user_hw_breakpoint(tsk, bp);
+			bp->address.va = val;
+			rc = __register_user_hw_breakpoint(tsk, bp);
+			if (rc < 0) {
+				bp->address.va = old_addr;
+				__register_user_hw_breakpoint(tsk, bp);
+			}
+		} else {
+			bp->address.va = val;
+			rc = 0;
+		}
+	}
+
+	/* All that's left is DR7 */
+	else
+		rc = ptrace_write_dr7(tsk, thbi, val);
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
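+
+/*
+ * For illustration, a debugger reaches this path from user space through
+ * PTRACE_POKEUSER; a minimal sketch (pid is a stopped tracee and watch_addr
+ * a 4-byte-aligned address in its address space, both illustrative names):
+ *
+ *	#include <stddef.h>
+ *	#include <sys/ptrace.h>
+ *	#include <sys/user.h>
+ *
+ *	ptrace(PTRACE_POKEUSER, pid,
+ *			offsetof(struct user, u_debugreg[0]), watch_addr);
+ *	ptrace(PTRACE_POKEUSER, pid,
+ *			offsetof(struct user, u_debugreg[7]), 0x000d0001UL);
+ *
+ * This requests a 4-byte write watchpoint in DR0 (LEN0/RW0 = 1101b, L0 set);
+ * the corresponding hw_breakpoint is then registered with priority
+ * HW_BREAKPOINT_PRIO_PTRACE by ptrace_write_dr7() above.
+ */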
+
+
+/*
+ * Handle debug exception notifications.
+ */
+
+static void switch_to_none_hw_breakpoint(void);
+
+static int __kprobes hw_breakpoint_handler(struct die_args *args)
+{
+	struct cpu_hw_breakpoint *chbi;
+	int i;
+	struct hw_breakpoint *bp;
+	struct thread_hw_breakpoint *thbi = NULL;
+
+	/* The DR6 value is stored in args->err */
+#define DR6	(args->err)
+
+	if (!(DR6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+		return NOTIFY_DONE;
+
+	/* Assert that local interrupts are disabled */
+
+	/* Reset the DRn bits in the virtualized register value.
+	 * The ptrace trigger routine will add in whatever is needed.
+	 */
+	current->thread.vdr6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+	/* Are we a victim of lazy debug-register switching? */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	if (!chbi->bp_task)
+		;
+	else if (chbi->bp_task != current) {
+
+		/* No user breakpoints are valid.  Perform the belated
+		 * debug-register switch.
+		 */
+		switch_to_none_hw_breakpoint();
+	} else {
+		thbi = chbi->bp_task->thread.hw_breakpoint_info;
+	}
+
+	/* Disable all breakpoints so that the callbacks can run without
+	 * triggering recursive debug exceptions.
+	 */
+	set_debugreg(0, 7);
+
+	/* Handle all the breakpoints that were triggered */
+	for (i = 0; i < HB_NUM; ++i) {
+		if (likely(!(DR6 & (DR_TRAP0 << i))))
+			continue;
+
+		/* Find the corresponding hw_breakpoint structure and
+		 * invoke its triggered callback.
+		 */
+		if (i < chbi->cur_kbpdata->num_kbps)
+			bp = chbi->cur_kbpdata->bps[i];
+		else if (thbi)
+			bp = thbi->bps[i];
+		else		/* False alarm due to lazy DR switching */
+			continue;
+		if (bp) {		/* Should always be non-NULL */
+
+			/* Set RF at execution breakpoints */
+			if (bp->type == HW_BREAKPOINT_EXECUTE)
+				args->regs->eflags |= X86_EFLAGS_RF;
+			(bp->triggered)(bp, args->regs);
+		}
+	}
+
+	/* Re-enable the breakpoints */
+	set_debugreg(thbi ? thbi->tkdr7 : chbi->cur_kbpdata->mkdr7, 7);
+	put_cpu_no_resched();
+
+	if (!(DR6 & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+		return NOTIFY_STOP;
+
+	return NOTIFY_DONE;
+#undef DR6
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_exceptions_notify(
+		struct notifier_block *unused, unsigned long val, void *data)
+{
+	if (val != DIE_DEBUG)
+		return NOTIFY_DONE;
+	return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+	.notifier_call = hw_breakpoint_exceptions_notify
+};
+
+static int __init init_hw_breakpoint(void)
+{
+	load_debug_registers();
+	return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
+
+
+/* Grab the arch-independent code */
+
+#include "../../../kernel/hw_breakpoint.c"
Index: usb-2.6/arch/i386/kernel/ptrace.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/ptrace.c
+++ usb-2.6/arch/i386/kernel/ptrace.c
@@ -382,11 +382,11 @@ long arch_ptrace(struct task_struct *chi
 		tmp = 0;  /* Default return condition */
 		if(addr < FRAME_SIZE*sizeof(long))
 			tmp = getreg(child, addr);
-		if(addr >= (long) &dummy->u_debugreg[0] &&
-		   addr <= (long) &dummy->u_debugreg[7]){
+		else if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
 			addr -= (long) &dummy->u_debugreg[0];
 			addr = addr >> 2;
-			tmp = child->thread.debugreg[addr];
+			tmp = thread_get_debugreg(child, addr);
 		}
 		ret = put_user(tmp, datap);
 		break;
@@ -416,59 +416,11 @@ long arch_ptrace(struct task_struct *chi
 		   have to be selective about what portions we allow someone
 		   to modify. */
 
-		  ret = -EIO;
-		  if(addr >= (long) &dummy->u_debugreg[0] &&
-		     addr <= (long) &dummy->u_debugreg[7]){
-
-			  if(addr == (long) &dummy->u_debugreg[4]) break;
-			  if(addr == (long) &dummy->u_debugreg[5]) break;
-			  if(addr < (long) &dummy->u_debugreg[4] &&
-			     ((unsigned long) data) >= TASK_SIZE-3) break;
-			  
-			  /* Sanity-check data. Take one half-byte at once with
-			   * check = (val >> (16 + 4*i)) & 0xf. It contains the
-			   * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
-			   * 2 and 3 are LENi. Given a list of invalid values,
-			   * we do mask |= 1 << invalid_value, so that
-			   * (mask >> check) & 1 is a correct test for invalid
-			   * values.
-			   *
-			   * R/Wi contains the type of the breakpoint /
-			   * watchpoint, LENi contains the length of the watched
-			   * data in the watchpoint case.
-			   *
-			   * The invalid values are:
-			   * - LENi == 0x10 (undefined), so mask |= 0x0f00.
-			   * - R/Wi == 0x10 (break on I/O reads or writes), so
-			   *   mask |= 0x4444.
-			   * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
-			   *   0x1110.
-			   *
-			   * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
-			   *
-			   * See the Intel Manual "System Programming Guide",
-			   * 15.2.4
-			   *
-			   * Note that LENi == 0x10 is defined on x86_64 in long
-			   * mode (i.e. even for 32-bit userspace software, but
-			   * 64-bit kernel), so the x86_64 mask value is 0x5454.
-			   * See the AMD manual no. 24593 (AMD64 System
-			   * Programming)*/
-
-			  if(addr == (long) &dummy->u_debugreg[7]) {
-				  data &= ~DR_CONTROL_RESERVED;
-				  for(i=0; i<4; i++)
-					  if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
-						  goto out_tsk;
-				  if (data)
-					  set_tsk_thread_flag(child, TIF_DEBUG);
-				  else
-					  clear_tsk_thread_flag(child, TIF_DEBUG);
-			  }
-			  addr -= (long) &dummy->u_debugreg;
-			  addr = addr >> 2;
-			  child->thread.debugreg[addr] = data;
-			  ret = 0;
+		if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
+			addr -= (long) &dummy->u_debugreg;
+			addr = addr >> 2;
+			ret = thread_set_debugreg(child, addr, data);
 		  }
 		  break;
 
@@ -624,7 +576,6 @@ long arch_ptrace(struct task_struct *chi
 		ret = ptrace_request(child, request, addr, data);
 		break;
 	}
- out_tsk:
 	return ret;
 }
 
Index: usb-2.6/arch/i386/kernel/Makefile
===================================================================
--- usb-2.6.orig/arch/i386/kernel/Makefile
+++ usb-2.6/arch/i386/kernel/Makefile
@@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld
 obj-y	:= process.o signal.o entry.o traps.o irq.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
-		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \
+		hw_breakpoint.o
 
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
Index: usb-2.6/arch/i386/power/cpu.c
===================================================================
--- usb-2.6.orig/arch/i386/power/cpu.c
+++ usb-2.6/arch/i386/power/cpu.c
@@ -11,6 +11,7 @@
 #include <linux/suspend.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/debugreg.h>
 
 static struct saved_context saved_context;
 
@@ -46,6 +47,8 @@ void __save_processor_state(struct saved
 	ctxt->cr2 = read_cr2();
 	ctxt->cr3 = read_cr3();
 	ctxt->cr4 = read_cr4();
+
+	disable_debug_registers();
 }
 
 void save_processor_state(void)
@@ -70,20 +73,7 @@ static void fix_processor_context(void)
 
 	load_TR_desc();				/* This does ltr */
 	load_LDT(&current->active_mm->context);	/* This does lldt */
-
-	/*
-	 * Now maybe reload the debug registers
-	 */
-	if (current->thread.debugreg[7]){
-		set_debugreg(current->thread.debugreg[0], 0);
-		set_debugreg(current->thread.debugreg[1], 1);
-		set_debugreg(current->thread.debugreg[2], 2);
-		set_debugreg(current->thread.debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(current->thread.debugreg[6], 6);
-		set_debugreg(current->thread.debugreg[7], 7);
-	}
-
+	load_debug_registers();
 }
 
 void __restore_processor_state(struct saved_context *ctxt)
Index: usb-2.6/arch/i386/kernel/kprobes.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/kprobes.c
+++ usb-2.6/arch/i386/kernel/kprobes.c
@@ -660,9 +660,17 @@ int __kprobes kprobe_exceptions_notify(s
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_DEBUG:
-		if (post_kprobe_handler(args->regs))
-			ret = NOTIFY_STOP;
+
+	/* The DR6 value is stored in args->err */
+#define DR6	(args->err)
+
+		if ((DR6 & DR_STEP) && post_kprobe_handler(args->regs)) {
+			if ((DR6 & ~DR_STEP) == 0)
+				ret = NOTIFY_STOP;
+		}
 		break;
+#undef DR6
+
 	case DIE_GPF:
 	case DIE_PAGE_FAULT:
 		/* kprobe_running() needs smp_processor_id() */
Index: usb-2.6/include/asm-generic/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-generic/hw_breakpoint.h
@@ -0,0 +1,264 @@
+#ifndef	_ASM_GENERIC_HW_BREAKPOINT_H
+#define	_ASM_GENERIC_HW_BREAKPOINT_H
+
+#ifndef __ARCH_HW_BREAKPOINT_H
+#error "Please don't include this file directly"
+#endif
+
+#ifdef	__KERNEL__
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @triggered: callback invoked when the breakpoint is hit
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @address: location (virtual address) of the breakpoint
+ * @len: encoded extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read/write, or execute)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints.  These can be either execute breakpoints
+ * (triggered on instruction execution) or data breakpoints (also known
+ * as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The @address, @len, and @type fields are highly architecture-specific.
+ * Portable drivers should not use them directly but should employ the
+ * following accessor inlines and macros instead:
+ *
+ *	To set @address, @len, and @type before registering a
+ *	breakpoint, use hw_breakpoint_kinit() or hw_breakpoint_uinit()
+ *	for kernel- and user-space breakpoints respectively.
+ *
+ *	To retrieve the values use
+ *	hw_breakpoint_get_{kaddr,uaddr,len,type}().
+ *
+ *	To initialize these fields in a static breakpoint structure,
+ *	use HW_BREAKPOINT_KINIT() or HW_BREAKPOINT_UINIT() as part
+ *	of the initializer.
+ *
+ * The general descriptions below are accurate for x86; on other
+ * architectures some of the fields might be unused or might have bits
+ * altered while a breakpoint is registered.
+ *
+ * The @address field contains the breakpoint's address, as either a
+ * regular kernel pointer or an %__user pointer.
+ *
+ * @len encodes the breakpoint's extent in bytes, which is subject to
+ * certain limitations.  include/asm/hw_breakpoint.h contains macros
+ * defining the available lengths for a specific architecture.  Note that
+ * @address must have the alignment specified by @len.  The breakpoint
+ * will catch accesses to any byte in the range from @address to @address
+ * + (N-1), where N is the value encoded by @len.
+ *
+ * @type indicates the type of access that will trigger the breakpoint.
+ * Possible values may include:
+ *
+ * 	%HW_BREAKPOINT_EXECUTE (triggered on instruction execution),
+ * 	%HW_BREAKPOINT_RW (triggered on read or write access),
+ * 	%HW_BREAKPOINT_WRITE (triggered on write access), and
+ * 	%HW_BREAKPOINT_READ (triggered on read access).
+ *
+ * Appropriate macros are defined in include/asm/hw_breakpoint.h; not all
+ * possibilities are available on all architectures.  Execute breakpoints
+ * must have @len equal to the special value %HW_BREAKPOINT_LEN_EXECUTE.
+ *
+ * With register_user_hw_breakpoint(), @address must refer to a location
+ * in user space (@address.user).  The breakpoint will be active only
+ * while the requested task is running.  Conversely with
+ * register_kernel_hw_breakpoint(), @address must refer to a location in
+ * kernel space (@address.kernel), and the breakpoint will be active on
+ * all CPUs regardless of the current task.
+ *
+ * When a breakpoint gets hit, the @triggered callback is invoked in
+ * interrupt context (in_interrupt) with a pointer to the %hw_breakpoint
+ * structure and the processor registers.  Execute-breakpoint traps occur
+ * before the breakpointed instruction runs; when the callback returns the
+ * instruction is restarted (this time without a debug exception).  All
+ * other types of trap occur after the memory access has taken place.
+ * Breakpoints are disabled while @triggered runs, to avoid recursive
+ * traps and allow unhindered access to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource.  Requests to register a
+ * breakpoint will always succeed provided the parameters are valid,
+ * but the breakpoint may not be installed in a debug register right
+ * away.  Physical debug registers are allocated based on the priority
+ * level stored in @priority (higher values indicate higher priority).
+ * User-space breakpoints within a single thread compete with one
+ * another, and all user-space breakpoints compete with all kernel-space
+ * breakpoints; however user-space breakpoints in different threads do
+ * not compete.  %HW_BREAKPOINT_PRIO_PTRACE is the level used for ptrace
+ * requests; an unobtrusive kernel-space breakpoint will use
+ * %HW_BREAKPOINT_PRIO_NORMAL to avoid disturbing user programs.  A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HW_BREAKPOINT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered.  The
+ * @installed and @uninstalled callbacks are invoked in atomic context when
+ * these events occur.  It is legal for @installed or @uninstalled to be %NULL,
+ * however @triggered must not be.  Note that it is not possible to
+ * register or unregister a breakpoint from within a callback routine,
+ * since doing so requires a process context.  Note also that for user
+ * breakpoints, @installed and @uninstalled may be called during the
+ * middle of a context switch, at a time when it is not safe to call
+ * printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled.  As a result @triggered can
+ * be called when you may not expect it, but this way you will know that
+ * during the time interval from @installed to @uninstalled, all events
+ * are faithfully reported.  (It is not possible to do any better than
+ * this in general, because on SMP systems there is no way to set a debug
+ * register simultaneously on all CPUs.)  The same isn't always true with
+ * user-space breakpoints, but the differences should not be visible to a
+ * user process.
+ *
+ * If you need to know whether your kernel-space breakpoint was installed
+ * immediately upon registration, you can check the return value from
+ * register_kernel_hw_breakpoint().  If the value is not > 0, you can
+ * give up and unregister the breakpoint right away.
+ *
+ * @node and @status are intended for internal use.  However @status
+ * may be read to determine whether or not the breakpoint is currently
+ * installed.  (The value is not reliable unless local interrupts are
+ * disabled.)
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable.  Note that it is not portable
+ * as written, because not all architectures support HW_BREAKPOINT_LEN_4.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * static void triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * 	printk(KERN_DEBUG "Breakpoint triggered\n");
+ * 	dump_stack();
+ *  	.......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ *	..........<do anything>............
+ *	hw_breakpoint_kinit(&my_bp, &pid_max, HW_BREAKPOINT_LEN_4,
+ *			HW_BREAKPOINT_WRITE);
+ *	my_bp.triggered = triggered;
+ *	my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ *	rc = register_kernel_hw_breakpoint(&my_bp);
+ *	..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ *	..........<do anything>............
+ *	unregister_kernel_hw_breakpoint(&my_bp);
+ *	..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hw_breakpoint {
+	struct list_head	node;
+	void		(*triggered)(struct hw_breakpoint *, struct pt_regs *);
+	void		(*installed)(struct hw_breakpoint *);
+	void		(*uninstalled)(struct hw_breakpoint *);
+	union {
+		const void		*kernel;
+		const void __user	*user;
+		unsigned long		va;
+	}		address;
+	u8		len;
+	u8		type;
+	u8		priority;
+	u8		status;
+};
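+
+/*
+ * A corresponding user-space registration might look like this (a sketch
+ * only, complementing the kernel example above; "child" stands for a
+ * stopped tracee and "watch_addr" for an address in its address space):
+ *
+ *	static struct hw_breakpoint user_bp;
+ *
+ *	hw_breakpoint_uinit(&user_bp, (const void __user *) watch_addr,
+ *			HW_BREAKPOINT_LEN_4, HW_BREAKPOINT_WRITE);
+ *	user_bp.triggered = triggered;
+ *	user_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ *	rc = register_user_hw_breakpoint(child, &user_bp);
+ *	...
+ *	unregister_user_hw_breakpoint(child, &user_bp);
+ */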
+
+/*
+ * Macros to initialize the arch-specific parts of a static breakpoint
+ * structure (mnemonic: the address, len, and type arguments occur in
+ * alphabetical order):
+ *
+ * HW_BREAKPOINT_KINIT(addr, len, type)
+ * HW_BREAKPOINT_UINIT(addr, len, type)
+ */
+
+/*
+ * Inline setter routines to initialize the arch-specific parts of
+ * a breakpoint structure:
+ */
+static void hw_breakpoint_kinit(struct hw_breakpoint *bp,
+		const void *addr, unsigned len, unsigned type);
+static void hw_breakpoint_uinit(struct hw_breakpoint *bp,
+		const void __user *addr, unsigned len, unsigned type);
+
+/*
+ * Inline accessor routines to retrieve the arch-specific parts of
+ * a breakpoint structure:
+ */
+static const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *bp);
+static const void __user *hw_breakpoint_get_uaddr(struct hw_breakpoint *bp);
+static unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp);
+static unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp);
+
+/*
+ * len and type values are defined in include/asm/hw_breakpoint.h.
+ * Available values vary according to the architecture.  On i386 the
+ * possibilities are:
+ *
+ *	HW_BREAKPOINT_LEN_1
+ *	HW_BREAKPOINT_LEN_2
+ *	HW_BREAKPOINT_LEN_4
+ *	HW_BREAKPOINT_LEN_EXECUTE
+ *	HW_BREAKPOINT_RW
+ *	HW_BREAKPOINT_WRITE
+ *	HW_BREAKPOINT_EXECUTE
+ *
+ * On other architectures HW_BREAKPOINT_LEN_8 may be available, and the
+ * 1-, 2-, and 4-byte lengths may be unavailable.  There also may be
+ * HW_BREAKPOINT_READ.  You can use #ifdef to check at compile time.
+ */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HW_BREAKPOINT_PRIO_NORMAL	25
+#define HW_BREAKPOINT_PRIO_PTRACE	50
+#define HW_BREAKPOINT_PRIO_HIGH		75
+
+/* HW breakpoint status values (0 = not registered) */
+#define HW_BREAKPOINT_REGISTERED	1
+#define HW_BREAKPOINT_INSTALLED		2
+
+/*
+ * The following two routines are meant to be called only from within
+ * the ptrace or utrace subsystems.  The tsk argument will usually be a
+ * process being debugged by the current task, although it is also legal
+ * for tsk to be the current task.  In any case it must be guaranteed
+ * that tsk will not start running in user mode while its breakpoints are
+ * being modified.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+#endif	/* __KERNEL__ */
+#endif	/* _ASM_GENERIC_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/machine_kexec.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/machine_kexec.c
+++ usb-2.6/arch/i386/kernel/machine_kexec.c
@@ -19,6 +19,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/debugreg.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -108,6 +109,7 @@ NORET_TYPE void machine_kexec(struct kim
 
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
+	disable_debug_registers();
 
 	control_page = page_address(image->control_code_page);
 	memcpy(control_page, relocate_kernel, PAGE_SIZE);
Index: usb-2.6/arch/i386/kernel/smpboot.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/smpboot.c
+++ usb-2.6/arch/i386/kernel/smpboot.c
@@ -58,6 +58,7 @@
 #include <smpboot_hooks.h>
 #include <asm/vmi.h>
 #include <asm/mtrr.h>
+#include <asm/debugreg.h>
 
 /* Set if we find a B stepping CPU */
 static int __devinitdata smp_b_stepping;
@@ -427,6 +428,7 @@ static void __cpuinit start_secondary(vo
 	local_irq_enable();
 
 	wmb();
+	load_debug_registers();
 	cpu_idle();
 }
 
@@ -1209,6 +1211,7 @@ int __cpu_disable(void)
 	fixup_irqs(map);
 	/* It's now safe to remove this processor from the online map */
 	cpu_clear(cpu, cpu_online_map);
+	disable_debug_registers();
 	return 0;
 }
 
Index: usb-2.6/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/kernel/hw_breakpoint.c
@@ -0,0 +1,762 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ *
+ * This file contains the arch-independent routines.  It is not meant
+ * to be compiled as a standalone source file; rather it should be
+ * #include'd by the arch-specific implementation.
+ */
+
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct cpu_hw_breakpoint *chbi;
+	struct kernel_bp_data *thr_kbpdata;
+
+	/* This routine is on the hot path; it gets called for every
+	 * context switch into a task with active breakpoints.  We
+	 * must make sure that the common case executes as quickly as
+	 * possible.
+	 */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = tsk;
+
+	/* Use RCU to synchronize with external updates */
+	rcu_read_lock();
+
+	/* Other CPUs might be making updates to the list of kernel
+	 * breakpoints at this time.  If they are, they will modify
+	 * the other entry in kbpdata[] -- the one not pointed to
+	 * by chbi->cur_kbpdata.  So the update itself won't affect
+	 * us directly.
+	 *
+	 * However when the update is finished, an IPI will arrive
+	 * telling this CPU to change chbi->cur_kbpdata.  We need
+	 * to use a single consistent kbpdata[] entry, the present one.
+	 * So we'll copy the pointer to a local variable, thr_kbpdata,
+	 * and we must prevent the compiler from aliasing the two
+	 * pointers.  Only a compiler barrier is required, not a full
+	 * memory barrier, because everything takes place on a single CPU.
+	 */
+ restart:
+	thr_kbpdata = chbi->cur_kbpdata;
+	barrier();
+
+	/* Normally we can keep the same debug register settings as the
+	 * last time this task ran.  But if the kernel breakpoints have
+	 * changed or any user breakpoints have been registered or
+	 * unregistered, we need to handle the updates and possibly
+	 * send out some notifications.
+	 */
+	if (unlikely(thbi->gennum != thr_kbpdata->gennum)) {
+		struct hw_breakpoint *bp;
+		int i;
+		int num;
+
+		thbi->gennum = thr_kbpdata->gennum;
+		arch_update_thbi(thbi, thr_kbpdata);
+		num = thr_kbpdata->num_kbps;
+
+		/* This code can be invoked while a debugger is actively
+		 * updating the thread's breakpoint list (for example, if
+		 * someone sends SIGKILL to the task).  We use RCU to
+		 * protect our access to the list pointers. */
+		thbi->num_installed = 0;
+		i = HB_NUM;
+		list_for_each_entry_rcu(bp, &thbi->thread_bps, node) {
+
+			/* If this register is allocated for kernel bps,
+			 * don't install.  Otherwise do. */
+			if (--i < num) {
+				if (bp->status == HW_BREAKPOINT_INSTALLED) {
+					if (bp->uninstalled)
+						(bp->uninstalled)(bp);
+					bp->status = HW_BREAKPOINT_REGISTERED;
+				}
+			} else {
+				++thbi->num_installed;
+				if (bp->status != HW_BREAKPOINT_INSTALLED) {
+					bp->status = HW_BREAKPOINT_INSTALLED;
+					if (bp->installed)
+						(bp->installed)(bp);
+				}
+			}
+		}
+	}
+
+	/* Set the debug register */
+	arch_install_thbi(thbi);
+
+	/* Were there any kernel breakpoint changes while we were running? */
+	if (unlikely(chbi->cur_kbpdata != thr_kbpdata)) {
+
+		/* DR0-3 might now be assigned to kernel bps and we might
+		 * have messed them up.  Reload all the kernel bps and
+		 * then reload the thread bps.
+		 */
+		arch_install_chbi(chbi);
+		goto restart;
+	}
+
+	rcu_read_unlock();
+	put_cpu_no_resched();
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void switch_to_none_hw_breakpoint(void)
+{
+	struct cpu_hw_breakpoint *chbi;
+
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = NULL;
+
+	/* This routine gets called from only two places.  In one
+	 * the caller holds the hw_breakpoint_mutex; in the other
+	 * interrupts are disabled.  In either case, no kernel
+	 * breakpoint updates can arrive while the routine runs.
+	 * So we don't need to use RCU.
+	 */
+	arch_install_none(chbi);
+	put_cpu_no_resched();
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+	struct cpu_hw_breakpoint *chbi;
+	struct task_struct *tsk = current;
+
+	chbi = &per_cpu(cpu_info, get_cpu());
+
+	/* Install both the kernel and the user breakpoints */
+	arch_install_chbi(chbi);
+	if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+		switch_to_thread_hw_breakpoint(tsk);
+
+	put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void update_all_cpus(void)
+{
+	/* We don't need to use any sort of memory barrier.  The IPI
+	 * carried out by on_each_cpu() includes its own barriers.
+	 */
+	on_each_cpu(update_this_cpu, NULL, 0, 0);
+	synchronize_rcu();
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+	unsigned long flags;
+
+	/* Prevent IPIs for new kernel breakpoint updates */
+	local_irq_save(flags);
+
+	rcu_read_lock();
+	update_this_cpu(NULL);
+	rcu_read_unlock();
+
+	local_irq_restore(flags);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio.  Highest-priority entry is in tprio[3].
+ */
+static void accum_thread_tprio(struct thread_hw_breakpoint *thbi)
+{
+	int i;
+
+	for (i = HB_NUM - 1; i >= 0 && thbi->bps[i]; --i)
+		tprio[i] = max(tprio[i], thbi->bps[i]->priority);
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority.  We'll set tprio[HB_NUM-1] to the maximum priority
+ * of the first (highest-priority) entries in all the lists, tprio[HB_NUM-2]
+ * to the maximum of the second entries, and so on.  In the end, we'll know
+ * that no thread requires breakpoints with priorities higher than the
+ * values in tprio.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void recalc_tprio(void)
+{
+	struct thread_hw_breakpoint *thbi;
+
+	memset(tprio, 0, sizeof tprio);
+
+	/* Loop through all threads having registered breakpoints
+	 * and accumulate the maximum priority levels in tprio.
+	 */
+	list_for_each_entry(thbi, &thread_list, node)
+		accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[].  The 4 highest
+ * winners overall get to be installed in a debug register; num_kpbs
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU.  If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+	int k, u;
+	int changed = 0;
+	struct hw_breakpoint *bp;
+	struct kernel_bp_data *new_kbpdata;
+
+	/* Determine how many debug registers are available for kernel
+	 * breakpoints as opposed to user breakpoints, based on the
+	 * priorities.  Ties are resolved in favor of user bps.
+	 */
+	k = 0;			/* Next kernel bp to allocate */
+	u = HB_NUM - 1;		/* Next user bp to allocate */
+	bp = list_entry(kernel_bps.next, struct hw_breakpoint, node);
+	while (k <= u) {
+		if (&bp->node == &kernel_bps || tprio[u] >= bp->priority)
+			--u;		/* User bps win a slot */
+		else {
+			++k;		/* Kernel bp wins a slot */
+			if (bp->status != HW_BREAKPOINT_INSTALLED)
+				changed = 1;
+			bp = list_entry(bp->node.next, struct hw_breakpoint,
+					node);
+		}
+	}
+	if (k != cur_kbpdata->num_kbps)
+		changed = 1;
+
+	/* Notify the remaining kernel breakpoints that they are about
+	 * to be uninstalled.
+	 */
+	list_for_each_entry_from(bp, &kernel_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED) {
+			if (bp->uninstalled)
+				(bp->uninstalled)(bp);
+			bp->status = HW_BREAKPOINT_REGISTERED;
+			changed = 1;
+		}
+	}
+
+	if (changed) {
+		cur_kbpindex ^= 1;
+		new_kbpdata = &kbpdata[cur_kbpindex];
+		new_kbpdata->gennum = cur_kbpdata->gennum + 1;
+		new_kbpdata->num_kbps = k;
+		arch_new_kbpdata(new_kbpdata);
+		u = 0;
+		list_for_each_entry(bp, &kernel_bps, node) {
+			if (u >= k)
+				break;
+			new_kbpdata->bps[u] = bp;
+			++u;
+		}
+		rcu_assign_pointer(cur_kbpdata, new_kbpdata);
+
+		/* Tell all the CPUs to update their debug registers */
+		update_all_cpus();
+
+		/* Notify the breakpoints that just got installed */
+		for (u = 0; u < k; ++u) {
+			bp = new_kbpdata->bps[u];
+			if (bp->status != HW_BREAKPOINT_INSTALLED) {
+				bp->status = HW_BREAKPOINT_INSTALLED;
+				if (bp->installed)
+					(bp->installed)(bp);
+			}
+		}
+	}
+}
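+
+/*
+ * Worked example (with made-up priorities): suppose two kernel bps of
+ * priority 60 and 30 are registered and tprio = {0, 0, 25, 50}.  The loop
+ * above gives the 60-priority kernel bp the first slot (it beats tprio[3]),
+ * the highest-priority user bps keep DR3 (50 >= 30), the 30-priority
+ * kernel bp beats tprio[2] and takes a second slot, and the remaining
+ * register is left for user breakpoints.  So num_kbps ends up as 2:
+ * DR0-DR1 go to the kernel, DR2-DR3 stay available for user breakpoints.
+ */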
+
+/*
+ * Return the pointer to a thread's hw_breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+		struct task_struct *tsk)
+{
+	if (!tsk->thread.hw_breakpoint_info && !(tsk->flags & PF_EXITING)) {
+		struct thread_hw_breakpoint *thbi;
+
+		thbi = kzalloc(sizeof(struct thread_hw_breakpoint),
+				GFP_KERNEL);
+		if (thbi) {
+			INIT_LIST_HEAD(&thbi->node);
+			INIT_LIST_HEAD(&thbi->thread_bps);
+
+			/* Force an update the next time tsk runs */
+			thbi->gennum = cur_kbpdata->gennum - 2;
+			tsk->thread.hw_breakpoint_info = thbi;
+		}
+	}
+	return tsk->thread.hw_breakpoint_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct hw_breakpoint *bp;
+
+	if (!thbi)
+		return;
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Let the breakpoints know they are being uninstalled */
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+			(bp->uninstalled)(bp);
+		bp->status = 0;
+	}
+
+	/* Remove tsk from the list of all threads with registered bps */
+	list_del(&thbi->node);
+
+	/* The thread no longer has any breakpoints associated with it */
+	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	tsk->thread.hw_breakpoint_info = NULL;
+	kfree(thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities */
+	recalc_tprio();
+	balance_kernel_vs_user();
+
+	/* Actually uninstall the breakpoints if necessary */
+	if (tsk == current)
+		switch_to_none_hw_breakpoint();
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags)
+{
+	/* We will assume that breakpoint settings are not inherited
+	 * and the child starts out with no debug registers set.
+	 * But what about CLONE_PTRACE?
+	 */
+	clear_tsk_thread_flag(child, TIF_DEBUG);
+	return 0;
+}
+
+/*
+ * Store the highest-priority thread breakpoint entries in an array.
+ */
+static void store_thread_bp_array(struct thread_hw_breakpoint *thbi)
+{
+	struct hw_breakpoint *bp;
+	int i;
+
+	i = HB_NUM - 1;
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		thbi->bps[i] = bp;
+		thbi->tdr[i] = bp->address.va;
+		if (--i < 0)
+			break;
+	}
+	while (i >= 0)
+		thbi->bps[i--] = NULL;
+
+	/* Force an update the next time this task runs */
+	thbi->gennum = cur_kbpdata->gennum - 2;
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set implies
+ *		tsk->thread.hw_breakpoint_info is not NULL.
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ *		iff thbi->node is on thread_list.
+ */
+static int insert_bp_in_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	struct list_head *head;
+	int pos;
+	struct hw_breakpoint *temp_bp;
+
+	/* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+	if (tsk)
+		head = &thbi->thread_bps;
+	else
+		head = &kernel_bps;
+
+	/* Equal-priority breakpoints get listed first-come-first-served */
+	pos = 0;
+	list_for_each_entry(temp_bp, head, node) {
+		if (bp->priority > temp_bp->priority)
+			break;
+		++pos;
+	}
+	bp->status = HW_BREAKPOINT_REGISTERED;
+	list_add_tail(&bp->node, &temp_bp->node);
+
+	if (tsk) {
+		store_thread_bp_array(thbi);
+
+		/* Is this the thread's first registered breakpoint? */
+		if (list_empty(&thbi->node)) {
+			set_tsk_thread_flag(tsk, TIF_DEBUG);
+			list_add(&thbi->node, &thread_list);
+		}
+	}
+	return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ */
+static void remove_bp_from_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	/* Remove bp from the thread's/kernel's list.  If the list is now
+	 * empty we must clear the TIF_DEBUG flag.  But keep the
+	 * thread_hw_breakpoint structure, so that the virtualized debug
+	 * register values will remain valid.
+	 */
+	list_del(&bp->node);
+	if (tsk) {
+		store_thread_bp_array(thbi);
+
+		if (list_empty(&thbi->thread_bps)) {
+			list_del_init(&thbi->node);
+			clear_tsk_thread_flag(tsk, TIF_DEBUG);
+		}
+	}
+
+	/* Tell the breakpoint it is being uninstalled */
+	if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+		(bp->uninstalled)(bp);
+	bp->status = 0;
+}
+
+/*
+ * Validate the settings in a hw_breakpoint structure.
+ */
+static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
+{
+	int rc = -EINVAL;
+	unsigned long len;
+
+	switch (hw_breakpoint_get_type(bp)) {
+#ifdef HW_BREAKPOINT_EXECUTE
+	case HW_BREAKPOINT_EXECUTE:
+		if (hw_breakpoint_get_len(bp) != HW_BREAKPOINT_LEN_EXECUTE)
+			return rc;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_READ
+	case HW_BREAKPOINT_READ:	break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+	case HW_BREAKPOINT_WRITE:	break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+	case HW_BREAKPOINT_RW:		break;
+#endif
+	default:
+		return rc;
+	}
+
+	switch (hw_breakpoint_get_len(bp)) {
+#ifdef HW_BREAKPOINT_LEN_1
+	case HW_BREAKPOINT_LEN_1:
+		len = 1;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+	case HW_BREAKPOINT_LEN_2:
+		len = 2;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+	case HW_BREAKPOINT_LEN_4:
+		len = 4;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+	case HW_BREAKPOINT_LEN_8:
+		len = 8;
+		break;
+#endif
+	default:
+		return rc;
+	}
+
+	/* Check that the low-order bits of the address are appropriate
+	 * for the alignment implied by len.
+	 */
+	if (bp->address.va & (len - 1))
+		return rc;
+
+	/* Check that the virtual address is in the proper range */
+	if (tsk) {
+		if (!arch_check_va_in_userspace(bp->address.va, tsk))
+			return rc;
+	} else {
+		if (!arch_check_va_in_kernelspace(bp->address.va))
+			return rc;
+	}
+
+	if (bp->triggered)
+		rc = 0;
+	return rc;
+}
+
+/*
+ * Actual implementation of register_user_hw_breakpoint.
+ */
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+	struct thread_hw_breakpoint *thbi;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, tsk);
+	if (rc)
+		return rc;
+
+	thbi = alloc_thread_hw_breakpoint(tsk);
+	if (!thbi)
+		return -ENOMEM;
+
+	/* Insert bp in the thread's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, thbi, tsk);
+	arch_register_user_hw_breakpoint(bp, thbi);
+
+	/* Update and rebalance the priorities.  We don't need to go through
+	 * the list of all threads; adding a breakpoint can only cause the
+	 * priorities for this thread to increase.
+	 */
+	accum_thread_tprio(thbi);
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register.
+	 */
+	if (pos < HB_NUM - cur_kbpdata->num_kbps) {
+		rc = 1;
+
+		/* Does it need to be installed right now? */
+		if (tsk == current)
+			switch_to_thread_hw_breakpoint(tsk);
+		/* Otherwise it will get installed the next time tsk runs */
+	}
+
+	return rc;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running.  It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being debugged by the current process,
+ * but it may also be the current process.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	int rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	rc = __register_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Actual implementation of unregister_user_hw_breakpoint.
+ */
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+
+	if (!bp->status)
+		return;		/* Not registered */
+
+	/* Remove bp from the thread's list and update the DR7 value */
+	remove_bp_from_list(bp, thbi, tsk);
+	arch_unregister_user_hw_breakpoint(bp, thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities,
+	 * and actually uninstall bp if necessary.
+	 */
+	recalc_tprio();
+	balance_kernel_vs_user();
+	if (tsk == current)
+		switch_to_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * unregister_user_hw_breakpoint - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	mutex_lock(&hw_breakpoint_mutex);
+	__unregister_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be active at all times.  It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * The fields in @bp are checked for validity.  @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	int rc;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, NULL);
+	if (rc)
+		return rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Insert bp in the kernel's list and update the DR7 value */
+	pos = insert_bp_in_list(bp, NULL, NULL);
+	arch_register_kernel_hw_breakpoint(bp);
+
+	/* Rebalance the priorities.  This will install bp if it
+	 * was allocated a debug register.
+	 */
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register.
+	 */
+	if (pos < cur_kbpdata->num_kbps)
+		rc = 1;
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	if (!bp->status)
+		return;		/* Not registered */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Remove bp from the kernel's list and update the DR7 value */
+	remove_bp_from_list(bp, NULL, NULL);
+	arch_unregister_kernel_hw_breakpoint(bp);
+
+	/* Rebalance the priorities.  This will uninstall bp if it
+	 * was allocated a debug register.
+	 */
+	balance_kernel_vs_user();
+
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);

[-- Attachment #2: Type: TEXT/PLAIN, Size: 12158 bytes --]

Index: usb-2.6/bptest/Makefile
===================================================================
--- /dev/null
+++ usb-2.6/bptest/Makefile
@@ -0,0 +1 @@
+obj-m	+= bptest.o
Index: usb-2.6/bptest/bptest.c
===================================================================
--- /dev/null
+++ usb-2.6/bptest/bptest.c
@@ -0,0 +1,460 @@
+/*
+ * Test driver for hardware breakpoints.
+ *
+ * Copyright (C) 2007 Alan Stern <stern@rowland.harvard.edu>
+ */
+
+/*
+ * When this driver is loaded, it will create several attribute files
+ * under /sys/bus/platform/drivers/bptest:
+ *
+ *	 call, read, write, and bp0,..., bp3.
+ *
+ * It also allocates a 32-byte array (called "bytes") for testing data
+ * breakpoints, and it contains four do-nothing routines, r0(),..., r3(),
+ * for testing execution breakpoints.
+ *
+ * Writing to the "call" attribute causes the rN routines to be called;
+ * "echo >call N" will call rN(), where N is 0, 1, 2, or 3.  Similarly,
+ * "echo >call" will call all four routines.
+ *
+ * The byte array can be accessed through the "read" and "write"
+ * attributes.  "echo >read N" will read bytes[N], and "echo >write N V"
+ * will store V in bytes[N], where N is between 0 and 31.  There are
+ * no provision for multi-byte accesses; they shouldn't be needed for
+ * simple testing.
+ *
+ * The driver contains four hw_breakpoint structures, which can be
+ * accessed through the "bpN" attributes.  Reading the attribute file
+ * will yield the hw_breakpoint's current settings.  The settings can be
+ * altered by writing the attribute.  The format to use is:
+ *
+ *	echo >bpN priority type address [len]
+ *
+ * priority must be a number between 0 and 255.  type must be one of 'e'
+ * (execution), 'r' (read), 'w' (write), or 'b' (both read/write).
+ * address must be a number between 0 and 31; if type is 'e' then address
+ * must be between 0 and 3.  len must be 1, 2, 4, or 8, but if type is 'e'
+ * then len is optional and ignored.
+ *
+ * Execution breakpoints are set on the rN routine and data breakpoints
+ * are set on bytes[N], where N is the address value.  You can unregister
+ * a breakpoint by doing "echo >bpN u", where 'u' is any non-digit.
+ *
+ * (Note: On i386 certain values are not implemented.  len cannot be set
+ * to 8 and type cannot be set to 'r'.)
+ *
+ * The driver prints lots of information to the system log as it runs.
+ * To best see things as they happen, use a VT console and set the
+ * logging level high (I use Alt-SysRq-9).
+ */
+
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <asm/hw_breakpoint.h>
+
+MODULE_AUTHOR("Alan Stern <stern@rowland.harvard.edu>");
+MODULE_DESCRIPTION("Hardware Breakpoint test driver");
+MODULE_LICENSE("GPL");
+
+
+static struct hw_breakpoint bps[4];
+
+#define NUM_BYTES	32
+static unsigned char bytes[NUM_BYTES] __attribute__((aligned(8)));
+
+/* Write n to read bytes[n] */
+static ssize_t read_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	int n = -1;
+
+	if (sscanf(buf, "%d", &n) < 1 || n < 0 || n >= NUM_BYTES) {
+		printk(KERN_WARNING "bptest: read: invalid index %d\n", n);
+		return -EINVAL;
+	}
+	printk(KERN_INFO "bptest: read: bytes[%d] = %d\n", n, bytes[n]);
+	return count;
+}
+static DRIVER_ATTR(read, 0200, NULL, read_store);
+
+/* Write n v to set bytes[n] = v */
+static ssize_t write_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	int n = -1;
+	int v;
+
+	if (sscanf(buf, "%d %d", &n, &v) < 2 || n < 0 || n >= NUM_BYTES) {
+		printk(KERN_WARNING "bptest: write: invalid index %d\n", n);
+		return -EINVAL;
+	}
+	bytes[n] = v;
+	printk(KERN_INFO "bptest: write: bytes[%d] <- %d\n", n, v);
+	return count;
+}
+static DRIVER_ATTR(write, 0200, NULL, write_store);
+
+
+/* Dummy routines for testing instruction breakpoints */
+static void r0(void)
+{
+	printk(KERN_INFO "This is r%d\n", 0);
+}
+static void r1(void)
+{
+	printk(KERN_INFO "This is r%d\n", 1);
+}
+static void r2(void)
+{
+	printk(KERN_INFO "This is r%d\n", 2);
+}
+static void r3(void)
+{
+	printk(KERN_INFO "This is r%d\n", 3);
+}
+
+static void (*rtns[])(void) = {
+	r0, r1, r2, r3
+};
+
+
+/* Write n to call routine r##n, or a blank line to call them all */
+static ssize_t call_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	int n;
+
+	if (sscanf(buf, "%d", &n) == 0) {
+		printk(KERN_INFO "bptest: call all routines\n");
+		r0();
+		r1();
+		r2();
+		r3();
+	} else if (n >= 0 && n < 4) {
+		printk(KERN_INFO "bptest: call r%d\n", n);
+		rtns[n]();
+	} else {
+		printk(KERN_WARNING "bptest: call: invalid index: %d\n", n);
+		count = -EINVAL;
+	}
+	return count;
+}
+static DRIVER_ATTR(call, 0200, NULL, call_store);
+
+
+/* Breakpoint callbacks */
+static void bptest_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+	printk(KERN_INFO "Breakpoint %d triggered\n", (int) (bp - bps));
+}
+
+static void bptest_installed(struct hw_breakpoint *bp)
+{
+	printk(KERN_INFO "Breakpoint %d installed\n", (int) (bp - bps));
+}
+
+static void bptest_uninstalled(struct hw_breakpoint *bp)
+{
+	printk(KERN_INFO "Breakpoint %d uninstalled\n", (int) (bp - bps));
+}
+
+
+/* Breakpoint attribute files for testing */
+static ssize_t bp_show(int n, char *buf)
+{
+	struct hw_breakpoint *bp = &bps[n];
+	int a, len, type;
+
+	if (!bp->status)
+		return sprintf(buf, "bp%d: unregistered\n", n);
+
+	len = -1;
+	switch (hw_breakpoint_get_len(bp)) {
+#ifdef HW_BREAKPOINT_LEN_1
+	case HW_BREAKPOINT_LEN_1:	len = 1;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+	case HW_BREAKPOINT_LEN_2:	len = 2;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+	case HW_BREAKPOINT_LEN_4:	len = 4;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+	case HW_BREAKPOINT_LEN_8:	len = 8;	break;
+#endif
+	}
+
+	type = '?';
+	switch (hw_breakpoint_get_type(bp)) {
+#ifdef HW_BREAKPOINT_READ
+	case HW_BREAKPOINT_READ:	type = 'r';	break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+	case HW_BREAKPOINT_WRITE:	type = 'w';	break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+	case HW_BREAKPOINT_RW:		type = 'b';	break;
+#endif
+#ifdef HW_BREAKPOINT_EXECUTE
+	case HW_BREAKPOINT_EXECUTE:	type = 'e';	break;
+#endif
+	}
+
+	a = -1;
+	if (type == 'e') {
+		const void *addr = hw_breakpoint_get_kaddr(bp);
+
+		if (addr == r0)
+			a = 0;
+		else if (addr == r1)
+			a = 1;
+		else if (addr == r2)
+			a = 2;
+		else if (addr == r3)
+			a = 3;
+	} else {
+		const unsigned char *p = hw_breakpoint_get_kaddr(bp);
+
+		if (p >= bytes && p < bytes + NUM_BYTES)
+			a = p - bytes;
+	}
+
+	return sprintf(buf, "bp%d: %d %c %d %d [%sinstalled]\n",
+			n, bp->priority, type, a, len,
+			(bp->status < HW_BREAKPOINT_INSTALLED ? "not " : ""));
+}
+
+static ssize_t bp_store(int n, const char *buf, size_t count)
+{
+	struct hw_breakpoint *bp = &bps[n];
+	int prio, a, alen;
+	char atype;
+	unsigned len, type;
+	int i;
+
+	if (count <= 1) {
+		printk(KERN_INFO "bptest: bp%d: format:  priority type "
+				"address len\n", n);
+		printk(KERN_INFO "  type = r, w, b, or e; address = 0 - 31; "
+				"len = 1, 2, 4, or 8\n");
+		printk(KERN_INFO "  Write any non-digit to unregister\n");
+		return count;
+	}
+
+	unregister_kernel_hw_breakpoint(bp);
+	printk(KERN_INFO "bptest: bp%d unregistered\n", n);
+
+	alen = -1;
+	i = sscanf(buf, "%d %c %d %d", &prio, &atype, &a, &alen);
+	if (i == 0)
+		return count;
+	if (i < 3) {
+		printk(KERN_WARNING "bptest: bp%d: too few fields\n", n);
+		return -EINVAL;
+	}
+
+	bp->priority = prio;
+	switch (atype) {
+#ifdef HW_BREAKPOINT_EXECUTE
+	case 'e':
+		type = HW_BREAKPOINT_EXECUTE;
+		len = HW_BREAKPOINT_LEN_EXECUTE;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_READ
+	case 'r':
+		type = HW_BREAKPOINT_READ;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+	case 'w':
+		type = HW_BREAKPOINT_WRITE;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+	case 'b':
+		type = HW_BREAKPOINT_RW;
+		break;
+#endif
+	default:
+		printk(KERN_WARNING "bptest: bp%d: invalid type %c\n",
+				n, atype);
+		return -EINVAL;
+	}
+
+	if (a < 0 || a >= NUM_BYTES || (a >= 4 && atype == 'e')) {
+		printk(KERN_WARNING "bptest: bp%d: invalid address %d\n",
+				n, a);
+		return -EINVAL;
+	}
+	if (atype == 'e')
+		hw_breakpoint_kinit(bp, rtns[a], len, type);
+	else {
+		switch (alen) {
+#ifdef HW_BREAKPOINT_LEN_1
+		case 1:		len = HW_BREAKPOINT_LEN_1;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+		case 2:		len = HW_BREAKPOINT_LEN_2;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+		case 4:		len = HW_BREAKPOINT_LEN_4;	break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+		case 8:		len = HW_BREAKPOINT_LEN_8;	break;
+#endif
+		default:
+			printk(KERN_WARNING "bptest: bp%d: invalid len %d\n",
+					n, alen);
+			return -EINVAL;
+			break;
+		}
+		hw_breakpoint_kinit(bp, &bytes[a], len, type);
+	}
+
+	bp->triggered = bptest_triggered;
+	bp->installed = bptest_installed;
+	bp->uninstalled = bptest_uninstalled;
+
+	i = register_kernel_hw_breakpoint(bp);
+	if (i < 0) {
+		printk(KERN_WARNING "bptest: bp%d: failed to register %d\n",
+				n, i);
+		count = i;
+	} else
+		printk(KERN_INFO "bptest: bp%d registered: %d\n", n, i);
+	return count;
+}
+
+
+static ssize_t bp0_show(struct device_driver *d, char *buf)
+{
+	return bp_show(0, buf);
+}
+static ssize_t bp0_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	return bp_store(0, buf, count);
+}
+static DRIVER_ATTR(bp0, 0600, bp0_show, bp0_store);
+
+static ssize_t bp1_show(struct device_driver *d, char *buf)
+{
+	return bp_show(1, buf);
+}
+static ssize_t bp1_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	return bp_store(1, buf, count);
+}
+static DRIVER_ATTR(bp1, 0600, bp1_show, bp1_store);
+
+static ssize_t bp2_show(struct device_driver *d, char *buf)
+{
+	return bp_show(2, buf);
+}
+static ssize_t bp2_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	return bp_store(2, buf, count);
+}
+static DRIVER_ATTR(bp2, 0600, bp2_show, bp2_store);
+
+static ssize_t bp3_show(struct device_driver *d, char *buf)
+{
+	return bp_show(3, buf);
+}
+static ssize_t bp3_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	return bp_store(3, buf, count);
+}
+static DRIVER_ATTR(bp3, 0600, bp3_show, bp3_store);
+
+
+static int bptest_probe(struct platform_device *pdev)
+{
+	return -ENODEV;
+}
+
+static int bptest_remove(struct platform_device *pdev)
+{
+	return 0;
+}
+
+static struct platform_driver bptest_driver = {
+	.probe = bptest_probe,
+	.remove = bptest_remove,
+	.driver = {
+		.name = "bptest",
+		.owner = THIS_MODULE,
+	}
+};
+
+
+static struct driver_attribute *(bptest_group[]) = {
+	&driver_attr_bp0,
+	&driver_attr_bp1,
+	&driver_attr_bp2,
+	&driver_attr_bp3,
+	&driver_attr_call,
+	&driver_attr_read,
+	&driver_attr_write,
+	NULL
+};
+
+static int add_files(void)
+{
+	int rc = 0;
+	struct driver_attribute **g;
+
+	for (g = bptest_group; *g; ++g) {
+		rc = driver_create_file(&bptest_driver.driver, *g);
+		if (rc)
+			break;
+	}
+	return rc;
+}
+
+static void remove_files(void)
+{
+	struct driver_attribute **g;
+
+	for (g = bptest_group; *g; ++g)
+		driver_remove_file(&bptest_driver.driver, *g);
+}
+
+static int __init bptest_init(void)
+{
+	int rc;
+
+	rc = platform_driver_register(&bptest_driver);
+	if (rc) {
+		printk(KERN_ERR "Failed to register bptest driver: %d\n", rc);
+		return rc;
+	}
+	rc = add_files();
+	if (rc) {
+		remove_files();
+		platform_driver_unregister(&bptest_driver);
+		return rc;
+	}
+	printk("bptest loaded\n");
+	return 0;
+}
+
+static void __exit bptest_exit(void)
+{
+	int n;
+
+	remove_files();
+	for (n = 0; n < 4; ++n)
+		unregister_kernel_hw_breakpoint(&bps[n]);
+	platform_driver_unregister(&bptest_driver);
+	printk("bptest unloaded\n");
+}
+
+module_init(bptest_init);
+module_exit(bptest_exit);

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-19 20:35                                                             ` Alan Stern
@ 2007-06-25 10:52                                                               ` Roland McGrath
  2007-06-25 15:36                                                                 ` Alan Stern
  2007-06-25 11:32                                                               ` Roland McGrath
  1 sibling, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-06-25 10:52 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> "A waste to store one"?  Waste of what?  It isn't a waste of space; the 
> space would otherwise be unused.  Waste of an instruction, perhaps.

Yes.  

> It is now possible for an implementation to store things in a 
> machine-dependent fashion; I have added accessor routines as you 
> suggested.  But I also left the fields as they were; the documentation 
> mentions that they won't necessarily contain any particular values.

People usually read the documentation only after fields that are named as
if you could guess what they contain turn out to hold values that confuse
them, not before.

> You might want to examine the check in validate_settings() for address 
> alignment; it might not be valid if other values get stored in the 
> low-order bits of the address.  This is a tricky point; it's not safe 
> to mix bits around unless you know that the data values are correct, 
> but in validate_settings() you don't yet know that.

This is why I didn't bring up encoded addresses earlier on. :-)  

These kinds of issues are why I prefer unambiguously opaque arch-specific
encodings.  validate_settings is indeed wrong for the natural ppc encoding.

The values must be set by a call that can return an error.  That means you
can't really have a static initializer macro, unless it's intended to mean
"unspecified garbage if not used exactly right".  I favor just going back
to passing three more args to register_kernel_hw_breakpoint.
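
(Concretely, "three more args" would make the call look roughly like

	int register_kernel_hw_breakpoint(struct hw_breakpoint *bp,
			unsigned long address, unsigned len, unsigned type);

-- a sketch only; the exact parameter types are whatever the next iteration
settles on.  The point is that the len/type encoding and validation happen
inside a call that can fail, not in a static initializer that cannot.)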

> Tests show that my CPU does not clear DR_STEP when a data breakpoint is
> hit.  Conversely, the DR_TRAPn bits are cleared even when a single-step 
> exception occurs.

Ok, this is pretty consistent with what the newest Intel manuals say.

> If you're interested, I can send you the code I used to do this testing
> so you can try it on your machine.

Ok.

> > I still think it's the proper thing to make it conditional, not always
> > built in.  But it's a pedantic point.
> 
> We have three things to consider: ptrace, utrace, and hw-breakpoint.  
> Ultimately hw-breakpoint should become part of utrace; we might not
> want to bother with a standalone version.

It is not hard to make it a separate option, so there is no reason not to.

> Furthermore, hw-breakpoint takes over the ptrace's mechanism for
> breakpoint handling.  If we want to allow a configuration where ptrace
> is present and hw-breakpoint isn't, then I would have to add an
> alternate implementation containing only support for the legacy
> interface.

I was not suggesting that.  CONFIG_PTRACE would require HW_BREAKPOINT on
machines where arch ptrace code uses it.

> I made a few other changes to do_debug.  For instance, it no longer 
> checks whether notify_die() returns NOTIFY_STOP.  That check was a 
> mistake to begin with; NOTIFY_STOP merely means to cut the notifier 
> chain short -- it doesn't mean that the debug exception can be ignored.  

This is incorrect.  The usage of notify_die in all other cases, at least of
machine exceptions on x86, is to test for == NOTIFY_STOP and when true
short-circuit the normal effect of the exception (signal, oops).  The
notifiers should return NOTIFY_STOP if they consumed the exception wholly.
If none uses NOTIFY_STOP, then the normal user signal should happen.
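
(In other words the conventional shape in the handler is roughly

	if (notify_die(DIE_DEBUG, "debug", regs, dr6, error_code,
			SIGTRAP) == NOTIFY_STOP)
		return;
	/* otherwise fall through to the normal signal/oops handling */

with the early return taken only when some notifier consumed the exception
wholly.)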

> Also it sends the SIGTRAP when any of the DR_STEP or DR_TRAPn bits are 
> set in vdr6; this is now the appropriate condition.

From what you've said, DR_STEP will remain set on a later debug exception.
So if a non-ptrace hw breakpoint consumed the exception and left no
DR_TRAPn bits set, the thread would generate a second SIGTRAP from the
prior single-step.  Currently userland expects to have to clear DR_STEP in
dr6 via ptrace itself, but does not expect it can get a duplicate SIGTRAP
if it doesn't.
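
For reference, a userspace sketch of what "clear DR_STEP in dr6 via ptrace
itself" looks like (error handling trimmed; clear_dr_step is just an
illustrative name):

	#include <stddef.h>
	#include <sys/ptrace.h>
	#include <sys/types.h>
	#include <sys/user.h>

	#define DR_STEP_BIT	0x4000UL	/* DR_STEP in asm/debugreg.h */

	static void clear_dr_step(pid_t child)
	{
		unsigned long off = offsetof(struct user, u_debugreg[6]);
		unsigned long dr6;

		dr6 = ptrace(PTRACE_PEEKUSER, child, (void *) off, NULL);
		dr6 &= ~DR_STEP_BIT;
		ptrace(PTRACE_POKEUSER, child, (void *) off, (void *) dr6);
	}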


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-19 20:35                                                             ` Alan Stern
  2007-06-25 10:52                                                               ` Roland McGrath
@ 2007-06-25 11:32                                                               ` Roland McGrath
  2007-06-25 15:37                                                                 ` Alan Stern
  2007-06-25 20:51                                                                 ` Alan Stern
  1 sibling, 2 replies; 70+ messages in thread
From: Roland McGrath @ 2007-06-25 11:32 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

I added this on top of your patch to make it compile (and look a little nicer).
With that, bptest worked nicely.

---
 arch/i386/kernel/kprobes.c |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

Index: b/arch/i386/kernel/kprobes.c
===================================================================
--- a/arch/i386/kernel/kprobes.c
+++ b/arch/i386/kernel/kprobes.c
@@ -35,6 +35,7 @@
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
 #include <asm/uaccess.h>
+#include <asm/debugreg.h>
 
 void jprobe_return_end(void);
 
@@ -660,16 +661,16 @@ int __kprobes kprobe_exceptions_notify(s
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_DEBUG:
-
-	/* The DR6 value is stored in args->err */
-#define DR6	(args->err)
-
-		if ((DR6 & DR_STEP) && post_kprobe_handler(args->regs)) {
-			if ((DR6 & ~DR_STEP) == 0)
-				ret = NOTIFY_STOP;
-		}
+		/*
+		 * The %db6 value is stored in args->err.
+		 * If DR_STEP is the only bit set and it's ours,
+		 * we should eat this exception.
+		 */
+		if ((args->err & DR_STEP) &&
+		    post_kprobe_handler(args->regs) &&
+		    (args->err & ~DR_STEP) == 0)
+			ret = NOTIFY_STOP;
 		break;
-#undef DR6
 
 	case DIE_GPF:
 	case DIE_PAGE_FAULT:

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-25 10:52                                                               ` Roland McGrath
@ 2007-06-25 15:36                                                                 ` Alan Stern
  2007-06-26 20:49                                                                   ` Roland McGrath
  0 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-06-25 15:36 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Mon, 25 Jun 2007, Roland McGrath wrote:

> > "A waste to store one"?  Waste of what?  It isn't a waste of space; the 
> > space would otherwise be unused.  Waste of an instruction, perhaps.
> 
> Yes.  

Of course, calling register_kernel_hw_breakpoint() with three extra
arguments is a waste of an instruction also, if one of those arguments 
isn't used.

And yet it's not clear that either of these really is a waste.  Suppose
somebody ports code from x86 to PPC64 and leaves a breakpoint length
set to HW_BREAKPOINT_LEN_4.  Clearly we would want to return an error.  
This means that the length value _has_ to be tested, even if it won't
be used for anything.  And this means the length _has_ to be passed
along somehow, either as an argument or as a field value.
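
A sketch of the kind of check that implies (validate_len is an invented
helper name; the encodings are the x86 ones from the patch):

	static int validate_len(unsigned len)
	{
		switch (len) {
		case HW_BREAKPOINT_LEN_1:
		case HW_BREAKPOINT_LEN_2:
		case HW_BREAKPOINT_LEN_4:
			return 0;
		default:
			return -EINVAL;	/* e.g. a LEN_8 value on i386 */
		}
	}

Whether the value arrives as an argument or as a field, something like this
has to run against each architecture's supported encodings.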

> > You might want to examine the check in validate_settings() for address 
> > alignment; it might not be valid if other values get stored in the 
> > low-order bits of the address.  This is a tricky point; it's not safe 
> > to mix bits around unless you know that the data values are correct, 
> > but in validate_settings() you don't yet know that.
> 
> This is why I didn't bring up encoded addresses earlier on. :-)  
> 
> These kinds of issues are why I prefer unambiguously opaque arch-specific
> encodings.  validate_settings is indeed wrong for the natural ppc encoding.
> 
> The values must be set by a call that can return an error.  That means you
> can't really have a static initializer macro, unless it's intended to mean
> "unspecified garbage if not used exactly right".  I favor just going back
> to passing three more args to register_kernel_hw_breakpoint.

All right, I'll change it.  And I'll encapsulate those fields.  I still 
think it will accomplish nothing more than hiding some implementation 
details which don't really need to be hidden.


> > Tests show that my CPU does not clear DR_STEP when a data breakpoint is
> > hit.  Conversely, the DR_TRAPn bits are cleared even when a single-step 
> > exception occurs.
> 
> Ok, this is pretty consistent with what the newest Intel manuals say.
> 
> > If you're interested, I can send you the code I used to do this testing
> > so you can try it on your machine.
> 
> Ok.

It's below.  The patch logs the value of DR6 when each debug interrupt 
occurs, and it adds another sysfs attribute to the bptest driver.  The 
attribute is named "test", and it contains the value that the IRQ 
handler will write back to DR6.  Combine this with the Alt-SysRq-P 
change already submitted, and you can get a clear view of what's going 
on.


> > We have three things to consider: ptrace, utrace, and hw-breakpoint.  
> > Ultimately hw-breakpoint should become part of utrace; we might not
> > want to bother with a standalone version.
> 
> It is not hard to make it a separate option, so there is no reason not to.
> 
> > Furthermore, hw-breakpoint takes over the ptrace's mechanism for
> > breakpoint handling.  If we want to allow a configuration where ptrace
> > is present and hw-breakpoint isn't, then I would have to add an
> > alternate implementation containing only support for the legacy
> > interface.
> 
> I was not suggesting that.  CONFIG_PTRACE would require HW_BREAKPOINT on
> machines where arch ptrace code uses it.

I see.  So I could add a CONFIG_HW_BREAKPOINT option and make 
CONFIG_PTRACE depend on it.  That will be simple enough.

Do you think it would make sense to allow utrace without hw-breakpoint?


> > I made a few other changes to do_debug.  For instance, it no longer 
> > checks whether notify_die() returns NOTIFY_STOP.  That check was a 
> > mistake to begin with; NOTIFY_STOP merely means to cut the notifier 
> > chain short -- it doesn't mean that the debug exception can be ignored.  
> 
> This is incorrect.  The usage of notify_die in all other cases, at least of
> machine exceptions on x86, is to test for == NOTIFY_STOP and when true
> short-circuit the normal effect of the exception (signal, oops).  The
> notifiers should return NOTIFY_STOP if they consumed the exception wholly.
> If none uses NOTIFY_STOP, then the normal user signal should happen.

All right, I'll fix that back up.

> > Also it sends the SIGTRAP when any of the DR_STEP or DR_TRAPn bits are 
> > set in vdr6; this is now the appropriate condition.
> 
> From what you've said, DR_STEP will remain set on a later debug exception.
> So if a non-ptrace hw breakpoint consumed the exception and left no
> DR_TRAPn bits set, the thread would generate a second SIGTRAP from the
> prior single-step.  Currently userland expects to have to clear DR_STEP in
> dr6 via ptrace itself, but does not expect it can get a duplicate SIGTRAP
> if it doesn't.

No, because do_debug always writes a 0 to DR6 after reading it;  
consequently DR_STEP does not remain set on later exceptions.  Unless
we do something like this we would never know whether we entered the
handler because of a single-step exception or not.

But the same effect could occur because of a bogus debug exception 
caused by lazy DR7 switching.  I'll have to add back in code to detect 
that case.

Alan Stern



Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -802,13 +802,17 @@ fastcall void __kprobes do_int3(struct p
  * find every occurrence of the TF bit that could be saved away even
  * by user code)
  */
+unsigned long dr6test;
+EXPORT_SYMBOL(dr6test);
+
 fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
 {
 	struct task_struct *tsk = current;
 	unsigned long dr6;
 
 	get_debugreg(dr6, 6);
-	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
+	printk(KERN_INFO "dr6 = %08lx\n", dr6);
+	set_debugreg(dr6test, 6);	/* DR6 may or may not be cleared by the CPU */
 
 	/* Store the virtualized DR6 value */
 	tsk->thread.vdr6 = dr6;
Index: usb-2.6/bptest/bptest.c
===================================================================
--- usb-2.6.orig/bptest/bptest.c
+++ usb-2.6/bptest/bptest.c
@@ -58,6 +58,22 @@ MODULE_AUTHOR("Alan Stern <stern@rowland
 MODULE_DESCRIPTION("Hardware Breakpoint test driver");
 MODULE_LICENSE("GPL");
 
+extern unsigned long dr6test;
+
+static ssize_t test_store(struct device_driver *d, const char *buf,
+		size_t count)
+{
+	if (sscanf(buf, "%lx", &dr6test) <= 0)
+		return -EIO;
+	return count;
+}
+
+static ssize_t test_show(struct device_driver *d, char *buf)
+{
+	return sprintf(buf, "dr6test: %08lx\n", dr6test);
+}
+static DRIVER_ATTR(test, 0600, test_show, test_store);
+
 
 static struct hw_breakpoint bps[4];
 
@@ -402,6 +418,7 @@ static struct driver_attribute *(bptest_
 	&driver_attr_call,
 	&driver_attr_read,
 	&driver_attr_write,
+	&driver_attr_test,
 	NULL
 };
 


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-25 11:32                                                               ` Roland McGrath
@ 2007-06-25 15:37                                                                 ` Alan Stern
  2007-06-25 20:51                                                                 ` Alan Stern
  1 sibling, 0 replies; 70+ messages in thread
From: Alan Stern @ 2007-06-25 15:37 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Mon, 25 Jun 2007, Roland McGrath wrote:

> I added this on top of your patch to make it compile (and look a little nicer).
> With that, bptest worked nicely.

I'll merge this with the rest of the patch.

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-25 11:32                                                               ` Roland McGrath
  2007-06-25 15:37                                                                 ` Alan Stern
@ 2007-06-25 20:51                                                                 ` Alan Stern
  2007-06-26 18:17                                                                   ` Roland McGrath
  1 sibling, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-06-25 20:51 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

Roland:

Here's the next iteration.  The arch-specific parts are now completely 
encapsulated.  validate_settings is in a form which should be workable 
on all architectures.  And the address, length, and type are passed as 
arguments to register_{kernel,user}_hw_breakpoint().
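
A kernel-space caller now looks roughly like this sketch (my_bp, my_data,
and my_triggered are made-up names; the exact prototype is the one declared
in the patch):

	static struct hw_breakpoint my_bp;
	static unsigned long my_data;

	static void my_triggered(struct hw_breakpoint *bp,
			struct pt_regs *regs)
	{
		printk(KERN_INFO "my_data was accessed\n");
	}

	static int __init my_init(void)
	{
		my_bp.triggered = my_triggered;
		my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
		return register_kernel_hw_breakpoint(&my_bp,
				(unsigned long) &my_data,
				HW_BREAKPOINT_LEN_4, HW_BREAKPOINT_RW);
	}

instead of filling in the address/len/type fields by hand before
registering.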

I changed the Kprobes single-step routine along the lines you 
suggested, but added a little extra.  See what you think.

I haven't tried to modify Kconfig at all.  To do it properly would
require making ptrace configurable, which is not something I want to
tackle at the moment.

The test for early termination of the exception handler is now back the
way it was.  However I didn't change the test for deciding whether to 
send a SIGTRAP.  Under the current circumstances I don't see how it 
could ever be wrong.  (On the other hand, the code will end up calling 
send_sigtrap() twice when a ptrace exception occurs: once in the ptrace 
trigger routine and once in do_debug.  That won't matter, will it?  I 
would expect send_sigtrap() to be idempotent.)

Are you going to the Ottawa Linux Symposium?

Alan Stern



Index: usb-2.6/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,49 @@
+#ifndef	_I386_HW_BREAKPOINT_H
+#define	_I386_HW_BREAKPOINT_H
+
+#ifdef	__KERNEL__
+#define	__ARCH_HW_BREAKPOINT_H
+
+struct arch_hw_breakpoint {
+	unsigned long	address;
+	u8		len;
+	u8		type;
+} __attribute__((packed));
+
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint accessor routines */
+static inline const void *hw_breakpoint_get_kaddress(struct hw_breakpoint *bp)
+{
+	return (const void *) bp->info.address;
+}
+
+static inline const void __user *hw_breakpoint_get_uaddress(
+		struct hw_breakpoint *bp)
+{
+	return (const void __user *) bp->info.address;
+}
+
+static inline unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp)
+{
+	return bp->info.len;
+}
+
+static inline unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp)
+{
+	return bp->info.type;
+}
+
+/* Available HW breakpoint length encodings */
+#define HW_BREAKPOINT_LEN_1		0x40
+#define HW_BREAKPOINT_LEN_2		0x44
+#define HW_BREAKPOINT_LEN_4		0x4c
+#define HW_BREAKPOINT_LEN_EXECUTE	0x40
+
+/* Available HW breakpoint type encodings */
+#define HW_BREAKPOINT_EXECUTE	0x80	/* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE	0x81	/* trigger on memory write */
+#define HW_BREAKPOINT_RW	0x83	/* trigger on memory read or write */
+
+#endif	/* __KERNEL__ */
+#endif	/* _I386_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -57,6 +57,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/cpu.h>
+#include <asm/debugreg.h>
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
@@ -376,9 +377,10 @@ EXPORT_SYMBOL(kernel_thread);
  */
 void exit_thread(void)
 {
+	struct task_struct *tsk = current;
+
 	/* The process may have allocated an io port bitmap... nuke it. */
 	if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
-		struct task_struct *tsk = current;
 		struct thread_struct *t = &tsk->thread;
 		int cpu = get_cpu();
 		struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -396,15 +398,17 @@ void exit_thread(void)
 		tss->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
 		put_cpu();
 	}
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 }
 
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
 
-	memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
-	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));	
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
 	/*
 	 * Forget coprocessor state..
 	 */
@@ -447,14 +451,21 @@ int copy_thread(int nr, unsigned long cl
 
 	savesegment(gs,p->thread.gs);
 
+	p->thread.hw_breakpoint_info = NULL;
+	p->thread.io_bitmap_ptr = NULL;
+
 	tsk = current;
+	err = -ENOMEM;
+	if (unlikely(tsk->thread.hw_breakpoint_info)) {
+		if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+			goto out;
+	}
+
 	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
 						IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
+		if (!p->thread.io_bitmap_ptr)
+			goto out;
 		set_tsk_thread_flag(p, TIF_IO_BITMAP);
 	}
 
@@ -484,7 +495,8 @@ int copy_thread(int nr, unsigned long cl
 
 	err = 0;
  out:
-	if (err && p->thread.io_bitmap_ptr) {
+	if (err) {
+		flush_thread_hw_breakpoint(p);
 		kfree(p->thread.io_bitmap_ptr);
 		p->thread.io_bitmap_max = 0;
 	}
@@ -496,18 +508,18 @@ int copy_thread(int nr, unsigned long cl
  */
 void dump_thread(struct pt_regs * regs, struct user * dump)
 {
-	int i;
+	struct task_struct *tsk = current;
 
 /* changed the size calculations - should hopefully work better. lbt */
 	dump->magic = CMAGIC;
 	dump->start_code = 0;
 	dump->start_stack = regs->esp & ~(PAGE_SIZE - 1);
-	dump->u_tsize = ((unsigned long) current->mm->end_code) >> PAGE_SHIFT;
-	dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
+	dump->u_tsize = ((unsigned long) tsk->mm->end_code) >> PAGE_SHIFT;
+	dump->u_dsize = ((unsigned long) (tsk->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
 	dump->u_dsize -= dump->u_tsize;
 	dump->u_ssize = 0;
-	for (i = 0; i < 8; i++)
-		dump->u_debugreg[i] = current->thread.debugreg[i];  
+
+	dump_thread_hw_breakpoint(tsk, dump->u_debugreg);
 
 	if (dump->start_stack < TASK_SIZE)
 		dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -557,16 +569,6 @@ static noinline void __switch_to_xtra(st
 
 	next = &next_p->thread;
 
-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		set_debugreg(next->debugreg[0], 0);
-		set_debugreg(next->debugreg[1], 1);
-		set_debugreg(next->debugreg[2], 2);
-		set_debugreg(next->debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(next->debugreg[6], 6);
-		set_debugreg(next->debugreg[7], 7);
-	}
-
 	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
 		/*
 		 * Disable the bitmap via an invalid offset. We still cache
@@ -699,7 +701,7 @@ struct task_struct fastcall * __switch_t
 		set_iopl_mask(next->iopl);
 
 	/*
-	 * Now maybe handle debug registers and/or IO bitmaps
+	 * Now maybe handle IO bitmaps
 	 */
 	if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
 	    || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
@@ -731,6 +733,13 @@ struct task_struct fastcall * __switch_t
 
 	x86_write_percpu(current_task, next_p);
 
+	/*
+	 * Handle debug registers.  This must be done _after_ current
+	 * is updated.
+	 */
+	if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+		switch_to_thread_hw_breakpoint(next_p);
+
 	return prev_p;
 }
 
Index: usb-2.6/arch/i386/kernel/signal.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/signal.c
+++ usb-2.6/arch/i386/kernel/signal.c
@@ -591,13 +591,6 @@ static void fastcall do_signal(struct pt
 
 	signr = get_signal_to_deliver(&info, &ka, regs, NULL);
 	if (signr > 0) {
-		/* Reenable any watchpoints before delivering the
-		 * signal to user space. The processor register will
-		 * have been cleared if the watchpoint triggered
-		 * inside the kernel.
-		 */
-		if (unlikely(current->thread.debugreg[7]))
-			set_debugreg(current->thread.debugreg[7], 7);
 
 		/* Whee!  Actually deliver the signal.  */
 		if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -804,62 +804,44 @@ fastcall void __kprobes do_int3(struct p
  */
 fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
 {
-	unsigned int condition;
 	struct task_struct *tsk = current;
+	unsigned long dr6;
 
-	get_debugreg(condition, 6);
+	get_debugreg(dr6, 6);
+	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
 
-	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
-					SIGTRAP) == NOTIFY_STOP)
+	/* Store the virtualized DR6 value */
+	tsk->thread.vdr6 = dr6;
+
+	if (notify_die(DIE_DEBUG, "debug", regs, dr6, error_code,
+			SIGTRAP) == NOTIFY_STOP)
 		return;
+
 	/* It's safe to allow irq's after DR6 has been saved */
 	if (regs->eflags & X86_EFLAGS_IF)
 		local_irq_enable();
 
-	/* Mask out spurious debug traps due to lazy DR7 setting */
-	if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
-		if (!tsk->thread.debugreg[7])
-			goto clear_dr7;
+	if (regs->eflags & VM_MASK) {
+		handle_vm86_trap((struct kernel_vm86_regs *) regs,
+				error_code, 1);
+		return;
 	}
 
-	if (regs->eflags & VM_MASK)
-		goto debug_vm86;
-
-	/* Save debug status register where ptrace can see it */
-	tsk->thread.debugreg[6] = condition;
-
 	/*
-	 * Single-stepping through TF: make sure we ignore any events in
-	 * kernel space (but re-enable TF when returning to user mode).
+	 * Single-stepping through system calls: ignore any exceptions in
+	 * kernel space, but re-enable TF when returning to user mode.
+	 *
+	 * We already checked v86 mode above, so we can check for kernel mode
+	 * by just checking the CPL of CS.
 	 */
-	if (condition & DR_STEP) {
-		/*
-		 * We already checked v86 mode above, so we can
-		 * check for kernel mode by just checking the CPL
-		 * of CS.
-		 */
-		if (!user_mode(regs))
-			goto clear_TF_reenable;
+	if ((dr6 & DR_STEP) && !user_mode(regs)) {
+		tsk->thread.vdr6 &= ~DR_STEP;
+		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+		regs->eflags &= ~X86_EFLAGS_TF;
 	}
 
-	/* Ok, finally something we can handle */
-	send_sigtrap(tsk, regs, error_code);
-
-	/* Disable additional traps. They'll be re-enabled when
-	 * the signal is delivered.
-	 */
-clear_dr7:
-	set_debugreg(0, 7);
-	return;
-
-debug_vm86:
-	handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
-	return;
-
-clear_TF_reenable:
-	set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
-	regs->eflags &= ~TF_MASK;
-	return;
+	if (tsk->thread.vdr6 & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+		send_sigtrap(tsk, regs, error_code);
 }
 
 /*
Index: usb-2.6/include/asm-i386/debugreg.h
===================================================================
--- usb-2.6.orig/include/asm-i386/debugreg.h
+++ usb-2.6/include/asm-i386/debugreg.h
@@ -48,6 +48,8 @@
 
 #define DR_LOCAL_ENABLE_SHIFT 0    /* Extra shift to the local enable bit */
 #define DR_GLOBAL_ENABLE_SHIFT 1   /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1)      /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2)     /* Global enable for reg 0 */
 #define DR_ENABLE_SIZE 2           /* 2 enable bits per register */
 
 #define DR_LOCAL_ENABLE_MASK (0x55)  /* Set  local bits for all 4 regs */
@@ -61,4 +63,32 @@
 #define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
 #define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
 
+
+/*
+ * HW breakpoint additions
+ */
+#ifdef __KERNEL__
+
+#define HB_NUM		4	/* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+static inline void disable_debug_registers(void)
+{
+	set_debugreg(0, 7);
+}
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
+
+#endif	/* __KERNEL__ */
+
 #endif
Index: usb-2.6/include/asm-i386/processor.h
===================================================================
--- usb-2.6.orig/include/asm-i386/processor.h
+++ usb-2.6/include/asm-i386/processor.h
@@ -354,8 +354,9 @@ struct thread_struct {
 	unsigned long	esp;
 	unsigned long	fs;
 	unsigned long	gs;
-/* Hardware debugging registers */
-	unsigned long	debugreg[8];  /* %%db0-7 debug registers */
+/* Hardware breakpoint info */
+	unsigned long	vdr6;
+	struct thread_hw_breakpoint	*hw_breakpoint_info;
 /* fault info */
 	unsigned long	cr2, trap_no, error_code;
 /* floating point info */
Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/arch/i386/kernel/hw_breakpoint.c
@@ -0,0 +1,653 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+/* QUESTIONS
+
+	How to know whether RF should be cleared when setting a user
+	execution breakpoint?
+
+*/
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>
+#include <asm/percpu.h>
+#include <asm/processor.h>
+
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+	/* utrace support */
+	struct list_head	node;		/* Entry in thread list */
+	struct list_head	thread_bps;	/* Thread's breakpoints */
+	struct hw_breakpoint	*bps[HB_NUM];	/* Highest-priority bps */
+	unsigned long		tdr[HB_NUM];	/*  and their addresses */
+	int			num_installed;	/* Number of installed bps */
+	unsigned		gennum;		/* update-generation number */
+
+	/* Only the portions below are arch-specific */
+
+	/* ptrace support -- Note that vdr6 is stored directly in the
+	 * thread_struct so that it is always available.
+	 */
+	unsigned long		vdr7;			/* Virtualized DR7 */
+	struct hw_breakpoint	vdr_bps[HB_NUM];	/* Breakpoints
+			representing virtualized debug registers 0 - 3 */
+	unsigned long		tdr7;		/* Thread's DR7 value */
+	unsigned long		tkdr7;		/* Thread + kernel DR7 value */
+};
+
+/* Kernel-space breakpoint data */
+struct kernel_bp_data {
+	unsigned		gennum;		/* Generation number */
+	int			num_kbps;	/* Number of kernel bps */
+	struct hw_breakpoint	*bps[HB_NUM];	/* Loaded breakpoints */
+
+	/* Only the portions below are arch-specific */
+	unsigned long		mkdr7;		/* Masked kernel DR7 value */
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+	struct kernel_bp_data	*cur_kbpdata;	/* Current kbpdata[] entry */
+	struct task_struct	*bp_task;	/* The thread whose bps
+			are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Global info */
+static struct kernel_bp_data	kbpdata[2];	/* Old and new settings */
+static int			cur_kbpindex;	/* Alternates 0, 1, ... */
+static struct kernel_bp_data	*cur_kbpdata = &kbpdata[0];
+			/* Always equal to &kbpdata[cur_kbpindex] */
+
+static u8			tprio[HB_NUM];	/* Thread bp max priorities */
+static LIST_HEAD(kernel_bps);			/* Kernel breakpoint list */
+static LIST_HEAD(thread_list);			/* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex);	/* Protects everything */
+
+/* Only the portions below are arch-specific */
+
+static unsigned long		kdr7;		/* Unmasked kernel DR7 value */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps.  Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1).  The DR_GLOBAL_SLOWDOWN bit
+ * (GE) is handled specially.
+ */
+static const unsigned long	kdr7_masks[HB_NUM + 1] = {
+	0x00000000,
+	0x000f0003,	/* LEN0, R/W0, G0, L0 */
+	0x00ff000f,	/* Same for 0,1 */
+	0x0fff003f,	/* Same for 0,1,2 */
+	0xffff00ff	/* Same for 0,1,2,3 */
+};
+
+
+/* Arch-specific hook routines */
+
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void arch_install_chbi(struct cpu_hw_breakpoint *chbi)
+{
+	struct hw_breakpoint **bps;
+
+	/* Don't allow debug exceptions while we update the registers */
+	set_debugreg(0, 7);
+	chbi->cur_kbpdata = rcu_dereference(cur_kbpdata);
+
+	/* Kernel breakpoints are stored starting in DR0 and going up */
+	bps = chbi->cur_kbpdata->bps;
+	switch (chbi->cur_kbpdata->num_kbps) {
+	case 4:
+		set_debugreg(bps[3]->info.address, 3);
+	case 3:
+		set_debugreg(bps[2]->info.address, 2);
+	case 2:
+		set_debugreg(bps[1]->info.address, 1);
+	case 1:
+		set_debugreg(bps[0]->info.address, 0);
+	}
+	/* No need to set DR6 */
+	set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Update an out-of-date thread hw_breakpoint info structure.
+ */
+static void arch_update_thbi(struct thread_hw_breakpoint *thbi,
+		struct kernel_bp_data *thr_kbpdata)
+{
+	int num = thr_kbpdata->num_kbps;
+
+	thbi->tkdr7 = thr_kbpdata->mkdr7 | (thbi->tdr7 & ~kdr7_masks[num]);
+}
+
+/*
+ * Install the thread breakpoints in their debug registers.
+ */
+static void arch_install_thbi(struct thread_hw_breakpoint *thbi)
+{
+	/* Install the user breakpoints.  Kernel breakpoints are stored
+	 * starting in DR0 and going up; there are num_kbps of them.
+	 * User breakpoints are stored starting in DR3 and going down,
+	 * as many as we have room for.
+	 */
+	switch (thbi->num_installed) {
+	case 4:
+		set_debugreg(thbi->tdr[0], 0);
+	case 3:
+		set_debugreg(thbi->tdr[1], 1);
+	case 2:
+		set_debugreg(thbi->tdr[2], 2);
+	case 1:
+		set_debugreg(thbi->tdr[3], 3);
+	}
+	/* No need to set DR6 */
+	set_debugreg(thbi->tkdr7, 7);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void arch_install_none(struct cpu_hw_breakpoint *chbi)
+{
+	set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Create a new kbpdata entry.
+ */
+static void arch_new_kbpdata(struct kernel_bp_data *new_kbpdata)
+{
+	int num = new_kbpdata->num_kbps;
+
+	new_kbpdata->mkdr7 = kdr7 & (kdr7_masks[num] | DR_GLOBAL_SLOWDOWN);
+}
+
+/*
+ * Store a thread breakpoint array entry's address
+ */
+static void arch_store_thread_bp_array(struct thread_hw_breakpoint *thbi,
+		struct hw_breakpoint *bp, int i)
+{
+	thbi->tdr[i] = bp->info.address;
+}
+
+/*
+ * Check for virtual address in user space.
+ */
+static int arch_check_va_in_userspace(unsigned long va,
+		struct task_struct *tsk)
+{
+#ifndef	CONFIG_X86_64
+#define	TASK_SIZE_OF(t)	TASK_SIZE
+#endif
+	return (va < TASK_SIZE_OF(tsk));
+}
+
+/*
+ * Check for virtual address in kernel space.
+ */
+static int arch_check_va_in_kernelspace(unsigned long va)
+{
+#ifndef	CONFIG_X86_64
+#define	TASK_SIZE64	TASK_SIZE
+#endif
+	return (va >= TASK_SIZE64);
+}
+
+/*
+ * Store a breakpoint's encoded address, length, and type.
+ */
+static void arch_store_info(struct hw_breakpoint *bp,
+		unsigned long address, unsigned len, unsigned type)
+{
+	bp->info.address = address;
+	bp->info.len = len;
+	bp->info.type = type;
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static unsigned long encode_dr7(int drnum, unsigned len, unsigned type)
+{
+	unsigned long temp;
+
+	temp = (len | type) & 0xf;
+	temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+	temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+				DR_GLOBAL_SLOWDOWN;
+	return temp;
+}
+
+/*
+ * Calculate the DR7 value for a list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct thread_hw_breakpoint *thbi)
+{
+	int is_user;
+	struct list_head *bp_list;
+	struct hw_breakpoint *bp;
+	int i;
+	int drnum;
+	unsigned long dr7;
+
+	if (thbi) {
+		is_user = 1;
+		bp_list = &thbi->thread_bps;
+		drnum = HB_NUM - 1;
+	} else {
+		is_user = 0;
+		bp_list = &kernel_bps;
+		drnum = 0;
+	}
+
+	/* Kernel bps are assigned from DR0 on up, and user bps are assigned
+	 * from DR3 on down.  Accumulate all 4 bps; the kernel DR7 mask will
+	 * select the appropriate bits later.
+	 */
+	dr7 = 0;
+	i = 0;
+	list_for_each_entry(bp, bp_list, node) {
+
+		/* Get the debug register number and accumulate the bits */
+		dr7 |= encode_dr7(drnum, bp->info.len, bp->info.type);
+		if (++i >= HB_NUM)
+			break;
+		if (is_user)
+			--drnum;
+		else
+			++drnum;
+	}
+	return dr7;
+}
+
+/*
+ * Register a new user breakpoint structure.
+ */
+static void arch_register_user_hw_breakpoint(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi)
+{
+	thbi->tdr7 = calculate_dr7(thbi);
+
+	/* If this is an execution breakpoint for the current PC address,
+	 * we should clear the task's RF so that the bp will be certain
+	 * to trigger.
+	 *
+	 * FIXME: It's not so easy to get hold of the task's PC as a linear
+	 * address!  ptrace.c does this already...
+	 */
+}
+
+/*
+ * Unregister a user breakpoint structure.
+ */
+static void arch_unregister_user_hw_breakpoint(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi)
+{
+	thbi->tdr7 = calculate_dr7(thbi);
+}
+
+/*
+ * Register a kernel breakpoint structure.
+ */
+static void arch_register_kernel_hw_breakpoint(
+		struct hw_breakpoint *bp)
+{
+	kdr7 = calculate_dr7(NULL);
+}
+
+/*
+ * Unregister a kernel breakpoint structure.
+ */
+static void arch_unregister_kernel_hw_breakpoint(
+		struct hw_breakpoint *bp)
+{
+	kdr7 = calculate_dr7(NULL);
+}
+
+
+/* End of arch-specific hook routines */
+
+
+/*
+ * Copy out the debug register information for a core dump.
+ *
+ * tsk must be equal to current.
+ */
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	memset(u_debugreg, 0, 8 * sizeof u_debugreg[0]);
+	if (thbi) {
+		for (i = 0; i < HB_NUM; ++i)
+			u_debugreg[i] = thbi->vdr_bps[i].info.address;
+		u_debugreg[7] = thbi->vdr7;
+	}
+	u_debugreg[6] = tsk->thread.vdr6;
+}
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+		struct task_struct *tsk);
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp,
+		unsigned long address, unsigned len, unsigned type);
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+	struct task_struct *tsk = current;
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	int i;
+
+	/* Store in the virtual DR6 register the fact that the breakpoint
+	 * was hit so the thread's debugger will see it.
+	 */
+	if (thbi) {
+		i = bp - thbi->vdr_bps;
+		tsk->thread.vdr6 |= (DR_TRAP0 << i);
+		send_sigtrap(tsk, regs, 0);
+	}
+}
+
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n)
+{
+	struct thread_hw_breakpoint *thbi;
+	unsigned long val = 0;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	thbi = tsk->thread.hw_breakpoint_info;
+	if (n < HB_NUM) {
+		if (thbi)
+			val = thbi->vdr_bps[n].info.address;
+	} else if (n == 6) {
+		val = tsk->thread.vdr6;
+	} else if (n == 7) {
+		if (thbi)
+			val = thbi->vdr7;
+	}
+	mutex_unlock(&hw_breakpoint_mutex);
+	return val;
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7.  Return the "enabled" status.
+ */
+static int decode_dr7(unsigned long dr7, int bpnum, unsigned *len,
+		unsigned *type)
+{
+	int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+
+	*len = (temp & 0xc) | 0x40;
+	*type = (temp & 0x3) | 0x80;
+	return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
+
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+		struct thread_hw_breakpoint *thbi, unsigned long data)
+{
+	struct hw_breakpoint *bp;
+	int i;
+	int rc = 0;
+	unsigned long old_dr7 = thbi->vdr7;
+
+	data &= ~DR_CONTROL_RESERVED;
+
+	/* Loop through all the hardware breakpoints, making the
+	 * appropriate changes to each.
+	 */
+ restore_settings:
+	thbi->vdr7 = data;
+	bp = &thbi->vdr_bps[0];
+	for (i = 0; i < HB_NUM; (++i, ++bp)) {
+		int enabled;
+		unsigned len, type;
+
+		enabled = decode_dr7(data, i, &len, &type);
+
+		/* Unregister the breakpoint before trying to change it */
+		if (bp->status)
+			__unregister_user_hw_breakpoint(tsk, bp);
+
+		/* Now register the breakpoint if it should be enabled.
+		 * New invalid entries will raise an error here.
+		 */
+		if (enabled) {
+			bp->triggered = ptrace_triggered;
+			bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+			if (rc == 0 && __register_user_hw_breakpoint(tsk, bp,
+					bp->info.address, len, type) < 0)
+				break;
+		}
+	}
+
+	/* If anything above failed, restore the original settings */
+	if (i < HB_NUM) {
+		rc = -EIO;
+		data = old_dr7;
+		goto restore_settings;
+	}
+	return rc;
+}
+
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+	struct thread_hw_breakpoint *thbi;
+	int rc = -EIO;
+
+	/* We have to hold this lock the entire time, to prevent thbi
+	 * from being deallocated out from under us.
+	 */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* There are no DR4 or DR5 registers */
+	if (n == 4 || n == 5)
+		;
+
+	/* Writes to DR6 modify the virtualized value */
+	else if (n == 6) {
+		tsk->thread.vdr6 = val;
+		rc = 0;
+	}
+
+	else if (!tsk->thread.hw_breakpoint_info && val == 0)
+		rc = 0;		/* Minor optimization */
+
+	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+		rc = -ENOMEM;
+
+	/* Writes to DR0 - DR3 change a breakpoint address */
+	else if (n < HB_NUM) {
+		struct hw_breakpoint *bp = &thbi->vdr_bps[n];
+
+		/* If the breakpoint is registered then unregister it,
+		 * change it, and re-register it.  Revert to the original
+		 * address if an error occurs.
+		 */
+		if (bp->status) {
+			unsigned long old_addr = bp->info.address;
+
+			__unregister_user_hw_breakpoint(tsk, bp);
+			rc = __register_user_hw_breakpoint(tsk, bp,
+					val, bp->info.len, bp->info.type);
+			if (rc < 0) {
+				__register_user_hw_breakpoint(tsk, bp,
+						old_addr,
+						bp->info.len, bp->info.type);
+			}
+		} else {
+			bp->info.address = val;
+			rc = 0;
+		}
+	}
+
+	/* All that's left is DR7 */
+	else
+		rc = ptrace_write_dr7(tsk, thbi, val);
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+
+/*
+ * Handle debug exception notifications.
+ */
+
+static void switch_to_none_hw_breakpoint(void);
+
+static int __kprobes hw_breakpoint_handler(struct die_args *args)
+{
+	struct cpu_hw_breakpoint *chbi;
+	int i;
+	struct hw_breakpoint *bp;
+	struct thread_hw_breakpoint *thbi = NULL;
+
+	/* The DR6 value is stored in args->err */
+#define DR6	(args->err)
+
+	if (!(DR6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+		return NOTIFY_DONE;
+
+	/* Assert that local interrupts are disabled */
+
+	/* Reset the DRn bits in the virtualized register value.
+	 * The ptrace trigger routine will add in whatever is needed.
+	 */
+	current->thread.vdr6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+	/* Are we a victim of lazy debug-register switching? */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	if (!chbi->bp_task)
+		;
+	else if (chbi->bp_task != current) {
+
+		/* No user breakpoints are valid.  Perform the belated
+		 * debug-register switch.
+		 */
+		switch_to_none_hw_breakpoint();
+	} else {
+		thbi = chbi->bp_task->thread.hw_breakpoint_info;
+	}
+
+	/* Disable all breakpoints so that the callbacks can run without
+	 * triggering recursive debug exceptions.
+	 */
+	set_debugreg(0, 7);
+
+	/* Handle all the breakpoints that were triggered */
+	for (i = 0; i < HB_NUM; ++i) {
+		if (likely(!(DR6 & (DR_TRAP0 << i))))
+			continue;
+
+		/* Find the corresponding hw_breakpoint structure and
+		 * invoke its triggered callback.
+		 */
+		if (i < chbi->cur_kbpdata->num_kbps)
+			bp = chbi->cur_kbpdata->bps[i];
+		else if (thbi)
+			bp = thbi->bps[i];
+		else		/* False alarm due to lazy DR switching */
+			continue;
+		if (bp) {		/* Should always be non-NULL */
+
+			/* Set RF at execution breakpoints */
+			if (bp->info.type == HW_BREAKPOINT_EXECUTE)
+				args->regs->eflags |= X86_EFLAGS_RF;
+			(bp->triggered)(bp, args->regs);
+		}
+	}
+
+	/* Re-enable the breakpoints */
+	set_debugreg(thbi ? thbi->tkdr7 : chbi->cur_kbpdata->mkdr7, 7);
+	put_cpu_no_resched();
+
+	if (!(DR6 & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+		return NOTIFY_STOP;
+
+	return NOTIFY_DONE;
+#undef DR6
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_exceptions_notify(
+		struct notifier_block *unused, unsigned long val, void *data)
+{
+	if (val != DIE_DEBUG)
+		return NOTIFY_DONE;
+	return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+	.notifier_call = hw_breakpoint_exceptions_notify
+};
+
+static int __init init_hw_breakpoint(void)
+{
+	load_debug_registers();
+	return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
+
+
+/* Grab the arch-independent code */
+
+#include "../../../kernel/hw_breakpoint.c"
Index: usb-2.6/arch/i386/kernel/ptrace.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/ptrace.c
+++ usb-2.6/arch/i386/kernel/ptrace.c
@@ -382,11 +382,11 @@ long arch_ptrace(struct task_struct *chi
 		tmp = 0;  /* Default return condition */
 		if(addr < FRAME_SIZE*sizeof(long))
 			tmp = getreg(child, addr);
-		if(addr >= (long) &dummy->u_debugreg[0] &&
-		   addr <= (long) &dummy->u_debugreg[7]){
+		else if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
 			addr -= (long) &dummy->u_debugreg[0];
 			addr = addr >> 2;
-			tmp = child->thread.debugreg[addr];
+			tmp = thread_get_debugreg(child, addr);
 		}
 		ret = put_user(tmp, datap);
 		break;
@@ -416,59 +416,11 @@ long arch_ptrace(struct task_struct *chi
 		   have to be selective about what portions we allow someone
 		   to modify. */
 
-		  ret = -EIO;
-		  if(addr >= (long) &dummy->u_debugreg[0] &&
-		     addr <= (long) &dummy->u_debugreg[7]){
-
-			  if(addr == (long) &dummy->u_debugreg[4]) break;
-			  if(addr == (long) &dummy->u_debugreg[5]) break;
-			  if(addr < (long) &dummy->u_debugreg[4] &&
-			     ((unsigned long) data) >= TASK_SIZE-3) break;
-			  
-			  /* Sanity-check data. Take one half-byte at once with
-			   * check = (val >> (16 + 4*i)) & 0xf. It contains the
-			   * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
-			   * 2 and 3 are LENi. Given a list of invalid values,
-			   * we do mask |= 1 << invalid_value, so that
-			   * (mask >> check) & 1 is a correct test for invalid
-			   * values.
-			   *
-			   * R/Wi contains the type of the breakpoint /
-			   * watchpoint, LENi contains the length of the watched
-			   * data in the watchpoint case.
-			   *
-			   * The invalid values are:
-			   * - LENi == 0x10 (undefined), so mask |= 0x0f00.
-			   * - R/Wi == 0x10 (break on I/O reads or writes), so
-			   *   mask |= 0x4444.
-			   * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
-			   *   0x1110.
-			   *
-			   * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
-			   *
-			   * See the Intel Manual "System Programming Guide",
-			   * 15.2.4
-			   *
-			   * Note that LENi == 0x10 is defined on x86_64 in long
-			   * mode (i.e. even for 32-bit userspace software, but
-			   * 64-bit kernel), so the x86_64 mask value is 0x5454.
-			   * See the AMD manual no. 24593 (AMD64 System
-			   * Programming)*/
-
-			  if(addr == (long) &dummy->u_debugreg[7]) {
-				  data &= ~DR_CONTROL_RESERVED;
-				  for(i=0; i<4; i++)
-					  if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
-						  goto out_tsk;
-				  if (data)
-					  set_tsk_thread_flag(child, TIF_DEBUG);
-				  else
-					  clear_tsk_thread_flag(child, TIF_DEBUG);
-			  }
-			  addr -= (long) &dummy->u_debugreg;
-			  addr = addr >> 2;
-			  child->thread.debugreg[addr] = data;
-			  ret = 0;
+		if (addr >= (long) &dummy->u_debugreg[0] &&
+				addr <= (long) &dummy->u_debugreg[7]) {
+			addr -= (long) &dummy->u_debugreg;
+			addr = addr >> 2;
+			ret = thread_set_debugreg(child, addr, data);
 		  }
 		  break;
 
@@ -624,7 +576,6 @@ long arch_ptrace(struct task_struct *chi
 		ret = ptrace_request(child, request, addr, data);
 		break;
 	}
- out_tsk:
 	return ret;
 }
 
Index: usb-2.6/arch/i386/kernel/Makefile
===================================================================
--- usb-2.6.orig/arch/i386/kernel/Makefile
+++ usb-2.6/arch/i386/kernel/Makefile
@@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld
 obj-y	:= process.o signal.o entry.o traps.o irq.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
-		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \
+		hw_breakpoint.o
 
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
Index: usb-2.6/arch/i386/power/cpu.c
===================================================================
--- usb-2.6.orig/arch/i386/power/cpu.c
+++ usb-2.6/arch/i386/power/cpu.c
@@ -11,6 +11,7 @@
 #include <linux/suspend.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/debugreg.h>
 
 static struct saved_context saved_context;
 
@@ -46,6 +47,8 @@ void __save_processor_state(struct saved
 	ctxt->cr2 = read_cr2();
 	ctxt->cr3 = read_cr3();
 	ctxt->cr4 = read_cr4();
+
+	disable_debug_registers();
 }
 
 void save_processor_state(void)
@@ -70,20 +73,7 @@ static void fix_processor_context(void)
 
 	load_TR_desc();				/* This does ltr */
 	load_LDT(&current->active_mm->context);	/* This does lldt */
-
-	/*
-	 * Now maybe reload the debug registers
-	 */
-	if (current->thread.debugreg[7]){
-		set_debugreg(current->thread.debugreg[0], 0);
-		set_debugreg(current->thread.debugreg[1], 1);
-		set_debugreg(current->thread.debugreg[2], 2);
-		set_debugreg(current->thread.debugreg[3], 3);
-		/* no 4 and 5 */
-		set_debugreg(current->thread.debugreg[6], 6);
-		set_debugreg(current->thread.debugreg[7], 7);
-	}
-
+	load_debug_registers();
 }
 
 void __restore_processor_state(struct saved_context *ctxt)
Index: usb-2.6/arch/i386/kernel/kprobes.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/kprobes.c
+++ usb-2.6/arch/i386/kernel/kprobes.c
@@ -35,6 +35,7 @@
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
 #include <asm/uaccess.h>
+#include <asm/debugreg.h>
 
 void jprobe_return_end(void);
 
@@ -660,9 +661,19 @@ int __kprobes kprobe_exceptions_notify(s
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_DEBUG:
-		if (post_kprobe_handler(args->regs))
-			ret = NOTIFY_STOP;
+		/*
+		 * The DR6 value is stored in args->err.
+		 * If DR_STEP is set and it's ours, we should clear DR_STEP
+		 * from the user's virtualized DR6 register.
+		 * Then if no more bits are set we should eat this exception.
+		 */
+		if ((args->err & DR_STEP) && post_kprobe_handler(args->regs)) {
+			current->thread.vdr6 &= ~DR_STEP;
+			if ((args->err & ~DR_STEP) == 0)
+				ret = NOTIFY_STOP;
+		}
 		break;
+
 	case DIE_GPF:
 	case DIE_PAGE_FAULT:
 		/* kprobe_running() needs smp_processor_id() */
Index: usb-2.6/include/asm-generic/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-generic/hw_breakpoint.h
@@ -0,0 +1,225 @@
+#ifndef	_ASM_GENERIC_HW_BREAKPOINT_H
+#define	_ASM_GENERIC_HW_BREAKPOINT_H
+
+#ifndef __ARCH_HW_BREAKPOINT_H
+#error "Please don't include this file directly"
+#endif
+
+#ifdef	__KERNEL__
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @triggered: callback invoked when the breakpoint is hit
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @info: arch-specific breakpoint info (address, length, and type)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints.  These can be either execute breakpoints
+ * (triggered on instruction execution) or data breakpoints (also known
+ * as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The breakpoint's address, length, and type are highly
+ * architecture-specific.  The values are encoded in the @info field; you
+ * specify them when registering the breakpoint.  To examine the encoded
+ * values use hw_breakpoint_get_{kaddress,uaddress,len,type}(), declared
+ * below.
+ *
+ * The address is specified as a regular kernel pointer (for kernel-space
+ * breakpoints) or as an %__user pointer (for user-space breakpoints).
+ * With register_user_hw_breakpoint(), the address must refer to a
+ * location in user space.  The breakpoint will be active only while the
+ * requested task is running.  Conversely with
+ * register_kernel_hw_breakpoint(), the address must refer to a location
+ * in kernel space, and the breakpoint will be active on all CPUs
+ * regardless of the current task.
+ *
+ * The length is the breakpoint's extent in bytes, which is subject to
+ * certain limitations.  include/asm/hw_breakpoint.h contains macros
+ * defining the available lengths for a specific architecture.  Note that
+ * the address's alignment must match the length.  The breakpoint will
+ * catch accesses to any byte in the range from address to address +
+ * (length - 1).
+ *
+ * The breakpoint's type indicates the sort of access that will cause it
+ * to trigger.  Possible values may include:
+ *
+ * 	%HW_BREAKPOINT_EXECUTE (triggered on instruction execution),
+ * 	%HW_BREAKPOINT_RW (triggered on read or write access),
+ * 	%HW_BREAKPOINT_WRITE (triggered on write access), and
+ * 	%HW_BREAKPOINT_READ (triggered on read access).
+ *
+ * Appropriate macros are defined in include/asm/hw_breakpoint.h; not all
+ * possibilities are available on all architectures.  Execute breakpoints
+ * must have length equal to the special value %HW_BREAKPOINT_LEN_EXECUTE.
+ *
+ * When a breakpoint gets hit, the @triggered callback is invoked
+ * in interrupt context with a pointer to the %hw_breakpoint structure and the
+ * processor registers.  Execute-breakpoint traps occur before the
+ * breakpointed instruction runs; when the callback returns the
+ * instruction is restarted (this time without a debug exception).  All
+ * other types of trap occur after the memory access has taken place.
+ * Breakpoints are disabled while @triggered runs, to avoid recursive
+ * traps and allow unhindered access to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource.  Requests to register a
+ * breakpoint will always succeed provided the parameters are valid,
+ * but the breakpoint may not be installed in a debug register right
+ * away.  Physical debug registers are allocated based on the priority
+ * level stored in @priority (higher values indicate higher priority).
+ * User-space breakpoints within a single thread compete with one
+ * another, and all user-space breakpoints compete with all kernel-space
+ * breakpoints; however user-space breakpoints in different threads do
+ * not compete.  %HW_BREAKPOINT_PRIO_PTRACE is the level used for ptrace
+ * requests; an unobtrusive kernel-space breakpoint will use
+ * %HW_BREAKPOINT_PRIO_NORMAL to avoid disturbing user programs.  A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HW_BREAKPOINT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered.  The
+ * @installed and @uninstalled callbacks are invoked in atomic context when these
+ * events occur.  It is legal for @installed or @uninstalled to be %NULL,
+ * however @triggered must not be.  Note that it is not possible to
+ * register or unregister a breakpoint from within a callback routine,
+ * since doing so requires a process context.  Note also that for user
+ * breakpoints, @installed and @uninstalled may be called during the
+ * middle of a context switch, at a time when it is not safe to call
+ * printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled.  As a result @triggered can
+ * be called when you may not expect it, but this way you will know that
+ * during the time interval from @installed to @uninstalled, all events
+ * are faithfully reported.  (It is not possible to do any better than
+ * this in general, because on SMP systems there is no way to set a debug
+ * register simultaneously on all CPUs.)  The same isn't always true with
+ * user-space breakpoints, but the differences should not be visible to a
+ * user process.
+ *
+ * If you need to know whether your kernel-space breakpoint was installed
+ * immediately upon registration, you can check the return value from
+ * register_kernel_hw_breakpoint().  If the value is not > 0, you can
+ * give up and unregister the breakpoint right away.
+ *
+ * @node and @status are intended for internal use.  However @status
+ * may be read to determine whether or not the breakpoint is currently
+ * installed.  (The value is not reliable unless local interrupts are
+ * disabled.)
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable.  Note that it is not portable
+ * as written, because not all architectures support HW_BREAKPOINT_LEN_4.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * static void triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * 	printk(KERN_DEBUG "Breakpoint triggered\n");
+ * 	dump_stack();
+ *  	.......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ *	..........<do anything>............
+ *	my_bp.triggered = triggered;
+ *	my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ *	rc = register_kernel_hw_breakpoint(&my_bp, &pid_max,
+ *			HW_BREAKPOINT_LEN_4, HW_BREAKPOINT_WRITE);
+ *	..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ *	..........<do anything>............
+ *	unregister_kernel_hw_breakpoint(&my_bp);
+ *	..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hw_breakpoint {
+	struct list_head	node;
+	void		(*triggered)(struct hw_breakpoint *, struct pt_regs *);
+	void		(*installed)(struct hw_breakpoint *);
+	void		(*uninstalled)(struct hw_breakpoint *);
+	struct arch_hw_breakpoint	info;
+	u8		priority;
+	u8		status;
+};
+
+/*
+ * Inline accessor routines to retrieve the arch-specific parts of
+ * a breakpoint structure:
+ */
+static const void *hw_breakpoint_get_kaddress(struct hw_breakpoint *bp);
+static const void __user *hw_breakpoint_get_uaddress(struct hw_breakpoint *bp);
+static unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp);
+static unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp);
+
+/*
+ * len and type values are defined in include/asm/hw_breakpoint.h.
+ * Available values vary according to the architecture.  On i386 the
+ * possibilities are:
+ *
+ *	HW_BREAKPOINT_LEN_1
+ *	HW_BREAKPOINT_LEN_2
+ *	HW_BREAKPOINT_LEN_4
+ *	HW_BREAKPOINT_LEN_EXECUTE
+ *	HW_BREAKPOINT_RW
+ *	HW_BREAKPOINT_READ
+ *	HW_BREAKPOINT_EXECUTE
+ *
+ * On other architectures HW_BREAKPOINT_LEN_8 may be available, and the
+ * 1-, 2-, and 4-byte lengths may be unavailable.  There also may be
+ * HW_BREAKPOINT_WRITE.  You can use #ifdef to check at compile time.
+ */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HW_BREAKPOINT_PRIO_NORMAL	25
+#define HW_BREAKPOINT_PRIO_PTRACE	50
+#define HW_BREAKPOINT_PRIO_HIGH		75
+
+/* HW breakpoint status values (0 = not registered) */
+#define HW_BREAKPOINT_REGISTERED	1
+#define HW_BREAKPOINT_INSTALLED		2
+
+/*
+ * The following two routines are meant to be called only from within
+ * the ptrace or utrace subsystems.  The tsk argument will usually be a
+ * process being debugged by the current task, although it is also legal
+ * for tsk to be the current task.  In any case it must be guaranteed
+ * that tsk will not start running in user mode while its breakpoints are
+ * being modified.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp,
+		const void __user *address, unsigned len, unsigned type);
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp);
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp,
+		const void *address, unsigned len, unsigned type);
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+#endif	/* __KERNEL__ */
+#endif	/* _ASM_GENERIC_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/machine_kexec.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/machine_kexec.c
+++ usb-2.6/arch/i386/kernel/machine_kexec.c
@@ -19,6 +19,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/debugreg.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -108,6 +109,7 @@ NORET_TYPE void machine_kexec(struct kim
 
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
+	disable_debug_registers();
 
 	control_page = page_address(image->control_code_page);
 	memcpy(control_page, relocate_kernel, PAGE_SIZE);
Index: usb-2.6/arch/i386/kernel/smpboot.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/smpboot.c
+++ usb-2.6/arch/i386/kernel/smpboot.c
@@ -58,6 +58,7 @@
 #include <smpboot_hooks.h>
 #include <asm/vmi.h>
 #include <asm/mtrr.h>
+#include <asm/debugreg.h>
 
 /* Set if we find a B stepping CPU */
 static int __devinitdata smp_b_stepping;
@@ -427,6 +428,7 @@ static void __cpuinit start_secondary(vo
 	local_irq_enable();
 
 	wmb();
+	load_debug_registers();
 	cpu_idle();
 }
 
@@ -1209,6 +1211,7 @@ int __cpu_disable(void)
 	fixup_irqs(map);
 	/* It's now safe to remove this processor from the online map */
 	cpu_clear(cpu, cpu_online_map);
+	disable_debug_registers();
 	return 0;
 }
 
Index: usb-2.6/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/kernel/hw_breakpoint.c
@@ -0,0 +1,777 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ *
+ * This file contains the arch-independent routines.  It is not meant
+ * to be compiled as a standalone source file; rather it should be
+ * #include'd by the arch-specific implementation.
+ */
+
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct cpu_hw_breakpoint *chbi;
+	struct kernel_bp_data *thr_kbpdata;
+
+	/* This routine is on the hot path; it gets called for every
+	 * context switch into a task with active breakpoints.  We
+	 * must make sure that the common case executes as quickly as
+	 * possible.
+	 */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = tsk;
+
+	/* Use RCU to synchronize with external updates */
+	rcu_read_lock();
+
+	/* Other CPUs might be making updates to the list of kernel
+	 * breakpoints at this time.  If they are, they will modify
+	 * the other entry in kbpdata[] -- the one not pointed to
+	 * by chbi->cur_kbpdata.  So the update itself won't affect
+	 * us directly.
+	 *
+	 * However when the update is finished, an IPI will arrive
+	 * telling this CPU to change chbi->cur_kbpdata.  We need
+	 * to use a single consistent kbpdata[] entry, the present one.
+	 * So we'll copy the pointer to a local variable, thr_kbpdata,
+	 * and we must prevent the compiler from aliasing the two
+	 * pointers.  Only a compiler barrier is required, not a full
+	 * memory barrier, because everything takes place on a single CPU.
+	 */
+ restart:
+	thr_kbpdata = chbi->cur_kbpdata;
+	barrier();
+
+	/* Normally we can keep the same debug register settings as the
+	 * last time this task ran.  But if the kernel breakpoints have
+	 * changed or any user breakpoints have been registered or
+	 * unregistered, we need to handle the updates and possibly
+	 * send out some notifications.
+	 */
+	if (unlikely(thbi->gennum != thr_kbpdata->gennum)) {
+		struct hw_breakpoint *bp;
+		int i;
+		int num;
+
+		thbi->gennum = thr_kbpdata->gennum;
+		arch_update_thbi(thbi, thr_kbpdata);
+		num = thr_kbpdata->num_kbps;
+
+		/* This code can be invoked while a debugger is actively
+		 * updating the thread's breakpoint list (for example, if
+		 * someone sends SIGKILL to the task).  We use RCU to
+		 * protect our access to the list pointers. */
+		thbi->num_installed = 0;
+		i = HB_NUM;
+		list_for_each_entry_rcu(bp, &thbi->thread_bps, node) {
+
+			/* If this register is allocated for kernel bps,
+			 * don't install.  Otherwise do. */
+			if (--i < num) {
+				if (bp->status == HW_BREAKPOINT_INSTALLED) {
+					if (bp->uninstalled)
+						(bp->uninstalled)(bp);
+					bp->status = HW_BREAKPOINT_REGISTERED;
+				}
+			} else {
+				++thbi->num_installed;
+				if (bp->status != HW_BREAKPOINT_INSTALLED) {
+					bp->status = HW_BREAKPOINT_INSTALLED;
+					if (bp->installed)
+						(bp->installed)(bp);
+				}
+			}
+		}
+	}
+
+	/* Set the debug register */
+	arch_install_thbi(thbi);
+
+	/* Were there any kernel breakpoint changes while we were running? */
+	if (unlikely(chbi->cur_kbpdata != thr_kbpdata)) {
+
+		/* Some debug registers may now be assigned to kernel bps and
+		 * we might have messed them up.  Reload all the kernel bps
+		 * and then reload the thread bps.
+		 */
+		arch_install_chbi(chbi);
+		goto restart;
+	}
+
+	rcu_read_unlock();
+	put_cpu_no_resched();
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void switch_to_none_hw_breakpoint(void)
+{
+	struct cpu_hw_breakpoint *chbi;
+
+	chbi = &per_cpu(cpu_info, get_cpu());
+	chbi->bp_task = NULL;
+
+	/* This routine gets called from only two places.  In one
+	 * the caller holds the hw_breakpoint_mutex; in the other
+	 * interrupts are disabled.  In either case, no kernel
+	 * breakpoint updates can arrive while the routine runs.
+	 * So we don't need to use RCU.
+	 */
+	arch_install_none(chbi);
+	put_cpu_no_resched();
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+	struct cpu_hw_breakpoint *chbi;
+	struct task_struct *tsk = current;
+
+	chbi = &per_cpu(cpu_info, get_cpu());
+
+	/* Install both the kernel and the user breakpoints */
+	arch_install_chbi(chbi);
+	if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+		switch_to_thread_hw_breakpoint(tsk);
+
+	put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void update_all_cpus(void)
+{
+	/* We don't need to use any sort of memory barrier.  The IPI
+	 * carried out by on_each_cpu() includes its own barriers.
+	 */
+	on_each_cpu(update_this_cpu, NULL, 0, 0);
+	synchronize_rcu();
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+	unsigned long flags;
+
+	/* Prevent IPIs for new kernel breakpoint updates */
+	local_irq_save(flags);
+
+	rcu_read_lock();
+	update_this_cpu(NULL);
+	rcu_read_unlock();
+
+	local_irq_restore(flags);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio.  Highest-priority entry is in tprio[3].
+ */
+static void accum_thread_tprio(struct thread_hw_breakpoint *thbi)
+{
+	int i;
+
+	for (i = HB_NUM - 1; i >= 0 && thbi->bps[i]; --i)
+		tprio[i] = max(tprio[i], thbi->bps[i]->priority);
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority.  We'll set tprio[HB_NUM-1] to the maximum priority
+ * of the first (highest-priority) entries in all the lists, tprio[HB_NUM-2]
+ * to the maximum of the second entries, etc.  In the end, we'll know
+ * that no thread requires breakpoints with priorities higher than the
+ * values in tprio.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void recalc_tprio(void)
+{
+	struct thread_hw_breakpoint *thbi;
+
+	memset(tprio, 0, sizeof tprio);
+
+	/* Loop through all threads having registered breakpoints
+	 * and accumulate the maximum priority levels in tprio.
+	 */
+	list_for_each_entry(thbi, &thread_list, node)
+		accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[].  The 4 highest
+ * winners overall get to be installed in a debug register; num_kbps
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU.  If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+	int k, u;
+	int changed = 0;
+	struct hw_breakpoint *bp;
+	struct kernel_bp_data *new_kbpdata;
+
+	/* Determine how many debug registers are available for kernel
+	 * breakpoints as opposed to user breakpoints, based on the
+	 * priorities.  Ties are resolved in favor of user bps.
+	 */
+	k = 0;			/* Next kernel bp to allocate */
+	u = HB_NUM - 1;		/* Next user bp to allocate */
+	bp = list_entry(kernel_bps.next, struct hw_breakpoint, node);
+	while (k <= u) {
+		if (&bp->node == &kernel_bps || tprio[u] >= bp->priority)
+			--u;		/* User bps win a slot */
+		else {
+			++k;		/* Kernel bp wins a slot */
+			if (bp->status != HW_BREAKPOINT_INSTALLED)
+				changed = 1;
+			bp = list_entry(bp->node.next, struct hw_breakpoint,
+					node);
+		}
+	}
+	if (k != cur_kbpdata->num_kbps)
+		changed = 1;
+
+	/* Notify the remaining kernel breakpoints that they are about
+	 * to be uninstalled.
+	 */
+	list_for_each_entry_from(bp, &kernel_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED) {
+			if (bp->uninstalled)
+				(bp->uninstalled)(bp);
+			bp->status = HW_BREAKPOINT_REGISTERED;
+			changed = 1;
+		}
+	}
+
+	if (changed) {
+		cur_kbpindex ^= 1;
+		new_kbpdata = &kbpdata[cur_kbpindex];
+		new_kbpdata->gennum = cur_kbpdata->gennum + 1;
+		new_kbpdata->num_kbps = k;
+		arch_new_kbpdata(new_kbpdata);
+		u = 0;
+		list_for_each_entry(bp, &kernel_bps, node) {
+			if (u >= k)
+				break;
+			new_kbpdata->bps[u] = bp;
+			++u;
+		}
+		rcu_assign_pointer(cur_kbpdata, new_kbpdata);
+
+		/* Tell all the CPUs to update their debug registers */
+		update_all_cpus();
+
+		/* Notify the breakpoints that just got installed */
+		for (u = 0; u < k; ++u) {
+			bp = new_kbpdata->bps[u];
+			if (bp->status != HW_BREAKPOINT_INSTALLED) {
+				bp->status = HW_BREAKPOINT_INSTALLED;
+				if (bp->installed)
+					(bp->installed)(bp);
+			}
+		}
+	}
+}
+
+/*
+ * Return the pointer to a thread's hw_breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+		struct task_struct *tsk)
+{
+	if (!tsk->thread.hw_breakpoint_info && !(tsk->flags & PF_EXITING)) {
+		struct thread_hw_breakpoint *thbi;
+
+		thbi = kzalloc(sizeof(struct thread_hw_breakpoint),
+				GFP_KERNEL);
+		if (thbi) {
+			INIT_LIST_HEAD(&thbi->node);
+			INIT_LIST_HEAD(&thbi->thread_bps);
+
+			/* Force an update the next time tsk runs */
+			thbi->gennum = cur_kbpdata->gennum - 2;
+			tsk->thread.hw_breakpoint_info = thbi;
+		}
+	}
+	return tsk->thread.hw_breakpoint_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+	struct hw_breakpoint *bp;
+
+	if (!thbi)
+		return;
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Let the breakpoints know they are being uninstalled */
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+			(bp->uninstalled)(bp);
+		bp->status = 0;
+	}
+
+	/* Remove tsk from the list of all threads with registered bps */
+	list_del(&thbi->node);
+
+	/* The thread no longer has any breakpoints associated with it */
+	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	tsk->thread.hw_breakpoint_info = NULL;
+	kfree(thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities */
+	recalc_tprio();
+	balance_kernel_vs_user();
+
+	/* Actually uninstall the breakpoints if necessary */
+	if (tsk == current)
+		switch_to_none_hw_breakpoint();
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags)
+{
+	/* We will assume that breakpoint settings are not inherited
+	 * and the child starts out with no debug registers set.
+	 * But what about CLONE_PTRACE?
+	 */
+	clear_tsk_thread_flag(child, TIF_DEBUG);
+	return 0;
+}
+
+/*
+ * Store the highest-priority thread breakpoint entries in an array.
+ */
+static void store_thread_bp_array(struct thread_hw_breakpoint *thbi)
+{
+	struct hw_breakpoint *bp;
+	int i;
+
+	i = HB_NUM - 1;
+	list_for_each_entry(bp, &thbi->thread_bps, node) {
+		thbi->bps[i] = bp;
+		arch_store_thread_bp_array(thbi, bp, i);
+		if (--i < 0)
+			break;
+	}
+	while (i >= 0)
+		thbi->bps[i--] = NULL;
+
+	/* Force an update the next time this task runs */
+	thbi->gennum = cur_kbpdata->gennum - 2;
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set implies
+ *		tsk->thread.hw_breakpoint_info is not NULL.
+ *	tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ *		iff thbi->node is on thread_list.
+ */
+static int insert_bp_in_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	struct list_head *head;
+	int pos;
+	struct hw_breakpoint *temp_bp;
+
+	/* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+	if (tsk)
+		head = &thbi->thread_bps;
+	else
+		head = &kernel_bps;
+
+	/* Equal-priority breakpoints get listed first-come-first-served */
+	pos = 0;
+	list_for_each_entry(temp_bp, head, node) {
+		if (bp->priority > temp_bp->priority)
+			break;
+		++pos;
+	}
+	bp->status = HW_BREAKPOINT_REGISTERED;
+	list_add_tail(&bp->node, &temp_bp->node);
+
+	if (tsk) {
+		store_thread_bp_array(thbi);
+
+		/* Is this the thread's first registered breakpoint? */
+		if (list_empty(&thbi->node)) {
+			set_tsk_thread_flag(tsk, TIF_DEBUG);
+			list_add(&thbi->node, &thread_list);
+		}
+	}
+	return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ */
+static void remove_bp_from_list(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+	/* Remove bp from the thread's/kernel's list.  If the list is now
+	 * empty we must clear the TIF_DEBUG flag.  But keep the
+	 * thread_hw_breakpoint structure, so that the virtualized debug
+	 * register values will remain valid.
+	 */
+	list_del(&bp->node);
+	if (tsk) {
+		store_thread_bp_array(thbi);
+
+		if (list_empty(&thbi->thread_bps)) {
+			list_del_init(&thbi->node);
+			clear_tsk_thread_flag(tsk, TIF_DEBUG);
+		}
+	}
+
+	/* Tell the breakpoint it is being uninstalled */
+	if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+		(bp->uninstalled)(bp);
+	bp->status = 0;
+}
+
+/*
+ * Validate the settings in a hw_breakpoint structure.
+ */
+static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk,
+		unsigned long address, unsigned len, unsigned type)
+{
+	int rc = -EINVAL;
+	unsigned long align;
+
+	switch (type) {
+#ifdef HW_BREAKPOINT_EXECUTE
+	case HW_BREAKPOINT_EXECUTE:
+		if (len != HW_BREAKPOINT_LEN_EXECUTE)
+			return rc;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_READ
+	case HW_BREAKPOINT_READ:	break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+	case HW_BREAKPOINT_WRITE:	break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+	case HW_BREAKPOINT_RW:		break;
+#endif
+	default:
+		return rc;
+	}
+
+	switch (len) {
+#ifdef HW_BREAKPOINT_LEN_1
+	case HW_BREAKPOINT_LEN_1:
+		align = 0;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+	case HW_BREAKPOINT_LEN_2:
+		align = 1;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+	case HW_BREAKPOINT_LEN_4:
+		align = 3;
+		break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+	case HW_BREAKPOINT_LEN_8:
+		align = 7;
+		break;
+#endif
+	default:
+		return rc;
+	}
+
+	/* Check that the low-order bits of the address are appropriate
+	 * for the alignment implied by len.
+	 */
+	if (address & align)
+		return rc;
+
+	/* Check that the virtual address is in the proper range */
+	if (tsk) {
+		if (!arch_check_va_in_userspace(address, tsk))
+			return rc;
+	} else {
+		if (!arch_check_va_in_kernelspace(address))
+			return rc;
+	}
+
+	if (bp->triggered) {
+		rc = 0;
+		arch_store_info(bp, address, len, type);
+	}
+	return rc;
+}
+
+/*
+ * Actual implementation of register_user_hw_breakpoint.
+ */
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp,
+		unsigned long address, unsigned len, unsigned type)
+{
+	int rc;
+	struct thread_hw_breakpoint *thbi;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, tsk, address, len, type);
+	if (rc)
+		return rc;
+
+	thbi = alloc_thread_hw_breakpoint(tsk);
+	if (!thbi)
+		return -ENOMEM;
+
+	/* Insert bp in the thread's list */
+	pos = insert_bp_in_list(bp, thbi, tsk);
+	arch_register_user_hw_breakpoint(bp, thbi);
+
+	/* Update and rebalance the priorities.  We don't need to go through
+	 * the list of all threads; adding a breakpoint can only cause the
+	 * priorities for this thread to increase.
+	 */
+	accum_thread_tprio(thbi);
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register.
+	 */
+	if (pos < HB_NUM - cur_kbpdata->num_kbps) {
+		rc = 1;
+
+		/* Does it need to be installed right now? */
+		if (tsk == current)
+			switch_to_thread_hw_breakpoint(tsk);
+		/* Otherwise it will get installed the next time tsk runs */
+	}
+
+	return rc;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ * @address: location (virtual address) of the breakpoint
+ * @len: encoded extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read-write, or execute)
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running.  It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being debugged by the current process,
+ * but it may also be the current process.
+ *
+ * @address, @len, and @type are checked for validity and stored in
+ * encoded form in @bp.  @bp->triggered and @bp->priority must be set
+ * properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp,
+		const void __user *address, unsigned len, unsigned type)
+{
+	int rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+	rc = __register_user_hw_breakpoint(tsk, bp,
+			(unsigned long) address, len, type);
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+/*
+ * Actual implementation of unregister_user_hw_breakpoint.
+ */
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+
+	if (!bp->status)
+		return;		/* Not registered */
+
+	/* Remove bp from the thread's list */
+	remove_bp_from_list(bp, thbi, tsk);
+	arch_unregister_user_hw_breakpoint(bp, thbi);
+
+	/* Recalculate and rebalance the kernel-vs-user priorities,
+	 * and actually uninstall bp if necessary.
+	 */
+	recalc_tprio();
+	balance_kernel_vs_user();
+	if (tsk == current)
+		switch_to_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * unregister_user_hw_breakpoint - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+		struct hw_breakpoint *bp)
+{
+	mutex_lock(&hw_breakpoint_mutex);
+	__unregister_user_hw_breakpoint(tsk, bp);
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ * @address: location (virtual address) of the breakpoint
+ * @len: encoded extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read-write, or execute)
+ *
+ * This routine registers a breakpoint to be active at all times.  It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * @address, @len, and @type are checked for validity and stored in
+ * encoded form in @bp.  @bp->triggered and @bp->priority must be set
+ * properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp,
+		const void *address, unsigned len, unsigned type)
+{
+	int rc;
+	int pos;
+
+	bp->status = 0;
+	rc = validate_settings(bp, NULL, (unsigned long) address, len, type);
+	if (rc)
+		return rc;
+
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Insert bp in the kernel's list */
+	pos = insert_bp_in_list(bp, NULL, NULL);
+	arch_register_kernel_hw_breakpoint(bp);
+
+	/* Rebalance the priorities.  This will install bp if it
+	 * was allocated a debug register.
+	 */
+	balance_kernel_vs_user();
+
+	/* Did bp get allocated to a debug register?  We can tell from its
+	 * position in the list.  The number of registers allocated to
+	 * kernel breakpoints is num_kbps; all the others are available for
+	 * user breakpoints.  If bp's position in the priority-ordered list
+	 * is low enough, it will get a register.
+	 */
+	if (pos < cur_kbpdata->num_kbps)
+		rc = 1;
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+	if (!bp->status)
+		return;		/* Not registered */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	/* Remove bp from the kernel's list */
+	remove_bp_from_list(bp, NULL, NULL);
+	arch_unregister_kernel_hw_breakpoint(bp);
+
+	/* Rebalance the priorities.  This will uninstall bp if it
+	 * was allocated a debug register.
+	 */
+	balance_kernel_vs_user();
+
+	mutex_unlock(&hw_breakpoint_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-25 20:51                                                                 ` Alan Stern
@ 2007-06-26 18:17                                                                   ` Roland McGrath
  2007-06-27  2:43                                                                     ` Alan Stern
  0 siblings, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-06-26 18:17 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 612 bytes --]

I needed the attached patch on top of the bptest patch for the current
code.  Btw, that is a very nice little tester!

Below that is a patch to go on top of your current patch, with x86-64
support.  I've only tried a few trivial tests with bptest (including an
8-byte bp), which worked great.  It is a pretty faithful copy of your i386
changes.  I'm still not sure we have all that right, but you might as well
incorporate this into your patch.  You should change the x86_64 code in
parallel with any i386 changes we decide on later, and I can test it and
send you any typo fixups or whatnot.
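
For reference, here is a minimal sketch (not part of either patch) of what an
8-byte kernel watchpoint looks like through this interface.  The watch_target
variable and the module boilerplate are invented purely for illustration, and
it assumes the architecture defines HW_BREAKPOINT_LEN_8 and
HW_BREAKPOINT_WRITE:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/ptrace.h>
#include <asm/hw_breakpoint.h>

/* Hypothetical 8-byte variable to watch; made up for this sketch */
static u64 watch_target;

static struct hw_breakpoint wp;

/* Called in interrupt context after the write has taken place */
static void wp_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
{
	printk(KERN_INFO "watch_target written, now %llu\n",
			(unsigned long long) watch_target);
}

static int __init wp_init(void)
{
	int rc = -ENOSYS;

	wp.triggered = wp_triggered;
	wp.priority = HW_BREAKPOINT_PRIO_NORMAL;

#if defined(HW_BREAKPOINT_LEN_8) && defined(HW_BREAKPOINT_WRITE)
	/* Returns 1 if installed, 0 if registered only, <0 on error */
	rc = register_kernel_hw_breakpoint(&wp, &watch_target,
			HW_BREAKPOINT_LEN_8, HW_BREAKPOINT_WRITE);
#endif
	return rc < 0 ? rc : 0;
}

static void __exit wp_exit(void)
{
	unregister_kernel_hw_breakpoint(&wp);
}

module_init(wp_init);
module_exit(wp_exit);
MODULE_LICENSE("GPL");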


Thanks,
Roland



[-- Attachment #2: bptest patch --]
[-- Type: text/plain, Size: 2236 bytes --]

--- bptest/bptest.c.~1~	2007-06-25 04:08:06.000000000 -0700
+++ bptest/bptest.c	2007-06-26 01:14:20.000000000 -0700
@@ -147,17 +147,17 @@ static DRIVER_ATTR(call, 0200, NULL, cal
 /* Breakpoint callbacks */
 static void bptest_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
 {
-	printk(KERN_INFO "Breakpoint %d triggered\n", bp - bps);
+	printk(KERN_INFO "Breakpoint %d triggered\n", (int) (bp - bps));
 }
 
 static void bptest_installed(struct hw_breakpoint *bp)
 {
-	printk(KERN_INFO "Breakpoint %d installed\n", bp - bps);
+	printk(KERN_INFO "Breakpoint %d installed\n", (int) (bp - bps));
 }
 
 static void bptest_uninstalled(struct hw_breakpoint *bp)
 {
-	printk(KERN_INFO "Breakpoint %d uninstalled\n", bp - bps);
+	printk(KERN_INFO "Breakpoint %d uninstalled\n", (int) (bp - bps));
 }
 
 
@@ -204,7 +204,7 @@ static ssize_t bp_show(int n, char *buf)
 
 	a = -1;
 	if (type == 'e') {
-		const void *addr = hw_breakpoint_get_kaddr(bp);
+		const void *addr = hw_breakpoint_get_kaddress(bp);
 
 		if (addr == r0)
 			a = 0;
@@ -215,7 +215,7 @@ static ssize_t bp_show(int n, char *buf)
 		else if (addr == r3)
 			a = 3;
 	} else {
-		const unsigned char *p = hw_breakpoint_get_kaddr(bp);
+		const unsigned char *p = hw_breakpoint_get_kaddress(bp);
 
 		if (p >= bytes && p < bytes + NUM_BYTES)
 			a = p - bytes;
@@ -233,6 +233,7 @@ static ssize_t bp_store(int n, const cha
 	char atype;
 	unsigned len, type;
 	int i;
+	void *addr;
 
 	if (count <= 1) {
 		printk(KERN_INFO "bptest: bp%d: format:  priority type "
@@ -290,7 +291,7 @@ static ssize_t bp_store(int n, const cha
 		return -EINVAL;
 	}
 	if (atype == 'e')
-		hw_breakpoint_kinit(bp, rtns[a], len, type);
+		addr = rtns[a];
 	else {
 		switch (alen) {
 #ifdef HW_BREAKPOINT_LEN_1
@@ -311,14 +312,14 @@ static ssize_t bp_store(int n, const cha
 			return -EINVAL;
 			break;
 		}
-		hw_breakpoint_kinit(bp, &bytes[a], len, type);
+		addr = &bytes[a];
 	}
 
 	bp->triggered = bptest_triggered;
 	bp->installed = bptest_installed;
 	bp->uninstalled = bptest_uninstalled;
 
-	i = register_kernel_hw_breakpoint(bp);
+	i = register_kernel_hw_breakpoint(bp, addr, len, type);
 	if (i < 0) {
 		printk(KERN_WARNING "bptest: bp%d: failed to register %d\n",
 				n, i);

[-- Attachment #3: hw-breakpoint port to x86-64 --]
[-- Type: text/plain, Size: 24412 bytes --]

---
 arch/i386/kernel/hw_breakpoint.c   |    5 --
 arch/x86_64/ia32/ia32_aout.c       |   10 ----
 arch/x86_64/ia32/ptrace32.c        |   65 ++++----------------------------
 arch/x86_64/kernel/Makefile        |    3 +
 arch/x86_64/kernel/kprobes.c       |   14 +++++-
 arch/x86_64/kernel/machine_kexec.c |    2 
 arch/x86_64/kernel/process.c       |   46 +++++++++++-----------
 arch/x86_64/kernel/ptrace.c        |   72 +++++------------------------------
 arch/x86_64/kernel/signal.c        |    8 ---
 arch/x86_64/kernel/smpboot.c       |    4 +
 arch/x86_64/kernel/suspend.c       |   17 +-------
 arch/x86_64/kernel/traps.c         |   75 +++++++++++++------------------------
 include/asm-x86_64/debugreg.h      |   30 ++++++++++++++
 include/asm-x86_64/hw_breakpoint.h |   50 ++++++++++++++++++++++++
 include/asm-x86_64/processor.h     |   10 +---
 include/asm-x86_64/suspend.h       |    3 -
 16 files changed, 184 insertions(+), 230 deletions(-)

Index: b/arch/x86_64/kernel/kprobes.c
===================================================================
--- a/arch/x86_64/kernel/kprobes.c
+++ b/arch/x86_64/kernel/kprobes.c
@@ -42,6 +42,7 @@
 #include <asm/cacheflush.h>
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
+#include <asm/debugreg.h>
 
 void jprobe_return_end(void);
 static void __kprobes arch_copy_kprobe(struct kprobe *p);
@@ -652,8 +653,17 @@ int __kprobes kprobe_exceptions_notify(s
 			ret = NOTIFY_STOP;
 		break;
 	case DIE_DEBUG:
-		if (post_kprobe_handler(args->regs))
-			ret = NOTIFY_STOP;
+		/*
+		 * The DR6 value is stored in args->err.
+		 * If DR_STEP is set and it's ours, we should clear DR_STEP
+		 * from the user's virtualized DR6 register.
+		 * Then if no more bits are set we should eat this exception.
+		 */
+		if ((args->err & DR_STEP) && post_kprobe_handler(args->regs)) {
+			current->thread.vdr6 &= ~DR_STEP;
+			if ((args->err & ~DR_STEP) == 0)
+				ret = NOTIFY_STOP;
+		}
 		break;
 	case DIE_GPF:
 	case DIE_PAGE_FAULT:
Index: b/include/asm-x86_64/hw_breakpoint.h
===================================================================
--- /dev/null
+++ b/include/asm-x86_64/hw_breakpoint.h
@@ -0,0 +1,50 @@
+#ifndef	_X86_64_HW_BREAKPOINT_H
+#define	_X86_64_HW_BREAKPOINT_H
+
+#ifdef	__KERNEL__
+#define	__ARCH_HW_BREAKPOINT_H
+
+struct arch_hw_breakpoint {
+	unsigned long	address;
+	u8		len;
+	u8		type;
+} __attribute__((packed));
+
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint accessor routines */
+static inline const void *hw_breakpoint_get_kaddress(struct hw_breakpoint *bp)
+{
+	return (const void *) bp->info.address;
+}
+
+static inline const void __user *hw_breakpoint_get_uaddress(
+		struct hw_breakpoint *bp)
+{
+	return (const void __user *) bp->info.address;
+}
+
+static inline unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp)
+{
+	return bp->info.len;
+}
+
+static inline unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp)
+{
+	return bp->info.type;
+}
+
+/* Available HW breakpoint length encodings */
+#define HW_BREAKPOINT_LEN_1		0x40
+#define HW_BREAKPOINT_LEN_2		0x44
+#define HW_BREAKPOINT_LEN_4		0x4c
+#define HW_BREAKPOINT_LEN_8		0x48
+#define HW_BREAKPOINT_LEN_EXECUTE	0x40
+
+/* Available HW breakpoint type encodings */
+#define HW_BREAKPOINT_EXECUTE	0x80	/* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE	0x81	/* trigger on memory write */
+#define HW_BREAKPOINT_RW	0x83	/* trigger on memory read or write */
+
+#endif	/* __KERNEL__ */
+#endif	/* _X86_64_HW_BREAKPOINT_H */
Index: b/include/asm-x86_64/debugreg.h
===================================================================
--- a/include/asm-x86_64/debugreg.h
+++ b/include/asm-x86_64/debugreg.h
@@ -49,6 +49,8 @@
 
 #define DR_LOCAL_ENABLE_SHIFT 0    /* Extra shift to the local enable bit */
 #define DR_GLOBAL_ENABLE_SHIFT 1   /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1)      /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2)     /* Global enable for reg 0 */
 #define DR_ENABLE_SIZE 2           /* 2 enable bits per register */
 
 #define DR_LOCAL_ENABLE_MASK (0x55)  /* Set  local bits for all 4 regs */
@@ -62,4 +64,32 @@
 #define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
 #define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
 
+
+/*
+ * HW breakpoint additions
+ */
+#ifdef __KERNEL__
+
+#define HB_NUM		4	/* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+		struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+static inline void disable_debug_registers(void)
+{
+	set_debugreg(0UL, 7);
+}
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
+
+#endif	/* __KERNEL__ */
+
 #endif
Index: b/arch/x86_64/ia32/ia32_aout.c
===================================================================
--- a/arch/x86_64/ia32/ia32_aout.c
+++ b/arch/x86_64/ia32/ia32_aout.c
@@ -32,6 +32,7 @@
 #include <asm/cacheflush.h>
 #include <asm/user32.h>
 #include <asm/ia32.h>
+#include <asm/debugreg.h>
 
 #undef WARN_OLD
 #undef CORE_DUMP /* probably broken */
@@ -57,14 +58,7 @@ static void dump_thread32(struct pt_regs
 	dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
 	dump->u_dsize -= dump->u_tsize;
 	dump->u_ssize = 0;
-	dump->u_debugreg[0] = current->thread.debugreg0;  
-	dump->u_debugreg[1] = current->thread.debugreg1;  
-	dump->u_debugreg[2] = current->thread.debugreg2;  
-	dump->u_debugreg[3] = current->thread.debugreg3;  
-	dump->u_debugreg[4] = 0;  
-	dump->u_debugreg[5] = 0;  
-	dump->u_debugreg[6] = current->thread.debugreg6;  
-	dump->u_debugreg[7] = current->thread.debugreg7;  
+	dump_thread_hw_breakpoint(current, dump->u_debugreg);
 
 	if (dump->start_stack < 0xc0000000)
 		dump->u_ssize = ((unsigned long) (0xc0000000 - dump->start_stack)) >> PAGE_SHIFT;
Index: b/arch/x86_64/ia32/ptrace32.c
===================================================================
--- a/arch/x86_64/ia32/ptrace32.c
+++ b/arch/x86_64/ia32/ptrace32.c
@@ -39,7 +39,6 @@
 
 static int putreg32(struct task_struct *child, unsigned regno, u32 val)
 {
-	int i;
 	__u64 *stack = (__u64 *)task_pt_regs(child);
 
 	switch (regno) {
@@ -85,43 +84,11 @@ static int putreg32(struct task_struct *
 		break;
 	}
 
-	case offsetof(struct user32, u_debugreg[4]): 
-	case offsetof(struct user32, u_debugreg[5]):
-		return -EIO;
-
-	case offsetof(struct user32, u_debugreg[0]):
-		child->thread.debugreg0 = val;
-		break;
-
-	case offsetof(struct user32, u_debugreg[1]):
-		child->thread.debugreg1 = val;
-		break;
-
-	case offsetof(struct user32, u_debugreg[2]):
-		child->thread.debugreg2 = val;
-		break;
-
-	case offsetof(struct user32, u_debugreg[3]):
-		child->thread.debugreg3 = val;
-		break;
-
-	case offsetof(struct user32, u_debugreg[6]):
-		child->thread.debugreg6 = val;
-		break; 
-
-	case offsetof(struct user32, u_debugreg[7]):
-		val &= ~DR_CONTROL_RESERVED;
-		/* See arch/i386/kernel/ptrace.c for an explanation of
-		 * this awkward check.*/
-		for(i=0; i<4; i++)
-			if ((0x5454 >> ((val >> (16 + 4*i)) & 0xf)) & 1)
-			       return -EIO;
-		child->thread.debugreg7 = val; 
-		if (val)
-			set_tsk_thread_flag(child, TIF_DEBUG);
-		else
-			clear_tsk_thread_flag(child, TIF_DEBUG);
-		break; 
+	case offsetof(struct user32, u_debugreg[0])
+		... offsetof(struct user32, u_debugreg[7]):
+		regno -= offsetof(struct user32, u_debugreg[0]);
+		regno >>= 2;
+		return thread_set_debugreg(child, regno, val);
 		    
 	default:
 		if (regno > sizeof(struct user32) || (regno & 3))
@@ -170,23 +137,11 @@ static int getreg32(struct task_struct *
 	R32(eflags, eflags);
 	R32(esp, rsp);
 
-	case offsetof(struct user32, u_debugreg[0]): 
-		*val = child->thread.debugreg0; 
-		break; 
-	case offsetof(struct user32, u_debugreg[1]): 
-		*val = child->thread.debugreg1; 
-		break; 
-	case offsetof(struct user32, u_debugreg[2]): 
-		*val = child->thread.debugreg2; 
-		break; 
-	case offsetof(struct user32, u_debugreg[3]): 
-		*val = child->thread.debugreg3; 
-		break; 
-	case offsetof(struct user32, u_debugreg[6]): 
-		*val = child->thread.debugreg6; 
-		break; 
-	case offsetof(struct user32, u_debugreg[7]): 
-		*val = child->thread.debugreg7; 
+	case offsetof(struct user32, u_debugreg[0])
+		... offsetof(struct user32, u_debugreg[7]):
+		regno -= offsetof(struct user32, u_debugreg[0]);
+		regno >>= 2;
+		*val = thread_get_debugreg(child, regno);
 		break; 
 		    
 	default:
Index: b/arch/x86_64/kernel/Makefile
===================================================================
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -61,3 +61,6 @@ msr-$(subst m,y,$(CONFIG_X86_MSR))  += .
 alternative-y			+= ../../i386/kernel/alternative.o
 pcspeaker-y			+= ../../i386/kernel/pcspeaker.o
 perfctr-watchdog-y		+= ../../i386/kernel/cpu/perfctr-watchdog.o
+
+obj-y				+= hw_breakpoint.o
+hw_breakpoint-y			+= ../../i386/kernel/hw_breakpoint.o
Index: b/arch/x86_64/kernel/machine_kexec.c
===================================================================
--- a/arch/x86_64/kernel/machine_kexec.c
+++ b/arch/x86_64/kernel/machine_kexec.c
@@ -14,6 +14,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
 #include <asm/io.h>
+#include <asm/debugreg.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u64 kexec_pgd[512] PAGE_ALIGNED;
@@ -185,6 +186,7 @@ NORET_TYPE void machine_kexec(struct kim
 
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
+	disable_debug_registers();
 
 	control_page = page_address(image->control_code_page) + PAGE_SIZE;
 	memcpy(control_page, relocate_kernel, PAGE_SIZE);
Index: b/arch/x86_64/kernel/process.c
===================================================================
--- a/arch/x86_64/kernel/process.c
+++ b/arch/x86_64/kernel/process.c
@@ -51,6 +51,7 @@
 #include <asm/proto.h>
 #include <asm/ia32.h>
 #include <asm/idle.h>
+#include <asm/debugreg.h>
 
 asmlinkage extern void ret_from_fork(void);
 
@@ -379,6 +380,9 @@ void exit_thread(void)
 		t->io_bitmap_max = 0;
 		put_cpu();
 	}
+
+	if (unlikely(me->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(me);
 }
 
 void flush_thread(void)
@@ -394,14 +398,10 @@ void flush_thread(void)
 			current_thread_info()->status |= TS_COMPAT;
 		}
 	}
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
 
-	tsk->thread.debugreg0 = 0;
-	tsk->thread.debugreg1 = 0;
-	tsk->thread.debugreg2 = 0;
-	tsk->thread.debugreg3 = 0;
-	tsk->thread.debugreg6 = 0;
-	tsk->thread.debugreg7 = 0;
+	if (unlikely(tsk->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(tsk);
+
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));	
 	/*
 	 * Forget coprocessor state..
@@ -487,6 +487,14 @@ int copy_thread(int nr, unsigned long cl
 	asm("mov %%es,%0" : "=m" (p->thread.es));
 	asm("mov %%ds,%0" : "=m" (p->thread.ds));
 
+	p->thread.hw_breakpoint_info = NULL;
+	p->thread.io_bitmap_ptr = NULL;
+
+	err = -ENOMEM;
+	if (unlikely(me->thread.hw_breakpoint_info) &&
+	    copy_thread_hw_breakpoint(me, p, clone_flags))
+		goto out;
+
 	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
 		if (!p->thread.io_bitmap_ptr) {
@@ -513,6 +521,8 @@ int copy_thread(int nr, unsigned long cl
 	}
 	err = 0;
 out:
+	if (err)
+		flush_thread_hw_breakpoint(p);
 	if (err && p->thread.io_bitmap_ptr) {
 		kfree(p->thread.io_bitmap_ptr);
 		p->thread.io_bitmap_max = 0;
@@ -520,11 +530,6 @@ out:
 	return err;
 }
 
-/*
- * This special macro can be used to load a debugging register
- */
-#define loaddebug(thread,r) set_debugreg(thread->debugreg ## r, r)
-
 static inline void __switch_to_xtra(struct task_struct *prev_p,
 			     	    struct task_struct *next_p,
 			     	    struct tss_struct *tss)
@@ -534,16 +539,6 @@ static inline void __switch_to_xtra(stru
 	prev = &prev_p->thread,
 	next = &next_p->thread;
 
-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		loaddebug(next, 0);
-		loaddebug(next, 1);
-		loaddebug(next, 2);
-		loaddebug(next, 3);
-		/* no 4 and 5 */
-		loaddebug(next, 6);
-		loaddebug(next, 7);
-	}
-
 	if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
 		/*
 		 * Copy the relevant range of the IO bitmap.
@@ -557,6 +552,13 @@ static inline void __switch_to_xtra(stru
 		 */
 		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
 	}
+
+	/*
+	 * Handle debug registers.  This must be done _after_ current
+	 * is updated.
+	 */
+	if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+		switch_to_thread_hw_breakpoint(next_p);
 }
 
 /*
Index: b/arch/x86_64/kernel/ptrace.c
===================================================================
--- a/arch/x86_64/kernel/ptrace.c
+++ b/arch/x86_64/kernel/ptrace.c
@@ -307,7 +307,7 @@ static unsigned long getreg(struct task_
 
 long arch_ptrace(struct task_struct *child, long request, long addr, long data)
 {
-	long i, ret;
+	long ret;
 	unsigned ui;
 
 	switch (request) {
@@ -338,23 +338,11 @@ long arch_ptrace(struct task_struct *chi
 		case 0 ... sizeof(struct user_regs_struct) - sizeof(long):
 			tmp = getreg(child, addr);
 			break;
-		case offsetof(struct user, u_debugreg[0]):
-			tmp = child->thread.debugreg0;
-			break;
-		case offsetof(struct user, u_debugreg[1]):
-			tmp = child->thread.debugreg1;
-			break;
-		case offsetof(struct user, u_debugreg[2]):
-			tmp = child->thread.debugreg2;
-			break;
-		case offsetof(struct user, u_debugreg[3]):
-			tmp = child->thread.debugreg3;
-			break;
-		case offsetof(struct user, u_debugreg[6]):
-			tmp = child->thread.debugreg6;
-			break;
-		case offsetof(struct user, u_debugreg[7]):
-			tmp = child->thread.debugreg7;
+		case offsetof(struct user, u_debugreg[0])
+			... offsetof(struct user, u_debugreg[7]):
+			addr -= offsetof(struct user, u_debugreg[0]);
+			addr >>= 3;
+			tmp = thread_get_debugreg(child, addr);
 			break;
 		default:
 			tmp = 0;
@@ -375,7 +363,6 @@ long arch_ptrace(struct task_struct *chi
 
 	case PTRACE_POKEUSR: /* write the word at location addr in the USER area */
 	{
-		int dsize = test_tsk_thread_flag(child, TIF_IA32) ? 3 : 7;
 		ret = -EIO;
 		if ((addr & 7) ||
 		    addr > sizeof(struct user) - 7)
@@ -385,49 +372,12 @@ long arch_ptrace(struct task_struct *chi
 		case 0 ... sizeof(struct user_regs_struct) - sizeof(long):
 			ret = putreg(child, addr, data);
 			break;
-		/* Disallows to set a breakpoint into the vsyscall */
-		case offsetof(struct user, u_debugreg[0]):
-			if (data >= TASK_SIZE_OF(child) - dsize) break;
-			child->thread.debugreg0 = data;
-			ret = 0;
-			break;
-		case offsetof(struct user, u_debugreg[1]):
-			if (data >= TASK_SIZE_OF(child) - dsize) break;
-			child->thread.debugreg1 = data;
-			ret = 0;
-			break;
-		case offsetof(struct user, u_debugreg[2]):
-			if (data >= TASK_SIZE_OF(child) - dsize) break;
-			child->thread.debugreg2 = data;
-			ret = 0;
-			break;
-		case offsetof(struct user, u_debugreg[3]):
-			if (data >= TASK_SIZE_OF(child) - dsize) break;
-			child->thread.debugreg3 = data;
-			ret = 0;
-			break;
-		case offsetof(struct user, u_debugreg[6]):
-				  if (data >> 32)
-				break; 
-			child->thread.debugreg6 = data;
-			ret = 0;
+		case offsetof(struct user, u_debugreg[0])
+			... offsetof(struct user, u_debugreg[7]):
+			addr -= offsetof(struct user, u_debugreg[0]);
+			addr >>= 3;
+			ret = thread_set_debugreg(child, addr, data);
 			break;
-		case offsetof(struct user, u_debugreg[7]):
-			/* See arch/i386/kernel/ptrace.c for an explanation of
-			 * this awkward check.*/
-			data &= ~DR_CONTROL_RESERVED;
-			for(i=0; i<4; i++)
-				if ((0x5554 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
-					break;
-			if (i == 4) {
-			  child->thread.debugreg7 = data;
-			  if (data)
-			  	set_tsk_thread_flag(child, TIF_DEBUG);
-			  else
-			  	clear_tsk_thread_flag(child, TIF_DEBUG);
-			  ret = 0;
-		  	}
-		  break;
 		}
 		break;
 	}
Index: b/arch/x86_64/kernel/signal.c
===================================================================
--- a/arch/x86_64/kernel/signal.c
+++ b/arch/x86_64/kernel/signal.c
@@ -411,14 +411,6 @@ static void do_signal(struct pt_regs *re
 
 	signr = get_signal_to_deliver(&info, &ka, regs, NULL);
 	if (signr > 0) {
-		/* Reenable any watchpoints before delivering the
-		 * signal to user space. The processor register will
-		 * have been cleared if the watchpoint triggered
-		 * inside the kernel.
-		 */
-		if (current->thread.debugreg7)
-			set_debugreg(current->thread.debugreg7, 7);
-
 		/* Whee!  Actually deliver the signal.  */
 		if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
 			/* a signal was successfully delivered; the saved
Index: b/arch/x86_64/kernel/smpboot.c
===================================================================
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -59,6 +59,7 @@
 #include <asm/irq.h>
 #include <asm/hw_irq.h>
 #include <asm/numa.h>
+#include <asm/debugreg.h>
 
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
@@ -378,6 +379,8 @@ void __cpuinit start_secondary(void)
 
 	unlock_ipi_call_lock();
 
+	load_debug_registers();
+
 	cpu_idle();
 }
 
@@ -1043,6 +1046,7 @@ int __cpu_disable(void)
 	spin_unlock(&vector_lock);
 	remove_cpu_from_maps();
 	fixup_irqs(cpu_online_map);
+	disable_debug_registers();
 	return 0;
 }
 
Index: b/arch/x86_64/kernel/suspend.c
===================================================================
--- a/arch/x86_64/kernel/suspend.c
+++ b/arch/x86_64/kernel/suspend.c
@@ -13,6 +13,7 @@
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/mtrr.h>
+#include <asm/debugreg.h>
 
 /* References to section boundaries */
 extern const void __nosave_begin, __nosave_end;
@@ -60,6 +61,8 @@ void __save_processor_state(struct saved
 	asm volatile ("movq %%cr3, %0" : "=r" (ctxt->cr3));
 	asm volatile ("movq %%cr4, %0" : "=r" (ctxt->cr4));
 	asm volatile ("movq %%cr8, %0" : "=r" (ctxt->cr8));
+
+	disable_debug_registers();
 }
 
 void save_processor_state(void)
@@ -131,19 +134,7 @@ void fix_processor_context(void)
 	load_TR_desc();				/* This does ltr */
 	load_LDT(&current->active_mm->context);	/* This does lldt */
 
-	/*
-	 * Now maybe reload the debug registers
-	 */
-	if (current->thread.debugreg7){
-                loaddebug(&current->thread, 0);
-                loaddebug(&current->thread, 1);
-                loaddebug(&current->thread, 2);
-                loaddebug(&current->thread, 3);
-                /* no 4 and 5 */
-                loaddebug(&current->thread, 6);
-                loaddebug(&current->thread, 7);
-	}
-
+	load_debug_registers();
 }
 
 #ifdef CONFIG_SOFTWARE_SUSPEND
Index: b/arch/x86_64/kernel/traps.c
===================================================================
--- a/arch/x86_64/kernel/traps.c
+++ b/arch/x86_64/kernel/traps.c
@@ -829,67 +829,46 @@ asmlinkage __kprobes struct pt_regs *syn
 asmlinkage void __kprobes do_debug(struct pt_regs * regs,
 				   unsigned long error_code)
 {
-	unsigned long condition;
+	unsigned long dr6;
 	struct task_struct *tsk = current;
 	siginfo_t info;
 
-	get_debugreg(condition, 6);
+	get_debugreg(dr6, 6);
+	set_debugreg(0UL, 6);	/* DR6 may or may not be cleared by the CPU */
 
-	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
+	/* Store the virtualized DR6 value */
+	tsk->thread.vdr6 = dr6;
+
+	if (notify_die(DIE_DEBUG, "debug", regs, dr6, error_code,
 						SIGTRAP) == NOTIFY_STOP)
 		return;
 
 	preempt_conditional_sti(regs);
 
-	/* Mask out spurious debug traps due to lazy DR7 setting */
-	if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
-		if (!tsk->thread.debugreg7) { 
-			goto clear_dr7;
-		}
+	/*
+	 * Single-stepping through system calls: ignore any exceptions in
+	 * kernel space, but re-enable TF when returning to user mode.
+	 *
+	 * We already checked v86 mode above, so we can check for kernel mode
+	 * by just checking the CPL of CS.
+	 */
+	if ((dr6 & DR_STEP) && !user_mode(regs)) {
+		tsk->thread.vdr6 &= ~DR_STEP;
+		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+		regs->eflags &= ~X86_EFLAGS_TF;
 	}
 
-	tsk->thread.debugreg6 = condition;
-
-	/* Mask out spurious TF errors due to lazy TF clearing */
-	if (condition & DR_STEP) {
-		/*
-		 * The TF error should be masked out only if the current
-		 * process is not traced and if the TRAP flag has been set
-		 * previously by a tracing process (condition detected by
-		 * the PT_DTRACE flag); remember that the i386 TRAP flag
-		 * can be modified by the process itself in user mode,
-		 * allowing programs to debug themselves without the ptrace()
-		 * interface.
-		 */
-                if (!user_mode(regs))
-                       goto clear_TF_reenable;
-		/*
-		 * Was the TF flag set by a debugger? If so, clear it now,
-		 * so that register information is correct.
-		 */
-		if (tsk->ptrace & PT_DTRACE) {
-			regs->eflags &= ~TF_MASK;
-			tsk->ptrace &= ~PT_DTRACE;
-		}
+	if (tsk->thread.vdr6 & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
+		/* Ok, finally something we can handle */
+		tsk->thread.trap_no = 1;
+		tsk->thread.error_code = error_code;
+		info.si_signo = SIGTRAP;
+		info.si_errno = 0;
+		info.si_code = TRAP_BRKPT;
+		info.si_addr = user_mode(regs) ? (void __user *)regs->rip : NULL;
+		force_sig_info(SIGTRAP, &info, tsk);
 	}
 
-	/* Ok, finally something we can handle */
-	tsk->thread.trap_no = 1;
-	tsk->thread.error_code = error_code;
-	info.si_signo = SIGTRAP;
-	info.si_errno = 0;
-	info.si_code = TRAP_BRKPT;
-	info.si_addr = user_mode(regs) ? (void __user *)regs->rip : NULL;
-	force_sig_info(SIGTRAP, &info, tsk);
-
-clear_dr7:
-	set_debugreg(0UL, 7);
-	preempt_conditional_cli(regs);
-	return;
-
-clear_TF_reenable:
-	set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
-	regs->eflags &= ~TF_MASK;
 	preempt_conditional_cli(regs);
 }
 
Index: b/include/asm-x86_64/processor.h
===================================================================
--- a/include/asm-x86_64/processor.h
+++ b/include/asm-x86_64/processor.h
@@ -221,13 +221,9 @@ struct thread_struct {
 	unsigned long	fs;
 	unsigned long	gs;
 	unsigned short	es, ds, fsindex, gsindex;	
-/* Hardware debugging registers */
-	unsigned long	debugreg0;  
-	unsigned long	debugreg1;  
-	unsigned long	debugreg2;  
-	unsigned long	debugreg3;  
-	unsigned long	debugreg6;  
-	unsigned long	debugreg7;  
+/* Hardware breakpoint info */
+	unsigned long	vdr6;
+	struct thread_hw_breakpoint	*hw_breakpoint_info;
 /* fault info */
 	unsigned long	cr2, trap_no, error_code;
 /* floating point info */
Index: b/include/asm-x86_64/suspend.h
===================================================================
--- a/include/asm-x86_64/suspend.h
+++ b/include/asm-x86_64/suspend.h
@@ -39,9 +39,6 @@ extern unsigned long saved_context_r08, 
 extern unsigned long saved_context_r12, saved_context_r13, saved_context_r14, saved_context_r15;
 extern unsigned long saved_context_eflags;
 
-#define loaddebug(thread,register) \
-	set_debugreg((thread)->debugreg##register, register)
-
 extern void fix_processor_context(void);
 
 #ifdef CONFIG_ACPI_SLEEP
Index: b/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- a/arch/i386/kernel/hw_breakpoint.c
+++ b/arch/i386/kernel/hw_breakpoint.c
@@ -128,7 +128,7 @@ static void arch_install_chbi(struct cpu
 	struct hw_breakpoint **bps;
 
 	/* Don't allow debug exceptions while we update the registers */
-	set_debugreg(0, 7);
+	set_debugreg(0UL, 7);
 	chbi->cur_kbpdata = rcu_dereference(cur_kbpdata);
 
 	/* Kernel breakpoints are stored starting in DR0 and going up */
@@ -391,7 +391,6 @@ static void ptrace_triggered(struct hw_b
 	if (thbi) {
 		i = bp - thbi->vdr_bps;
 		tsk->thread.vdr6 |= (DR_TRAP0 << i);
-		send_sigtrap(tsk, regs, 0);
 	}
 }
 
@@ -588,7 +587,7 @@ static int __kprobes hw_breakpoint_handl
 	/* Disable all breakpoints so that the callbacks can run without
 	 * triggering recursive debug exceptions.
 	 */
-	set_debugreg(0, 7);
+	set_debugreg(0UL, 7);
 
 	/* Handle all the breakpoints that were triggered */
 	for (i = 0; i < HB_NUM; ++i) {

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-25 15:36                                                                 ` Alan Stern
@ 2007-06-26 20:49                                                                   ` Roland McGrath
  2007-06-27  3:26                                                                     ` Alan Stern
  0 siblings, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-06-26 20:49 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> All right, I'll change it.  And I'll encapsulate those fields.  I still 
> think it will accomplish nothing more than hiding some implementation 
> details which don't really need to be hidden.

It makes me a little happier, and I at least consider that a substantial
accomplishment.  ;-)

> It's below.  The patch logs the value of DR6 when each debug interrupt 
> occurs, and it adds another sysfs attribute to the bptest driver.  The 
> attribute is named "test", and it contains the value that the IRQ 
> handler will write back to DR6.  Combine this with the Alt-SysRq-P 
> change already submitted, and you can get a clear view of what's going 
> on.

Thanks.  I haven't played with this.

> I see.  So I could add a CONFIG_HW_BREAKPOINT option and make 
> CONFIG_PTRACE depend on it.  That will be simple enough.

Right.  

> Do you think it would make sense to allow utrace without hw-breakpoint?

Sure.  There's no special reason to want to turn hw-breakpoint off, but
it is a naturally separable option.

> Here's the next iteration.  The arch-specific parts are now completely 
> encapsulated.  validate_settings is in a form which should be workable 
> on all architectures.  And the address, length, and type are passed as 
> arguments to register_{kernel,user}_hw_breakpoint().

I like it!

> I haven't tried to modify Kconfig at all.  To do it properly would
> require making ptrace configurable, which is not something I want to
> tackle at the moment.

You don't need to worry about that.  Under utrace, CONFIG_PTRACE is
already separate and can be turned off.  I don't think we really need to
finish the Kconfig stuff at all before I merge it into the utrace code.

> I changed the Kprobes single-step routine along the lines you 
> suggested, but added a little extra.  See what you think.
[...]
> The test for early termination of the exception handler is now back the
> way it was.  However I didn't change the test for deciding whether to 
> send a SIGTRAP.  Under the current circumstances I don't see how it 
> could ever be wrong.  (On the other hand, the code will end up calling 
> send_sigtrap() twice when a ptrace exception occurs: once in the ptrace 
> trigger routine and once in do_debug.  That won't matter, will it?  I
> would expect send_sigtrap() to be idempotent.)

Calling send_sigtrap twice during the same exception does happen to be
harmless, but I don't think it should be presumed to be.  It is just not
the right way to go about things that you send a signal twice when there
is one signal you want to generate.

Also, send_sigtrap is an i386-only function (not even x86_64 has the
same).  Only x86_64 will share this actual code, but all others will be
modelled on it.  I think it makes things simplest across the board if
the standard form is that when there is a ptrace exception, the notifier
does not return NOTIFY_STOP, so it falls through to the existing SIGTRAP
arch code.

So, hmm.  In the old do_debug code, if a notifier returns NOTIFY_STOP,
it bails immediately, before the db6 value is saved in current->thread.
This is the normal theory of notify_die use, where NOTIFY_STOP means to
completely swallow the event as if it never happened.  In the event
there were some third party notifier involved, it ought to be able to
swallow its magic exceptions as before and have no user-visible db6
change happen at the time of that exception.  So how about this:

	get_debugreg(condition, 6);
	set_debugreg(0UL, 6);		/* The CPU does not clear it.  */

	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
					SIGTRAP) == NOTIFY_STOP)
		return;

The kprobes notifier uses max priority, so it will run first.  Its
notifier code uses my version.  For a single-step that belongs to it,
it will return NOTIFY_STOP and nothing else happens (noone touches
vdr6).  (I think I'm dredging up old territory by asking what happens
when kprobes steps over an insn that hits a data breakpoint, but I
don't recall atm.)

vdr6 belongs wholly to hw_breakpoint, no other code refers to it
directly.  hw_breakpoint's notifier sets vdr6 with non-DR_TRAPn bits,
if it's a user-mode exception.  If it's a ptrace exception it also
sets the mapped DR_TRAPn bits.  If it's not a ptrace exception and
only DR_TRAPn bits were newly set, then it returns NOTIFY_STOP.  If
it's a spurious exception from lazy db7 setting, hw_breakpoint just
returns NOTIFY_STOP early.

The rest of the old do_debug code stays as it is, only clear_dr7 goes.
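
Roughly, that notifier flow would look something like this (only a sketch;
is_ptrace_exception, lazy_switch_pending, and mapped_trapn_bits are stand-in
names, not anything in the patch):

	if (lazy_switch_pending)	/* spurious hit from lazy dr7 switching */
		return NOTIFY_STOP;

	if (user_mode(args->regs))
		current->thread.vdr6 |=
			args->err & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);

	/* ... run the triggered callbacks for the DR_TRAPn bits that hit ... */

	if (is_ptrace_exception)	/* also set the mapped DR_TRAPn bits */
		current->thread.vdr6 |= mapped_trapn_bits;
	else if (!(args->err & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
		return NOTIFY_STOP;	/* only our DR_TRAPn bits were set */

	return NOTIFY_DONE;		/* fall through to do_debug's SIGTRAP */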

> Are you going to the Ottawa Linux Symposium?

I am not.

> @@ -484,7 +495,8 @@ int copy_thread(int nr, unsigned long cl
>  
>  	err = 0;
>   out:
> -	if (err && p->thread.io_bitmap_ptr) {
> +	if (err) {
> +		flush_thread_hw_breakpoint(p);
>  		kfree(p->thread.io_bitmap_ptr);
>  		p->thread.io_bitmap_max = 0;
>  	}

This can call kfree(NULL).  I would leave the original code alone, i.e.:

	if (err)
		flush_thread_hw_breakpoint(p);
	if (err && p->thread.io_bitmap_ptr) {
		kfree(p->thread.io_bitmap_ptr);
		p->thread.io_bitmap_max = 0;
	}

> +	set_debugreg(0, 7);

You'll note in my x86-64 patch changing these to 0UL.  It matters for the
asm in the set_debugreg macro that the argument have type long, not int
(which plain 0 has).


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-26 18:17                                                                   ` Roland McGrath
@ 2007-06-27  2:43                                                                     ` Alan Stern
  0 siblings, 0 replies; 70+ messages in thread
From: Alan Stern @ 2007-06-27  2:43 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Tue, 26 Jun 2007, Roland McGrath wrote:

> I needed the attached patch on top of the bptest patch for the current
> code.  Btw, that is a very nice little tester!

I had already made some of those changes (the ones needed to make 
bptest build with the new hw_breakpoint code).  I'll add in the others.

> Below that is a patch to go on top of your current patch, with x86-64
> support.  I've only tried a few trivial tests with bptest (including an
> 8-byte bp), which worked great.  It is a pretty faithful copy of your i386
> changes.  I'm still not sure we have all that right, but you might as well
> incorporate this into your patch.  You should change the x86_64 code in
> parallel with any i386 changes we decide on later, and I can test it and
> send you any typo fixups or whatnot.

Right.  I may update a few comments...

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-26 20:49                                                                   ` Roland McGrath
@ 2007-06-27  3:26                                                                     ` Alan Stern
  2007-06-27 21:04                                                                       ` Roland McGrath
  2007-06-28  3:02                                                                       ` Roland McGrath
  0 siblings, 2 replies; 70+ messages in thread
From: Alan Stern @ 2007-06-27  3:26 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Tue, 26 Jun 2007, Roland McGrath wrote:

> > Here's the next iteration.  The arch-specific parts are now completely 
> > encapsulated.  validate_settings is in a form which should be workable 
> > on all architectures.  And the address, length, and type are passed as 
> > arguments to register_{kernel,user}_hw_breakpoint().
> 
> I like it!

Good.  My earlier stubbornness was caused by a desire to allow static
initializers, but now I see that specifying the values in the
registration call really isn't all that bad.

> > I haven't tried to modify Kconfig at all.  To do it properly would
> > require making ptrace configurable, which is not something I want to
> > tackle at the moment.
> 
> You don't need to worry about that.  Under utrace, CONFIG_PTRACE is
> already separate and can be turned off.  I don't think we really need to
> finish the Kconfig stuff at all before I merge it into the utrace code.

So far this work has all been based on the vanilla kernel.  Should I 
switch over to basing it on -mm?


> Calling send_sigtrap twice during the same exception does happen to be
> harmless, but I don't think it should be presumed to be.  It is just not
> the right way to go about things that you send a signal twice when there
> is one signal you want to generate.

What happens when there are two ptrace exceptions at different points
during the same system call?  Won't we end up sending the signal twice
no matter what?

> Also, send_sigtrap is an i386-only function (not even x86_64 has the
> same).  Only x86_64 will share this actual code, but all others will be
> modelled on it.  I think it makes things simplest across the board if
> the standard form is that when there is a ptrace exception, the notifier
> does not return NOTIFY_STOP, so it falls through to the existing SIGTRAP
> arch code.
> 
> So, hmm.  In the old do_debug code, if a notifier returns NOTIFY_STOP,
> it bails immediately, before the db6 value is saved in current->thread.
> This is the normal theory of notify_die use, where NOTIFY_STOP means to
> completely swallow the event as if it never happened.  In the event
> there were some third party notifier involved, it ought to be able to
> swallow its magic exceptions as before and have no user-visible db6
> change happen at the time of that exception.  So how about this:
> 
> 	get_debugreg(condition, 6);
> 	set_debugreg(0UL, 6);		/* The CPU does not clear it.  */
> 
> 	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
> 					SIGTRAP) == NOTIFY_STOP)
> 		return;
> 
> The kprobes notifier uses max priority, so it will run first.  Its
> notifier code uses my version.  For a single-step that belongs to it,
> it will return NOTIFY_STOP and nothing else happens (noone touches
> vdr6).  (I think I'm dredging up old territory by asking what happens
> when kprobes steps over an insn that hits a data breakpoint, but I
> don't recall atm.)

In theory we should get an exception with both DR_STEP and DR_TRAPn 
set, meaning that neither notifier will return NOTIFY_STOP.  But if the 
kprobes handler clears DR_STEP in the DR6 image passed to the 
hw_breakpoint handler, it should work out better.

> vdr6 belongs wholly to hw_breakpoint, no other code refers to it
> directly.  hw_breakpoint's notifier sets vdr6 with non-DR_TRAPn bits,
> if it's a user-mode exception.  If it's a ptrace exception it also
> sets the mapped DR_TRAPn bits.  If it's not a ptrace exception and
> only DR_TRAPn bits were newly set, then it returns NOTIFY_STOP.  If
> it's a spurious exception from lazy db7 setting, hw_breakpoint just
> returns NOTIFY_STOP early.

That sounds not quite right.  To a user-space debugger, a system call
should appear as an atomic operation.  If multiple ptrace exceptions
occur during a system call, all the relevant DR_TRAPn bits should be
set in vdr6 together and all the other ones reset.  How can we arrange
that?

There's also the question of whether to send the SIGTRAP.  If
extraneous bits are set in DR6 (e.g., because the CPU always sets some
extra bits) then we will never get NOTIFY_STOP.  Nevertheless, the
signal should not always be sent.

> > @@ -484,7 +495,8 @@ int copy_thread(int nr, unsigned long cl
> >  
> >  	err = 0;
> >   out:
> > -	if (err && p->thread.io_bitmap_ptr) {
> > +	if (err) {
> > +		flush_thread_hw_breakpoint(p);
> >  		kfree(p->thread.io_bitmap_ptr);
> >  		p->thread.io_bitmap_max = 0;
> >  	}
> 
> This can call kfree(NULL).  I would leave the original code alone, i.e.:
> 
> 	if (err)
> 		flush_thread_hw_breakpoint(p);
> 	if (err && p->thread.io_bitmap_ptr) {
> 		kfree(p->thread.io_bitmap_ptr);
> 		p->thread.io_bitmap_max = 0;
> 	}

I disagree.  kfree() is documented to return harmlessly when passed a
NULL pointer, and lots of places in the kernel have been changed to
remove useless tests for NULL before calls to kfree().  This is just
another example.

> > +	set_debugreg(0, 7);
> 
> You'll note in my x86-64 patch changing these to 0UL.  It matters for the
> asm in the set_debugreg macro that the argument have type long, not int
> (which plain 0 has).

I figured there was some reason like that.

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-27  3:26                                                                     ` Alan Stern
@ 2007-06-27 21:04                                                                       ` Roland McGrath
  2007-06-29  3:00                                                                         ` Alan Stern
  2007-06-28  3:02                                                                       ` Roland McGrath
  1 sibling, 1 reply; 70+ messages in thread
From: Roland McGrath @ 2007-06-27 21:04 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> So far this work has all been based on the vanilla kernel.  Should I 
> switch over to basing it on -mm?

It doesn't much matter at the moment.  Sticking with vanilla is the easiest
for you and me testing it right now.

> > Calling send_sigtrap twice during the same exception does happen to be
> > harmless, but I don't think it should be presumed to be.  It is just not
> > the right way to go about things that you send a signal twice when there
> > is one signal you want to generate.
> 
> What happens when there are two ptrace exceptions at different points
> during the same system call?  Won't we end up sending the signal twice
> no matter what?

Well then that is two signals for good reason, so that is a different
story.  It winds up indistinguishable from only sending the second, but as
far as the organization of the code and thinking about the semantics, twice
is right in this case and once is right in the simpler case.

> In theory we should get an exception with both DR_STEP and DR_TRAPn 
> set, meaning that neither notifier will return NOTIFY_STOP.  But if the 
> kprobes handler clears DR_STEP in the DR6 image passed to the 
> hw_breakpoint handler, it should work out better.

It's since occurred to me that kprobes can and should do:

	args->err &= ~(unsigned long) DR_STEP;
	if (args->err == 0)
		return NOTIFY_STOP;

This doesn't affect do_debug directly, but it will change the value seen by
the next notifier.  So if hw_breakpoint_handler is responsible for setting
vdr6 based on its args->err value, we should win.

> > vdr6 belongs wholly to hw_breakpoint, no other code refers to it
> > directly.  hw_breakpoint's notifier sets vdr6 with non-DR_TRAPn bits,
> > if it's a user-mode exception.  If it's a ptrace exception it also
> > sets the mapped DR_TRAPn bits.  If it's not a ptrace exception and
> > only DR_TRAPn bits were newly set, then it returns NOTIFY_STOP.  If
> > it's a spurious exception from lazy db7 setting, hw_breakpoint just
> > returns NOTIFY_STOP early.
> 
> That sounds not quite right.  To a user-space debugger, a system call
> should appear as an atomic operation.  If multiple ptrace exceptions
> occur during a system call, all the relevant DR_TRAPn bits should be
> set in vdr6 together and all the other ones reset.  How can we arrange
> that?

That would be nice.  But it's more than the old code did.  I don't feel any
strong need to improve the situation when using ptrace.  The old code
disabled breakpoints after the first hit, so userland would only see the
first DR_TRAPn bit.  (Even if it didn't, with the blind copying of the
hardware %db6 value, we now know it would only see one DR_TRAPn bit still
set after a second exception.)  With my suggestion above, userland would
only see the last DR_TRAPn bit.  So it's not worse.

In the ptrace case, we know it's always going to wind up with a signal
before it finishes and returns to user mode.  So one approach would be in
e.g. do_notify_resume, do:

	if (thread_info_flags & _TIF_DEBUG)
		current->thread.hw_breakpoint_info->someflag = 0;

Then ptrace_triggered could set someflag, and know from it still being set
on entry that it's a second trigger without getting back to user mode yet
(and so accumulate bits instead of resetting old ones).
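
Purely as a sketch of that scheme (trap_bits_valid here stands in for the
"someflag" above and doesn't exist in the current patch), ptrace_triggered
might become:

	static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
	{
		struct thread_hw_breakpoint *thbi = current->thread.hw_breakpoint_info;
		int i = bp - thbi->vdr_bps;

		if (!thbi->trap_bits_valid) {
			/* First trigger since the last return to user mode:
			 * start a fresh set of DR_TRAPn bits. */
			current->thread.vdr6 &=
				~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
			thbi->trap_bits_valid = 1;  /* cleared in do_notify_resume */
		}
		current->thread.vdr6 |= DR_TRAP0 << i;	/* accumulate, don't reset */
	}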

But I just would not bother improving ptrace beyond the status quo for a
corner case noone has cared about in practice so far.  In sensible
mechanisms of the future, nothing will examine db6 values directly.

> There's also the question of whether to send the SIGTRAP.  If
> extraneous bits are set in DR6 (e.g., because the CPU always sets some
> extra bits) then we will never get NOTIFY_STOP.  Nevertheless, the
> signal should not always be sent.

Yeah.  The current Intel manual describes all the unspecified DR6 bits as
explicitly reserved and set to 1 (except 0x1000 reserved and 0).  If new
meanings are assigned in future chips, presumably those will only be
enabled by some new explicit cr/msr setting.  Those might be enabled by
some extra module or something, but there is only so much we can do to
accommodate.  I think the best plan is that notifiers should do:

	args->err &= ~bits_i_recognize_as_mine;
	if (!(args->err & known_bits))
		return NOTIFY_STOP;

known_bits are the ones we use, plus 0x8000 (DR_SWITCH/BS) and 0x2000 (BD).
(Those two should be impossible without some strange new kernel bug.)
Probably should write it as ~DR_STATUS_RESERVED, to parallel existing macros.
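
Spelled out, that is roughly (illustrative only, since DR_STATUS_RESERVED
doesn't exist yet):

	/* everything not listed here is treated as reserved */
	#define known_bits	(DR_TRAP0 | DR_TRAP1 | DR_TRAP2 | DR_TRAP3 | \
				 DR_STEP | 0x8000 /* DR_SWITCH/BS */ | 0x2000 /* BD */)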

Then we only possibly interfere with a newfangled debug exception flavor
that occurs in the same one debug exception for an instruction also
triggering for hw_breakpoint or step.  In the more likely cases of a new
flavor of exception happening by itself, or the aforementioned strange new
kernel bugs, we will get to the bottom of do_debug and do the SIGTRAP.

For this plan, hw_breakpoint_handler also needs not to return NOTIFY_STOP
as a special case for a ptrace trigger.

> I disagree.  kfree() is documented to return harmlessly when passed a
> NULL pointer, and lots of places in the kernel have been changed to
> remove useless tests for NULL before calls to kfree().  This is just
> another example.

Ok.  I have no special opinions about that.  I just tend to avoid folding
miscellaneous changes into a patch adding new code.  It would be better
form to send first the trivial cleanup patch removing that second condition.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-27  3:26                                                                     ` Alan Stern
  2007-06-27 21:04                                                                       ` Roland McGrath
@ 2007-06-28  3:02                                                                       ` Roland McGrath
  1 sibling, 0 replies; 70+ messages in thread
From: Roland McGrath @ 2007-06-28  3:02 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

I did the first crack at a powerpc port.  I'd appreciate your comments on
this patch.  It should not be incorporated, isn't finished, probably breaks
ptrace, etc.  I'm posting it now just to get any thoughts you have raised
by seeing the second machine share the intended arch-independent code.

I just translated your implementation to powerpc terms without thinking
about it much.  If you see anything that you aren't sure is right, please
tell me and don't presume there is some powerpc-specific reason it's
different.  More likely I just didn't think it through.

In the first battle just to make it compile, the only issue was that you
assume every machine has TIF_DEBUG, which is in fact an implementation
detail chosen lately by i386 and x86_64.  AFAIK the only reason for it
there is just to make a cheap test of multiple bits in the hot path
deciding to call __switch_to_xtra.  Do you rely on it meaning something
more precise than just being a shorthand for hw_breakpoint_info!=NULL?

Incidentally, I think it would be nice if kernel/hw_breakpoint.c itself had
all the #include's for everything it uses directly.  arch hw_breakpoint.c
files probably only need <asm/hw_breakpoint.h> and one or two others to
define what they need before #include "../../../kernel/hw_breakpoint.c".

The num_installed/num_kbps stuff feels a little hokey when it's really a
flag because the maximum number is one.  It seems like I could make it
tighter with some more finesse in the arch-specific hook options, so that
chbi and thbi each just store dabr, dabr!=0 means "mine gets installed",
and the switch in is just chbi->dabr?:thbi->dabr or something like that.
As we get more machines, more cleanups along these lines will probably make
sense.  (Also, before the next person not me or you tries a port, we could
use for the generic hw_breakpoint.c to get some comments at the top making
explicit what the arch file is expected to define in its types, etc.)

With just the included change to the generic code for the TIF_DEBUG, this
kind of works.  That is, it doesn't break everything else and I can use
bptest, sort of.  I didn't even try ptrace, I probably broke that.

It works enough to make clear the main new wrinkle.  On powerpc, the data
breakpoint exception is a fault before the instruction executes, not a trap
after it.  The load/store will not complete until the breakpoint is cleared.
With this patch, you can use bptest to generate a tight loop of bp0 triggers.

For ptrace compatibility, userland already expects to deal with this.  gdb
has it as per-machine implementation options how ptrace watchpoints behave,
and for powerpc it knows to remove the watchpoint, step, and reinsert it.

One approach for hw_breakpoint is just to expose in asm/hw_breakpoint.h
some standard macros saying how things behave, and caveat emptor.  But I
don't like that much.  I think things will just wind up being confused and
inadvertently unportable if the important semantics vary too much between
machines.  The point of the whole facility is to make watchpoints easy to
use, after all.

Some uses might be happy with trigger-before, but I don't see much benefit.
For writing, the trigger function can look at the memory before it's
changed.  But you could just as well have recorded the old value before
setting the breakpoint, as you have to for trigger-after--and to see both
old and new values you then need to single-step to get the new value, which
trigger-after handles with a single exception.  For reading, the trigger
function can change the memory before it's read.  But likewise, you could
just as well have changed it before setting the breakpoint--you know noone
will have read the new value until your trigger anyway.  (I have never used
a read-triggered breakpoint, so I'm rather vague on those use scenarios.)
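
For the write case, the trigger-after pattern amounts to something like the
sketch below; the registration call follows the
register_{kernel,user}_hw_breakpoint(bp, address, len, type) form we've been
discussing, but the other names (my_triggered, watched, arm_watch) are
invented and the priority setup is omitted:

	static unsigned long watched;		/* the word being watched */
	static unsigned long old_val;
	static struct hw_breakpoint bp;

	static void my_triggered(struct hw_breakpoint *b, struct pt_regs *regs)
	{
		/* trigger-after: the new value is already in place */
		printk(KERN_INFO "watched word: %lx -> %lx\n", old_val, watched);
	}

	static void arm_watch(void)
	{
		old_val = watched;		/* record before arming */
		bp.triggered = my_triggered;
		register_kernel_hw_breakpoint(&bp, (unsigned long)&watched,
					      HW_BREAKPOINT_LEN_8,
					      HW_BREAKPOINT_WRITE);
	}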

The third machine whose manual I have handy is ia64.  It has instruction
and data breakpoints that are both trigger-before.  It has processor flags
similar to x86's RF for both, to ignore one or both breakpoint flavor for
one instruction.  That makes it cheap to continue past the breakpoint since
you don't have to clear and reset it.  But for getting new values from
data-write breakpoints, it still requires a single-step and second stop,
like powerpc.  (Incidentally ia64 has another interesting feature, which I
think the generic code accomodates nicely as an upward-compatible addition
just by changing the len arg in the register and arch_* calls to unsigned long,
and adding an arch_validate_len that can short-circuit the generic length
and alignment check.)

So, I'd like your thoughts on the whole situation.  The starting point we
can do without anything else is:

	int hw_breakpoint_triggers_before(struct hw_breakpoint *);
	int hw_breakpoint_can_resume(struct hw_breakpoint *);

or perhaps taking (unsigned int type) instead, in <asm-cpu/hw_breakpoint.h>.
i.e. for x86:

#define hw_breakpoint_triggers_before(type) ((type) == HW_BREAKPOINT_EXECUTE)
#define hw_breakpoint_can_resume(type) 	    1

and powerpc:

#define hw_breakpoint_triggers_before(any)	1
#define hw_breakpoint_can_resume(any)		0


For powerpc at least (and I figure for ia64 too) it seems easy enough to
implement disable-step-enable to turn it into trigger-after.  But it is
costly and hairy if one doesn't care.  So now I'm thinking to somewhat
follow the kprobes model, and have pre and post trigger handler options.
i.e.

	int hw_breakpoint_pre_handle_type(unsigned type);
	int hw_breakpoint_post_handle_type(unsigned type);

and in struct hw_breakpoint (replacing trigger):

	int	(*pre_handler)(struct hw_breakpoint *, struct pt_regs *);
	void	(*post_handler)(struct hw_breakpoint *, struct pt_regs *);

The pre_handler returns zero if it wants the post_handler to run.  On x86,
register would return -EINVAL if pre_handler is not NULL and type is not
EXECUTE (i.e. pre_handle_type returns false).  It also fails if
post_handler is not NULL and post_handle_type returns false, meaning the
arch code doesn't want to deal with step-over-and-trigger.

We'd still want hw_breakpoint_can_resume to tell whether you can return
from a pre_handler and continue with no post_handler, without needing to
unregister the breakpoint.  That's true on ia64, while on powerpc you
either have to clear the breakpoint or request the post_handler stepping logic.
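
So the registration-time checks described above would boil down to roughly
(sketch only):

	/* in register_*_hw_breakpoint(), alongside the other validation */
	if (bp->pre_handler && !hw_breakpoint_pre_handle_type(type))
		return -EINVAL;
	if (bp->post_handler && !hw_breakpoint_post_handle_type(type))
		return -EINVAL;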


Thanks,
Roland


---
 arch/powerpc/kernel/Makefile        |    2 
 arch/powerpc/kernel/hw_breakpoint.c |  348 ++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/process.c       |   14 -
 arch/powerpc/kernel/ptrace-common.h |   16 -
 arch/powerpc/kernel/ptrace.c        |    2 
 arch/powerpc/kernel/ptrace32.c      |    2 
 arch/powerpc/kernel/signal_32.c     |   10 -
 arch/powerpc/kernel/signal_64.c     |    8 
 arch/powerpc/mm/fault.c             |   19 -
 include/asm-powerpc/hw_breakpoint.h |   49 +++++
 include/asm-powerpc/processor.h     |    2 
 kernel/hw_breakpoint.c              |   22 +-
 12 files changed, 438 insertions(+), 56 deletions(-)

Index: b/include/asm-powerpc/hw_breakpoint.h
===================================================================
--- /dev/null
+++ b/include/asm-powerpc/hw_breakpoint.h
@@ -0,0 +1,49 @@
+#ifndef	_ASM_POWERPC_HW_BREAKPOINT_H
+#define	_ASM_POWERPC_HW_BREAKPOINT_H
+
+/*
+ * The only available size of data breakpoint is 8.
+ */
+#define HW_BREAKPOINT_LEN_8	0x0d00dbe8
+
+/*
+ * Available HW breakpoint type encodings.
+ */
+#define HW_BREAKPOINT_READ	0x0dab0005	/* trigger on memory read */
+#define HW_BREAKPOINT_WRITE	0x0dab0006	/* trigger on memory write */
+#define HW_BREAKPOINT_RW	0x0dab0007	/* ... on read or write */
+
+
+struct arch_hw_breakpoint {
+	/*
+	 * High bits are aligned address, low 3 bits are flags.
+	 */
+	unsigned long dabr;
+};
+
+#define	__ARCH_HW_BREAKPOINT_H
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint accessor routines */
+static inline const void *hw_breakpoint_get_kaddress(struct hw_breakpoint *bp)
+{
+	return (const void *) (bp->info.dabr &~ 7UL);
+}
+
+static inline const void __user *hw_breakpoint_get_uaddress(
+		struct hw_breakpoint *bp)
+{
+	return (const void __user *) (bp->info.dabr &~ 7UL);
+}
+
+static inline unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp)
+{
+	return HW_BREAKPOINT_LEN_8;
+}
+
+static inline unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp)
+{
+	return (bp->info.dabr & 7UL) | 0x0dab0000;
+}
+
+#endif	/* _ASM_POWERPC_HW_BREAKPOINT_H */
Index: b/arch/powerpc/kernel/Makefile
===================================================================
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -75,6 +75,8 @@ obj-$(CONFIG_KEXEC)		+= machine_kexec.o 
 obj-$(CONFIG_AUDIT)		+= audit.o
 obj64-$(CONFIG_AUDIT)		+= compat_audit.o
 
+obj64-y				+= hw_breakpoint.o
+
 ifneq ($(CONFIG_PPC_INDIRECT_IO),y)
 obj-y				+= iomap.o
 endif
Index: b/arch/powerpc/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -0,0 +1,348 @@
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm/hw_breakpoint.h>
+
+#define HB_NUM	1
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+	struct list_head	node;		/* Entry in thread list */
+	struct list_head	thread_bps;	/* Thread's breakpoints */
+	struct hw_breakpoint	*bps[HB_NUM];	/* Highest-priority bps */
+	int			num_installed;	/* Number of installed bps */
+	unsigned		gennum;		/* update-generation number */
+
+	struct hw_breakpoint	ptrace_bp;
+
+	unsigned long		dabr;		/* Value switched in */
+};
+
+/* Kernel-space breakpoint data */
+struct kernel_bp_data {
+	unsigned		gennum;		/* Generation number */
+	struct hw_breakpoint	*bps[HB_NUM];	/* Loaded breakpoint */
+	int			num_kbps;
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+	struct kernel_bp_data	*cur_kbpdata;	/* Current kbpdata[] entry */
+	struct task_struct	*bp_task;	/* The thread whose bps
+			are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Global info */
+static struct kernel_bp_data	kbpdata[2];	/* Old and new settings */
+static int			cur_kbpindex;	/* Alternates 0, 1, ... */
+static struct kernel_bp_data	*cur_kbpdata = &kbpdata[0];
+			/* Always equal to &kbpdata[cur_kbpindex] */
+
+static u8			tprio[HB_NUM];	/* Thread bp max priorities */
+static LIST_HEAD(kernel_bps);			/* Kernel breakpoint list */
+static LIST_HEAD(thread_list);			/* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex);	/* Protects everything */
+
+/* Arch-specific hook routines */
+
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void arch_install_chbi(struct cpu_hw_breakpoint *chbi)
+{
+	/*
+	 * Don't allow debug exceptions while we update the DABR.
+	 */
+	set_dabr(0);
+
+	chbi->cur_kbpdata = rcu_dereference(cur_kbpdata);
+
+	if (chbi->cur_kbpdata->num_kbps)
+		set_dabr(chbi->cur_kbpdata->bps[0]->info.dabr);
+}
+
+/*
+ * Update an out-of-date thread hw_breakpoint info structure.
+ */
+static void arch_update_thbi(struct thread_hw_breakpoint *thbi,
+			     struct kernel_bp_data *thr_kbpdata)
+{
+}
+
+/*
+ * Install the thread breakpoints in their debug registers.
+ */
+static void arch_install_thbi(struct thread_hw_breakpoint *thbi)
+{
+	if (thbi->dabr)
+		set_dabr(thbi->dabr);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void arch_install_none(struct cpu_hw_breakpoint *chbi)
+{
+}
+
+/*
+ * Create a new kbpdata entry.
+ */
+static void arch_new_kbpdata(struct kernel_bp_data *new_kbpdata)
+{
+}
+
+/*
+ * Store a thread breakpoint array entry's address
+ */
+static void arch_store_thread_bp_array(struct thread_hw_breakpoint *thbi,
+				       struct hw_breakpoint *bp, int i)
+{
+	thbi->dabr = bp->info.dabr;
+}
+
+#define TASK_SIZE_OF(tsk) \
+	(test_tsk_thread_flag(tsk, TIF_32BIT) \
+	 ? TASK_SIZE_USER32 : TASK_SIZE_USER64)
+
+/*
+ * Check for virtual address in user space.
+ */
+static int arch_check_va_in_userspace(unsigned long va, struct task_struct *tsk)
+{
+	return va < TASK_SIZE_OF(tsk);
+}
+
+/*
+ * Check for virtual address in kernel space.
+ */
+static int arch_check_va_in_kernelspace(unsigned long va)
+{
+	return va >= TASK_SIZE_USER64;
+}
+
+/*
+ * Store a breakpoint's encoded address, length, and type.
+ */
+static void arch_store_info(struct hw_breakpoint *bp,
+			    unsigned long address, unsigned len, unsigned type)
+{
+	BUG_ON(address & 7UL);
+	BUG_ON(!(type & DABR_TRANSLATION));
+	bp->info.dabr = address | (type & 7UL);
+}
+
+
+/*
+ * Register a new user breakpoint structure.
+ */
+static void arch_register_user_hw_breakpoint(struct hw_breakpoint *bp,
+					     struct thread_hw_breakpoint *thbi)
+{
+}
+
+/*
+ * Unregister a user breakpoint structure.
+ */
+static void arch_unregister_user_hw_breakpoint(struct hw_breakpoint *bp,
+		struct thread_hw_breakpoint *thbi)
+{
+}
+
+/*
+ * Register a kernel breakpoint structure.
+ */
+static void arch_register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+}
+
+/*
+ * Unregister a kernel breakpoint structure.
+ */
+static void arch_unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+}
+
+
+/* End of arch-specific hook routines */
+
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+	struct task_struct *tsk);
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+					 struct hw_breakpoint *bp,
+					 unsigned long address,
+					 unsigned len, unsigned type);
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+					    struct hw_breakpoint *bp);
+
+/*
+ * This is a placeholder that never gets called.
+ */
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+	BUG();
+}
+
+unsigned long thread_get_dabr(struct task_struct *tsk)
+{
+	if (tsk->thread.hw_breakpoint_info)
+		return tsk->thread.hw_breakpoint_info->dabr;
+	return 0;
+}
+
+int thread_set_dabr(struct task_struct *tsk, unsigned long val)
+{
+	unsigned long addr = val &~ 7UL;
+	unsigned int type = 0x0dab0000 | (val & 7UL);
+
+	struct thread_hw_breakpoint *thbi;
+	int rc = -EIO;
+
+	/* We have to hold this lock the entire time, to prevent thbi
+	 * from being deallocated out from under us.
+	 */
+	mutex_lock(&hw_breakpoint_mutex);
+
+	if (!tsk->thread.hw_breakpoint_info && val == 0)
+		rc = 0;		/* Minor optimization */
+	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+		rc = -ENOMEM;
+	else {
+		struct hw_breakpoint *bp = &thbi->ptrace_bp;
+
+		/*
+		 * If the breakpoint is registered then unregister it,
+		 * change it, and re-register it.  Revert to the original
+		 * address if an error occurs.
+		 */
+		if (bp->status) {
+			unsigned long old_dabr = bp->info.dabr;
+
+			__unregister_user_hw_breakpoint(tsk, bp);
+			if (val != 0) {
+				rc = __register_user_hw_breakpoint(
+					tsk, bp, addr,
+					HW_BREAKPOINT_LEN_8, type);
+				if (rc < 0)
+					__register_user_hw_breakpoint(
+						tsk, bp,
+						old_dabr &~ 7UL,
+						HW_BREAKPOINT_LEN_8,
+						0x0dab0000 | (old_dabr & 7UL));
+			}
+		} else if (val != 0) {
+			bp->triggered = ptrace_triggered;
+			bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+			rc = __register_user_hw_breakpoint(
+				tsk, bp, addr,
+				HW_BREAKPOINT_LEN_8, type);
+		}
+	}
+
+	mutex_unlock(&hw_breakpoint_mutex);
+	return rc;
+}
+
+
+/*
+ * Handle debug exception notifications.
+ */
+
+static void switch_to_none_hw_breakpoint(void);
+
+static int hw_breakpoint_handler(struct die_args *args)
+{
+	struct cpu_hw_breakpoint *chbi;
+	struct hw_breakpoint *bp;
+	struct thread_hw_breakpoint *thbi = NULL;
+	int ret;
+
+	/* Assert that local interrupts are disabled */
+
+	/* Are we a victim of lazy debug-register switching? */
+	chbi = &per_cpu(cpu_info, get_cpu());
+	if (!chbi->bp_task)
+		;
+	else if (chbi->bp_task != current) {
+
+		/* No user breakpoints are valid.  Perform the belated
+		 * debug-register switch.
+		 */
+		switch_to_none_hw_breakpoint();
+	} else {
+		thbi = chbi->bp_task->thread.hw_breakpoint_info;
+	}
+
+	/*
+	 * Disable all breakpoints so that the callbacks can run without
+	 * triggering recursive debug exceptions.
+	 */
+	set_dabr(0);
+
+	bp = chbi->cur_kbpdata->bps[0] ?: thbi->bps[0];
+	ret = NOTIFY_STOP;
+	if (bp == &thbi->ptrace_bp)
+		ret = NOTIFY_DONE;
+	else
+		(*bp->triggered)(bp, args->regs);
+
+	/* Re-enable the breakpoints */
+	set_dabr(thbi ? thbi->dabr : chbi->cur_kbpdata->bps[0]->info.dabr);
+	put_cpu_no_resched();
+
+	return NOTIFY_STOP;
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int hw_breakpoint_exceptions_notify(
+	struct notifier_block *unused, unsigned long val, void *data)
+{
+	if (val != DIE_DABR_MATCH)
+		return NOTIFY_DONE;
+	return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+	.notifier_call = hw_breakpoint_exceptions_notify
+};
+
+void load_debug_registers(void);
+
+static int __init init_hw_breakpoint(void)
+{
+	printk(KERN_EMERG "hw_breakpoint initializing\n");
+	load_debug_registers();
+	return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
+
+
+/* Grab the arch-independent code */
+
+#include "../../../kernel/hw_breakpoint.c"
Index: b/arch/powerpc/kernel/ptrace-common.h
===================================================================
--- a/arch/powerpc/kernel/ptrace-common.h
+++ b/arch/powerpc/kernel/ptrace-common.h
@@ -139,6 +139,10 @@ static inline int set_vrregs(struct task
 }
 #endif
 
+#ifdef CONFIG_PPC64
+unsigned long thread_get_dabr(struct task_struct *tsk);
+int thread_set_dabr(struct task_struct *tsk, unsigned long val);
+
 static inline int ptrace_set_debugreg(struct task_struct *task,
 				      unsigned long addr, unsigned long data)
 {
@@ -146,16 +150,8 @@ static inline int ptrace_set_debugreg(st
 	if (addr > 0)
 		return -EINVAL;
 
-	/* The bottom 3 bits are flags */
-	if ((data & ~0x7UL) >= TASK_SIZE)
-		return -EIO;
-
-	/* Ensure translation is on */
-	if (data && !(data & DABR_TRANSLATION))
-		return -EIO;
-
-	task->thread.dabr = data;
-	return 0;
+	return thread_set_dabr(task, data);
 }
+#endif
 
 #endif /* _PPC64_PTRACE_COMMON_H */
Index: b/arch/powerpc/kernel/ptrace.c
===================================================================
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -390,7 +390,7 @@ long arch_ptrace(struct task_struct *chi
 		/* We only support one DABR and no IABRS at the moment */
 		if (addr > 0)
 			break;
-		ret = put_user(child->thread.dabr,
+		ret = put_user(thread_get_dabr(child),
 			       (unsigned long __user *)data);
 		break;
 	}
Index: b/arch/powerpc/kernel/ptrace32.c
===================================================================
--- a/arch/powerpc/kernel/ptrace32.c
+++ b/arch/powerpc/kernel/ptrace32.c
@@ -330,7 +330,7 @@ long compat_sys_ptrace(int request, int 
 		/* We only support one DABR and no IABRS at the moment */
 		if (addr > 0)
 			break;
-		ret = put_user(child->thread.dabr, (u32 __user *)data);
+		ret = put_user((u32)thread_get_dabr(child), (u32 __user *)data);
 		break;
 	}
 
Index: b/arch/powerpc/kernel/signal_32.c
===================================================================
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -1197,16 +1197,6 @@ no_signal:
 		newsp = regs->gpr[1];
 	newsp &= ~0xfUL;
 
-#ifdef CONFIG_PPC64
-	/*
-	 * Reenable the DABR before delivering the signal to
-	 * user space. The DABR will have been cleared if it
-	 * triggered inside the kernel.
-	 */
-	if (current->thread.dabr)
-		set_dabr(current->thread.dabr);
-#endif
-
 	/* Whee!  Actually deliver the signal.  */
 	if (ka.sa.sa_flags & SA_SIGINFO)
 		ret = handle_rt_signal(signr, &ka, &info, oldset, regs, newsp);
Index: b/arch/powerpc/kernel/signal_64.c
===================================================================
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -529,14 +529,6 @@ int do_signal(sigset_t *oldset, struct p
 		if (TRAP(regs) == 0x0C00)
 			syscall_restart(regs, &ka);
 
-		/*
-		 * Reenable the DABR before delivering the signal to
-		 * user space. The DABR will have been cleared if it
-		 * triggered inside the kernel.
-		 */
-		if (current->thread.dabr)
-			set_dabr(current->thread.dabr);
-
 		ret = handle_signal(signr, &ka, &info, oldset, regs);
 
 		/* If a signal was successfully delivered, the saved sigmask is in
Index: b/arch/powerpc/mm/fault.c
===================================================================
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -113,9 +113,6 @@ static void do_dabr(struct pt_regs *regs
 	if (debugger_dabr_match(regs))
 		return;
 
-	/* Clear the DABR */
-	set_dabr(0);
-
 	/* Deliver the signal to userspace */
 	info.si_signo = SIGTRAP;
 	info.si_errno = 0;
@@ -164,6 +161,14 @@ int __kprobes do_page_fault(struct pt_re
 	is_write = error_code & ESR_DST;
 #endif /* CONFIG_4xx || CONFIG_BOOKE */
 
+#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
+  	if (error_code & DSISR_DABRMATCH) {
+		/* DABR match */
+		do_dabr(regs, address, error_code);
+		return 0;
+	}
+#endif /* !(CONFIG_4xx || CONFIG_BOOKE)*/
+
 	if (notify_page_fault(regs))
 		return 0;
 
@@ -176,14 +181,6 @@ int __kprobes do_page_fault(struct pt_re
 	if (!user_mode(regs) && (address >= TASK_SIZE))
 		return SIGSEGV;
 
-#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
-  	if (error_code & DSISR_DABRMATCH) {
-		/* DABR match */
-		do_dabr(regs, address, error_code);
-		return 0;
-	}
-#endif /* !(CONFIG_4xx || CONFIG_BOOKE)*/
-
 	if (in_atomic() || mm == NULL) {
 		if (!user_mode(regs))
 			return SIGSEGV;
Index: b/include/asm-powerpc/processor.h
===================================================================
--- a/include/asm-powerpc/processor.h
+++ b/include/asm-powerpc/processor.h
@@ -149,8 +149,8 @@ struct thread_struct {
 #ifdef CONFIG_PPC64
 	unsigned long	start_tb;	/* Start purr when proc switched in */
 	unsigned long	accum_tb;	/* Total accumilated purr for process */
+	struct thread_hw_breakpoint *hw_breakpoint_info;
 #endif
-	unsigned long	dabr;		/* Data address breakpoint register */
 #ifdef CONFIG_ALTIVEC
 	/* Complete AltiVec register set */
 	vector128	vr[32] __attribute((aligned(16)));
Index: b/arch/powerpc/kernel/process.c
===================================================================
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -46,6 +46,7 @@
 #include <asm/syscalls.h>
 #ifdef CONFIG_PPC64
 #include <asm/firmware.h>
+#include <asm/hw_breakpoint.h>
 #endif
 
 extern unsigned long _get_SP(void);
@@ -232,7 +233,6 @@ int set_dabr(unsigned long dabr)
 
 #ifdef CONFIG_PPC64
 DEFINE_PER_CPU(struct cpu_usage, cpu_usage_array);
-static DEFINE_PER_CPU(unsigned long, current_dabr);
 #endif
 
 struct task_struct *__switch_to(struct task_struct *prev,
@@ -300,10 +300,8 @@ struct task_struct *__switch_to(struct t
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_PPC64	/* for now */
-	if (unlikely(__get_cpu_var(current_dabr) != new->thread.dabr)) {
-		set_dabr(new->thread.dabr);
-		__get_cpu_var(current_dabr) = new->thread.dabr;
-	}
+	if (unlikely(new->thread.hw_breakpoint_info != NULL))
+		switch_to_thread_hw_breakpoint(new);
 #endif /* CONFIG_PPC64 */
 
 	new_thread = &new->thread;
@@ -474,10 +472,8 @@ void flush_thread(void)
 	discard_lazy_cpu_state();
 
 #ifdef CONFIG_PPC64	/* for now */
-	if (current->thread.dabr) {
-		current->thread.dabr = 0;
-		set_dabr(0);
-	}
+	if (unlikely(current->thread.hw_breakpoint_info))
+		flush_thread_hw_breakpoint(current);
 #endif
 }
 
Index: b/kernel/hw_breakpoint.c
===================================================================
--- a/kernel/hw_breakpoint.c
+++ b/kernel/hw_breakpoint.c
@@ -25,6 +25,18 @@
  * #include'd by the arch-specific implementation.
  */
 
+#include <asm/thread_info.h>
+
+#ifdef TIF_DEBUG
+#define clear_tsk_debug_flag(tsk)	clear_tsk_thread_flag(tsk, TIF_DEBUG)
+#define set_tsk_debug_flag(tsk)		set_tsk_thread_flag(tsk, TIF_DEBUG)
+#define test_tsk_debug_flag(tsk)	test_tsk_thread_flag(tsk, TIF_DEBUG)
+#else
+#define clear_tsk_debug_flag(tsk)	do { } while (0)
+#define set_tsk_debug_flag(tsk)		do { } while (0)
+#define test_tsk_debug_flag(tsk)	\
+	((tsk)->thread.hw_breakpoint_info != NULL)
+#endif
 
 /*
  * Install the debug register values for a new thread.
@@ -156,7 +168,7 @@ static void update_this_cpu(void *unused
 
 	/* Install both the kernel and the user breakpoints */
 	arch_install_chbi(chbi);
-	if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+	if (test_tsk_debug_flag(tsk))
 		switch_to_thread_hw_breakpoint(tsk);
 
 	put_cpu_no_resched();
@@ -369,7 +381,7 @@ void flush_thread_hw_breakpoint(struct t
 	list_del(&thbi->node);
 
 	/* The thread no longer has any breakpoints associated with it */
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+	clear_tsk_debug_flag(tsk);
 	tsk->thread.hw_breakpoint_info = NULL;
 	kfree(thbi);
 
@@ -393,7 +405,7 @@ int copy_thread_hw_breakpoint(struct tas
 	 * and the child starts out with no debug registers set.
 	 * But what about CLONE_PTRACE?
 	 */
-	clear_tsk_thread_flag(child, TIF_DEBUG);
+	clear_tsk_debug_flag(child);
 	return 0;
 }
 
@@ -457,7 +469,7 @@ static int insert_bp_in_list(struct hw_b
 
 		/* Is this the thread's first registered breakpoint? */
 		if (list_empty(&thbi->node)) {
-			set_tsk_thread_flag(tsk, TIF_DEBUG);
+			set_tsk_debug_flag(tsk);
 			list_add(&thbi->node, &thread_list);
 		}
 	}
@@ -483,7 +495,7 @@ static void remove_bp_from_list(struct h
 
 		if (list_empty(&thbi->thread_bps)) {
 			list_del_init(&thbi->node);
-			clear_tsk_thread_flag(tsk, TIF_DEBUG);
+			clear_tsk_debug_flag(tsk);
 		}
 	}
 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-27 21:04                                                                       ` Roland McGrath
@ 2007-06-29  3:00                                                                         ` Alan Stern
  2007-07-11  6:59                                                                           ` Roland McGrath
  0 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-06-29  3:00 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Wed, 27 Jun 2007, Roland McGrath wrote:

> > In theory we should get an exception with both DR_STEP and DR_TRAPn 
> > set, meaning that neither notifier will return NOTIFY_STOP.  But if the 
> > kprobes handler clears DR_STEP in the DR6 image passed to the 
> > hw_breakpoint handler, it should work out better.
> 
> It's since occurred to me that kprobes can and should do:
> 
> 	args->err &= ~(unsigned long) DR_STEP;
> 	if (args->err == 0)
> 		return NOTIFY_STOP;
> 
> This doesn't affect do_debug directly, but it will change the value seen by
> the next notifier.  So if hw_breakpoint_handler is responsible for setting
> vdr6 based on its args->err value, we should win.

Exactly what I had in mind.

> > > vdr6 belongs wholly to hw_breakpoint, no other code refers to it
> > > directly.  hw_breakpoint's notifier sets vdr6 with non-DR_TRAPn bits,
> > > if it's a user-mode exception.  If it's a ptrace exception it also
> > > sets the mapped DR_TRAPn bits.  If it's not a ptrace exception and
> > > only DR_TRAPn bits were newly set, then it returns NOTIFY_STOP.  If
> > > it's a spurious exception from lazy db7 setting, hw_breakpoint just
> > > returns NOTIFY_STOP early.
> > 
> > That sounds not quite right.  To a user-space debugger, a system call
> > should appear as an atomic operation.  If multiple ptrace exceptions
> > occur during a system call, all the relevant DR_TRAPn bits should be
> > set in vdr6 together and all the other ones reset.  How can we arrange
> > that?
> 
> That would be nice.  But it's more than the old code did.  I don't feel any
> strong need to improve the situation when using ptrace.  The old code
> disabled breakpoints after the first hit, so userland would only see the
> first DR_TRAPn bit.  (Even if it didn't, with the blind copying of the
> hardware %db6 value, we now know it would only see one DR_TRAPn bit still
> set after a second exception.)  With my suggestion above, userland would
> only see the last DR_TRAPn bit.  So it's not worse.
> 
> In the ptrace case, we know it's always going to wind up with a signal
> before it finishes and returns to user mode.  So one approach would be in
> e.g. do_notify_resume, do:
> 
> 	if (thread_info_flags & _TIF_DEBUG)
> 		current->thread.hw_breakpoint_info->someflag = 0;
> 
> Then ptrace_triggered could set someflag, and know from it still being set
> on entry that it's a second trigger without getting back to user mode yet
> (and so accumulate bits instead of resetting old ones).
> 
> But I just would not bother improving ptrace beyond the status quo for a
> corner case noone has cared about in practice so far.  In sensible
> mechanisms of the future, nothing will examine db6 values directly.

Come to think of it, I believe that gdb doesn't check beyond the first 
DR_TRAPn bit it finds set.  I can live with reporting only the last 
hit.

> > There's also the question of whether to send the SIGTRAP.  If
> > extraneous bits are set in DR6 (e.g., because the CPU always sets some
> > extra bits) then we will never get NOTIFY_STOP.  Nevertheless, the
> > signal should not always be sent.
> 
> Yeah.  The current Intel manual describes all the unspecified DR6 bits as
> explicitly reserved and set to 1 (except 0x1000 reserved and 0).  If new
> meanings are assigned in future chips, presumably those will only be
> enabled by some new explicit cr/msr setting.  Those might be enabled by
> some extra module or something, but there is only so much we can do to
> accommodate.  I think the best plan is that notifiers should do:
> 
> 	args->err &= ~bits_i_recognize_as_mine;
> 	if (!(args->err & known_bits))
> 		return NOTIFY_STOP;
> 
> known_bits are the ones we use, plus 0x8000 (DR_SWITCH/BS) and 0x2000 (BD).
> (Those two should be impossible without some strange new kernel bug.)
> Probably should write it as ~DR_STATUS_RESERVED, to parallel existing macros.
> 
> Then we only possibly interfere with a newfangled debug exception flavor
> that occurs in the same one debug exception for an instruction also
> triggering for hw_breakpoint or step.  In the more likely cases of a new
> flavor of exception happening by itself, or the aforementioned strange new
> kernel bugs, we will get to the bottom of do_debug and do the SIGTRAP.
> 
> For this plan, hw_breakpoint_handler also needs not to return NOTIFY_STOP
> as a special case for a ptrace trigger.

That should work well.  But how does the handler know whether a ptrace
trigger occurred?  I can think of several possible ways, none of them
very attractive.  Simply checking the vdr6 value might not work.  The
simplest approach would be to see if the trigger callback address is
equal to ptrace_triggered -- it's a hack but it is reliable.
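
That is, nothing more than a pointer comparison along these lines (sketch):

	/* "did a ptrace trigger occur?" becomes a callback-address check */
	if (bp->triggered == ptrace_triggered)
		ret = NOTIFY_DONE;	/* fall through to the SIGTRAP code */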

For that matter, knowing when to set vdr6 is a little tricky.  I guess
it should be set whenever a debug exception occurs in user mode (which
includes both breakpoints and single-step events).  But what about
ptrace triggers while the CPU is in kernel mode?  Should they set the
four DR_TRAPn bits in vdr6 and leave the rest alone?

> > I disagree.  kfree() is documented to return harmlessly when passed a
> > NULL pointer, and lots of places in the kernel have been changed to
> > remove useless tests for NULL before calls to kfree().  This is just
> > another example.
> 
> Ok.  I have no special opinions about that.  I just tend to avoid folding
> miscellaneous changes into a patch adding new code.  It would be better
> form to send first the trivial cleanup patch removing that second condition.

Sounds reasonable.  I'll split it out.

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
  2007-06-29  3:00                                                                         ` Alan Stern
@ 2007-07-11  6:59                                                                           ` Roland McGrath
  0 siblings, 0 replies; 70+ messages in thread
From: Roland McGrath @ 2007-07-11  6:59 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> That should work well.  But how does the handler know whether a ptrace
> trigger occurred?  I can think of several possible ways, none of them
> very attractive.  Simply checking the vdr6 value might not work.  The
> simplest approach would be to see if the trigger callback address is
> equal to ptrace_triggered -- it's a hack but it is reliable.

That's what I did in the powerpc code.  You might recall I originally
argued for not using a regular hw_breakpoint struct and callback for ptrace
at all.  (I still think it could wind up pretty clean and tight to have it
purely a special case using its own data structures without a struct
hw_breakpoint.  Only the priority stuff has to do something special to
treat ptrace-in-use as a registration with the right priority.)

> For that matter, knowing when to set vdr6 is a little tricky.  I guess
> it should be set whenever a debug exception occurs in user mode (which
> includes both breakpoints and single-step events).  But what about
> ptrace triggers while the CPU is in kernel mode?  Should they set the
> four DR_TRAPn bits in vdr6 and leave the rest alone?

When do_debug sets TIF_SINGLESTEP, it will lead to a SIGTRAP on the way
back to user mode.  The idea is that it should appear to user mode like the
syscall was any hardware instruction that got the step trap.  So it follows
(and matches existing behavior) to set DR_STEP in vdr6 in this case too.
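
Roughly, wherever we synthesize that trap on the syscall-exit path
(a sketch only; I'm assuming the virtualized dr6 lives in thread_struct
as something like vdr6):

        /* make the syscall step look like a hardware single-step trap */
        if (test_thread_flag(TIF_SINGLESTEP))
                current->thread.vdr6 |= DR_STEP;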

> I don't.  I used TIF_DEBUG because it was already there and it was 
> atomic.  But setting hw_breakpoint_info is equally atomic, so there's 
> no reason to keep TIF_DEBUG.

I would not remove TIF_DEBUG where it exists now.  It was added on x86 and
x86_64 as an optimization so the common case tests and decides not to call
__switch_to_xtra with one instruction.  Don't lose that optimization.

> > The num_installed/num_kbps stuff feels a little hokey when it's really a
> > flag because the maximum number is one.  It seems like I could make it
> > tighter with some more finesse in the arch-specific hook options, so that
> > chbi and thbi each just store dabr, dabr!=0 means "mine gets installed",
> > and the switch in is just chbi->dabr?:thbi->dabr or something like that.
> 
> You certainly can do that in the hook routines.  But the generic code
> still needs to use num_installed (which doesn't get used very much) and
> num_kbps.

What I meant is using some arch hooks instead of those fields in the
generic code.  On machines where there is a count to keep, they would just
be trivial accessors (could be one-line macros).  On powerpc, they would be
implemented slightly differently and return 1 or 0.
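
For example (untested, and the names here are invented), the x86 flavor
would just wrap the existing counter while powerpc derives 1 or 0 from
its single slot:

        /* x86-ish: */
        #define hw_breakpoint_kernel_installed(thbi)    ((thbi)->num_installed)

        /* powerpc: */
        #define hw_breakpoint_kernel_installed(thbi)    ((thbi)->dabr != 0)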

> > Some uses might be happy with trigger-before, but I don't see much benefit.
> 
> Other than ptrace backward-compatibility.

Right, I wasn't suggesting losing that.

> I never have either.  Possibly you might want to change the value just 
> before the read, based on the address of the code doing the reading.  
> But I've never heard of anyone doing that.

Ah, that's a thought.  Ok.  I was already tending towards flexibility to
let someone do that if they wanted to (on trigger-before machines).

> > 	int hw_breakpoint_triggers_before(struct hw_breakpoint *);
> > 	int hw_breakpoint_can_resume(struct hw_breakpoint *);
> > 
> > or perhaps taking (unsigned int type) instead, in <asm-cpu/hw_breakpoint.h>.
> > i.e. for x86:
> > 
> > #define hw_breakpoint_triggers_before(type) ((type) == HW_BREAKPOINT_EXECUTE)
> > #define hw_breakpoint_can_resume(type) 	    1
> > 
> > and powerpc:
> > 
> > #define hw_breakpoint_triggers_before(any)	1
> > #define hw_breakpoint_can_resume(any)		0
> 
> I prefer the second alternative.  For the first, you'd have to register 
> the breakpoint before knowing how it will behave!

Yes, I was sort of thinking it up while I typed there.

> In general that sounds good.  But do we really want the register call
> to fail if extra handlers are defined?  That approach makes portable
> drivers harder to write.  Maybe it would be better to fail only if all
> of the arch-supported handler alternatives are NULL.

My rationale is that judiciously rejecting impossible settings in fact
makes it easier to write (correct) portable drivers.  If you set a callback
function that will never be called, you are confused and are going to have
the logic go wrong in your code.  If you can't get started while under the
delusion that your function is going to be called, then you won't waste all
that time on subtle debugging trying to figure out why it's not getting called.

> > We'd still want hw_breakpoint_can_resume to tell whether you can return
> > from a pre_handler and continue with no post_handler, without needing to
> > unregister the breakpoint.  That's true on ia64, while on powerpc you
> > either have to clear the breakpoint or request the post_handler stepping logic.
> 
> Unregistering the breakpoint isn't good on SMP systems, since it would
> be unregistered on all CPUs.  I think it would be better to require all
> arch's to support the stepping logic.

I don't disagree.  But the point is that on ia64 there is a case where no
stepping is required, so there is no reason the arch hooks shouldn't
indicate to users that this usage pattern is available.  Also,
realistically the single-step to post-handler part of the implementation
will come last for each arch, and flexible users can do interesting things
with partial support if they have the information on what the
implementation supports.

> Going over the code, I remembered that TIF_DEBUG really does mean more
> than just hw_breakpoint_info != NULL.  It means that the thread
> actually has some breakpoints registered.

Ok.

> Why keep the hw_breakpoint_info structure if there are no registered 
> breakpoints?  I did it so that the virtualized DR[0-3] values would 
> remain intact.

Ok.  Whenever all the virtual bits are zero you can free it.  That is
probably worth doing for the case when ptrace is never used, but some other
exciting new facility comes in and uses watchpoints for a while and then
goes away.

> For other processors that have only one debug register, this won't matter
> so much.  But of course there are references to TIF_DEBUG in the
> arch-independent code.  Do you think there would be any problem about
> reserving a bit for TIF_DEBUG in the other architectures?

In my powerpc patch I made those conditional and that seems fine.  I think
that having a TIF_DEBUG is an arch-specific choice, and each arch should
decide whether it is advantageous.  


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-02-06 19:58 ` [PATCH] Kwatch: kernel watchpoints using CPU debug registers Alan Stern
@ 2007-02-07  2:56   ` Roland McGrath
  0 siblings, 0 replies; 70+ messages in thread
From: Roland McGrath @ 2007-02-07  2:56 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

> That's good.  So I'll assume an updated version of kwatch can be submitted 
> without regard to the progress of utrace (other than minor conflicts over 
> the exact location of the ptrace code to change).

Indeed.

> Right.  I had been thinking in terms of a developer using kwatch to track 
> down some particularly nasty problem, something that would happen rather 
> infrequently, where one wouldn't care about side effects on user programs.  
> But of course those side effects might alter an important aspect of the 
> kernel problem being debugged...

This is indeed a way it might reasonably be used.  As I said, it's fine for
an individual use to be that way.  But think also of using it for
performance measurement (i.e. "how hot is this counter") in something like
systemtap, where you might have long-running instrumentation over arbitrary
workloads.

> It's also true that the current kwatch version affects the user experience
> even when no kernel debugging is going on, as it forcibly prevents ptrace
> calls from setting the Global-Enable bits in dr7.  That at least can be
> fixed quite easily.  (On the other hand, userspace should never do 
> anything other than a Local Enable.)

The distinction between local and global here never matters on Linux.  We
don't use hardware task switching at all, and if we did it would be part of
context switch, which already switches in debug register values.  

The local vs global distinction you have in debugreg allocation (when one
Linux task_struct is on the CPU vs always on every CPU) is a
machine-independent notion at the level of your debugreg sharing
abstraction, and has nothing to do with particular %dr7 bit values
(just with the allocation of all the bits in %dr7 that correspond to a
particular allocated %drN).

> How about a pair of callbacks: One to notify whenever the watchpoint is 
> enabled and one to notify whenever it is disabled?

That sounds fine.  You'll want to make sure it's structured so it doesn't
get too hairy when a caller wants to just give up and unregister when its
slot is unavailable (hopefully shouldn't lead to calling unregister from
the callback made inside the register call and such twists).
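
Something along these lines is all I mean, i.e. just two more members
in your struct kwatch (the member names are arbitrary):

        struct kwatch {
                void *addr;             /* location of watchpoint */
                u8 length;              /* range of address */
                u8 type;                /* type of watchpoint */
                kwatch_handler_t handler;
                /* new, under discussion: */
                void (*enabled)(struct kwatch *);       /* got a debugreg */
                void (*disabled)(struct kwatch *);      /* lost it again */
        };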

> So for the sake of argument, let's assume that debug registers can be 
> assigned with priority values ranging from 0 to 7 (overkill, but who 
> cares?).  By fiat, ptrace assignments use priority 4.  Then kwatch callers 
> can request whatever priority they like.  The well-behaved cases you've 
> been discussing will use priority 0, and the invasive cases can use 
> priority 7.  (With appropriate symbolic names instead of raw numeric 
> values, naturally.)

Sure.  Or make it signed with lower value wins, have ptrace use -1 and the
average bear use 0 or something especially unobtrusive use >0, and
something very obtrusive use -many.  Unless you are really going to pack it
into a few bits somewhere, I'd make it an arbitrary int rather than a
special small range; it's just for sort order comparison.  Bottom line, I
don't really care about the numerology.  Just so "break ptrace", "don't
break ptrace", and "readily get out of the way on demand" can be expressed.
We can always fine-tune it later as there are more concrete users.

> Or maybe that's too complicated.  Perhaps all userspace assignments should 
> always use the same priority level.  

No, I want priorities among user-mode watchpoint users too.  ptrace is
rigid, but newer facilities can coexist with ptrace on the same thread and
with kwatch, and do fancy new things to fall back when there is debugreg
allocation pressure.  Future user facilities might be able to do VM tricks
that are harder to make workable for kernel mode, for example.  

> For now I would prefer to avoid that.  It's true that kwatch is intended
> _only_ for kernelspace watchpoints, not userspace.  But I'd rather leave
> the complications up to someone else.

Understood.  If you constrain the kwatch interface so it cannot be used
with user addresses (checks < TASK_SIZE or whatever), then the problem will
be clearly defined as the slightly simpler one whenever someone does come
along in need of more complications.
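
That could be as simple as an early sanity check in register_kwatch(),
e.g. (assuming the watchpoint address comes in as an unsigned long addr):

        if (addr < TASK_SIZE)
                return -EINVAL;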

> It seems likely that the interfaces added by kwatch will need to be
> generalized in various ways in order to handle the requirements of other
> architectures.  However I don't know what those requirements might be, so
> it seems best to start out small with x86 only and leave more refinements
> for the future.

Agreed, just to keep it in mind.  I think the features on other machines
are roughly similar except for not offering size choices other than
"anywhere in this aligned word".  

> If I update the patch, adding a priority level and the callback 
> notifications, do you think it would then be acceptable?

I expect so.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
       [not found] <20070206042153.66AB418005D@magilla.sf.frob.com>
@ 2007-02-06 19:58 ` Alan Stern
  2007-02-07  2:56   ` Roland McGrath
  0 siblings, 1 reply; 70+ messages in thread
From: Alan Stern @ 2007-02-06 19:58 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Prasanna S Panchamukhi, Kernel development list

On Mon, 5 Feb 2007, Roland McGrath wrote:

> Sorry I've been slow in giving you feedback on kwatch.

No problem (I have plenty of other things to work on!), and thanks for the 
detailed reply.

> > I'll be happy to move this over to the utrace setting, once it is merged.  
> > Do you think it would be better to include the current version of kwatch 
> > now or to wait for utrace?
> > 
> > Roland, is there a schedule for when you plan to get utrace into -mm?
> 
> Since you've asked, I'll mention that I've been discussing this with Andrew
> lately and we plan to work on merging it into -mm as soon as we can manage.
> 
> The kwatch implementation is pretty much orthogonal to the utrace patch as
> it is so far.  As you've noted, it doesn't change the nature of the setting
> of the debug registers; it only moves around the existing code for setting
> them in raw form.  Hence it doesn't much matter which order the work is
> merged at this stage.  There's no reason to withhold kwatch waiting for utrace.

That's good.  So I'll assume an updated version of kwatch can be submitted 
without regard to the progress of utrace (other than minor conflicts over 
the exact location of the ptrace code to change).

> I do have a problem with kwatch, however.  The existing user ABI includes
> setting all of the debug registers, and this interface has never before
> expressed a situation where you can set some but not all of them.  Having
> ptrace suddenly fail with EBUSY when it never did before is not OK.  No
> well-behaved kernel-mode tracing/debugging facility should perturb the user
> experience in this way.  It is certainly understandable that one will
> sometimes want to do invasive kernel-mode debugging and on special
> occasions choose to be ill-behaved in this way (you might know your
> userland work load doesn't include running gdb with watchpoints).  
> But kwatch as it stands does not even make it possible to write a
> well-behaved facility.

Right.  I had been thinking in terms of a developer using kwatch to track 
down some particularly nasty problem, something that would happen rather 
infrequently, where one wouldn't care about side effects on user programs.  
But of course those side effects might alter an important aspect of the 
kernel problem being debugged...

It's also true that the current kwatch version affects the user experience
even when no kernel debugging is going on, as it forcibly prevents ptrace
calls from setting the Global-Enable bits in dr7.  That at least can be
fixed quite easily.  (On the other hand, userspace should never do 
anything other than a Local Enable.)

> I am all in favor of a facility to manage shared use of the debug
> registers, such as your debugreg.h additions.  I just think it needs to be
> a little more flexible.  An unobtrusive kernel facility has to get out of
> the way when user-mode decides to use all its debug registers.  It's not
> immediately important what it's going to do about it when contention arises,
> but there has to be a way for the user-mode facilities to say they need to
> allocate debugregs with priority and evict other squatters.  So, something
> like code allocating a debugreg can supply a callback that's made when its
> allocation has to be taken by something with higher priority.

How about a pair of callbacks: One to notify whenever the watchpoint is 
enabled and one to notify whenever it is disabled?

> Even after utrace, there will always be the possibility of a traditional
> uncoordinated user of the raw debug registers, if nothing else ptrace
> compatibility will always be there for old users.  So anything new and
> fancy needs to be prepared to back out of the way gracefully.  In the case
> of kwatch, it can just have a handler function given by the caller to start
> with.  It's OK if individual callers can specially declare "I am not
> well-behaved" and eat debugregs so that well-behaved high-priority users
> like ptrace just have to lose (breaking compatibility).  But no
> well-behaved caller of kwatch will do that.  

No doubt the future userspace API will include some sort of priority 
facility.  For now, though, ptrace doesn't have anything like it.  We just 
have to assign it an arbitrary intermediate priority.

So for the sake of argument, let's assume that debug registers can be 
assigned with priority values ranging from 0 to 7 (overkill, but who 
cares?).  By fiat, ptrace assignments use priority 4.  Then kwatch callers 
can request whatever priority they like.  The well-behaved cases you've 
been discussing will use priority 0, and the invasive cases can use 
priority 7.  (With appropriate symbolic names instead of raw numeric 
values, naturally.)

Or maybe that's too complicated.  Perhaps all userspace assignments should 
always use the same priority level.  After all, it's possible for multiple 
tasks to allocate the same debug register at the same time -- if they had 
differing priorities that would make it much more difficult to keep things 
straight.  Then there would be only three effective priority levels: 0 = 
well-behaved kernel, 1 = all userspace, and 2 = invasive kernel.
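
In other words, something like this (the symbolic names here are only
placeholders):

        enum {
                DR_PRI_WELL_BEHAVED_KERNEL      = 0,
                DR_PRI_USERSPACE                = 1,
                DR_PRI_INVASIVE_KERNEL          = 2,
        };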

> As a later improvement, kwatch could try a thing or two to stave off giving
> up and telling its caller the watchpoint couldn't stay for the current
> task.  For example, if a watchpoint is in kernel memory, you could switch
> in your debugreg settings on entering the kernel and restore the user
> watchpoints before returning to user mode.  Then you'd need to make
> get_user et al somehow observe the user-mode watchpoints.  But it could be
> investigated if the need arises.

For now I would prefer to avoid that.  It's true that kwatch is intended
_only_ for kernelspace watchpoints, not userspace.  But I'd rather leave
the complications up to someone else.

>  Note that you can already silently do
> something simple like juggling your kwatch debugreg assignments around if
> the higher-priority consumer evicting you has left some other debugregs unused.

Yes, I might add that in.

> I certainly intend for later features based on utrace to include
> higher-level treatment of watchpoints so that user debugging facilities can
> also become responsive to debugreg allocation pressure.  (Eventually, the
> user facilities might have easier ways of falling back to other methods and
> getting out of the way of kernel debugreg consumers, than can be done for
> the kernel-mode-tracing facilities.)  To that end, I'd like to see a clear
> and robust interface for debugreg sharing, below the level of kwatch.  I'd
> also like to see a thin layer on that giving a machine-independent kernel
> source API for talking about watchpoints, which you pretty much have rolled
> into the kwatch interface now.  But these are further refinements, not
> barriers to including kwatch.

It seems likely that the interfaces added by kwatch will need to be
generalized in various ways in order to handle the requirements of other
architectures.  However I don't know what those requirements might be, so
it seems best to start out small with x86 only and leave more refinements
for the future.

> Also, an unrelated minor point.  I think it's error-prone to have an
> integer argument to unregister_kwatch.  I think it makes most sense to have
> the caller provide the space and call register/unregister with a pointer,
> in the style of kprobes.

In fact, something like that would be necessary if the debug register 
assignment could be changed silently as need arises.

If I update the patch, adding a priority level and the callback 
notifications, do you think it would then be acceptable?

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-01-17 16:17   ` Alan Stern
  2007-01-18  0:12     ` Christoph Hellwig
@ 2007-02-06  4:25     ` Roland McGrath
  1 sibling, 0 replies; 70+ messages in thread
From: Roland McGrath @ 2007-02-06  4:25 UTC (permalink / raw)
  To: Alan Stern; +Cc: Prasanna S Panchamukhi, Kernel development list

Sorry I've been slow in giving you feedback on kwatch.

> I'll be happy to move this over to the utrace setting, once it is merged.  
> Do you think it would be better to include the current version of kwatch 
> now or to wait for utrace?
> 
> Roland, is there a schedule for when you plan to get utrace into -mm?

Since you've asked, I'll mention that I've been discussing this with Andrew
lately and we plan to work on merging it into -mm as soon as we can manage.

The kwatch implementation is pretty much orthogonal to the utrace patch as
it is so far.  As you've noted, it doesn't change the nature of the setting
of the debug registers; it only moves around the existing code for setting
them in raw form.  Hence it doesn't much matter which order the work is
merged at this stage.  There's no reason to withhold kwatch waiting for utrace.

I do have a problem with kwatch, however.  The existing user ABI includes
setting all of the debug registers, and this interface has never before
expressed a situation where you can set some but not all of them.  Having
ptrace suddenly fail with EBUSY when it never did before is not OK.  No
well-behaved kernel-mode tracing/debugging facility should perturb the user
experience in this way.  It is certainly understandable that one will
sometimes want to do invasive kernel-mode debugging and on special
occasions choose to be ill-behaved in this way (you might know your
userland work load doesn't include running gdb with watchpoints).  
But kwatch as it stands does not even make it possible to write a
well-behaved facility.

I am all in favor of a facility to manage shared use of the debug
registers, such as your debugreg.h additions.  I just think it needs to be
a little more flexible.  An unobtrusive kernel facility has to get out of
the way when user-mode decides to use all its debug registers.  It's not
immediately important what it's going to do about it when contention arises,
but there has to be a way for the user-mode facilities to say they need to
allocate debugregs with priority and evict other squatters.  So, something
like code allocating a debugreg can supply a callback that's made when its
allocation has to be taken by something with higher priority.

Even after utrace, there will always be the possibility of a traditional
uncoordinated user of the raw debug registers, if nothing else ptrace
compatibility will always be there for old users.  So anything new and
fancy needs to be prepared to back out of the way gracefully.  In the case
of kwatch, it can just have a handler function given by the caller to start
with.  It's OK if individual callers can specially declare "I am not
well-behaved" and eat debugregs so that well-behaved high-priority users
like ptrace just have to lose (breaking compatibility).  But no
well-behaved caller of kwatch will do that.  

As a later improvement, kwatch could try a thing or two to stave off giving
up and telling its caller the watchpoint couldn't stay for the current
task.  For example, if a watchpoint is in kernel memory, you could switch
in your debugreg settings on entering the kernel and restore the user
watchpoints before returning to user mode.  Then you'd need to make
get_user et al somehow observe the user-mode watchpoints.  But it could be
investigated if the need arises.  Note that you can already silently do
something simple like juggling your kwatch debugreg assignments around if
the higher-priority consumer evicting you has left some other debugregs unused.

I certainly intend for later features based on utrace to include
higher-level treatment of watchpoints so that user debugging facilities can
also become responsive to debugreg allocation pressure.  (Eventually, the
user facilities might have easier ways of falling back to other methods and
getting out of the way of kernel debugreg consumers, than can be done for
the kernel-mode-tracing facilities.)  To that end, I'd like to see a clear
and robust interface for debugreg sharing, below the level of kwatch.  I'd
also like to see a thin layer on that giving a machine-independent kernel
source API for talking about watchpoints, which you pretty much have rolled
into the kwatch interface now.  But these are further refinements, not
barriers to including kwatch.

Also, an unrelated minor point.  I think it's error-prone to have an
integer argument to unregister_kwatch.  I think it makes most sense to have
the caller provide the space and call register/unregister with a pointer,
in the style of kprobes.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-01-18  7:31       ` Ingo Molnar
  2007-01-18 15:37         ` Alan Stern
@ 2007-01-18 22:33         ` Christoph Hellwig
  1 sibling, 0 replies; 70+ messages in thread
From: Christoph Hellwig @ 2007-01-18 22:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, Alan Stern, Andrew Morton,
	Prasanna S Panchamukhi, Kernel development list, Roland McGrath

On Thu, Jan 18, 2007 at 08:31:59AM +0100, Ingo Molnar wrote:
> 
> * Christoph Hellwig <hch@infradead.org> wrote:
> 
> > > I'll be happy to move this over to the utrace setting, once it is 
> > > merged.  Do you think it would be better to include the current 
> > > version of kwatch now or to wait for utrace?
> > > 
> > > Roland, is there a schedule for when you plan to get utrace into 
> > > -mm?
> > 
> > Even if it goes into mainline soon we'll need a lot of time for all 
> > architectures to catch up, so I think kwatch should definitely come
> > first.
> 
> i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the 
> /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many 
> years, precisely because it was done in the 'oh, let's get this feature 
> added first, think about it later' manner. Roland's work is a large 
> logistical undertaking and we should not make it more complex than it 
> is. Once it's in we can add debugging features on top of that. To me work 
> that cleans up existing mess takes precedence before work that adds to 
> the mess.

Utrace doesn't provide any kind of watchpoint infrastructure now, and
utrace will take a lot of time to get ready for inclusion, mostly because
it really needs all the arch maintainers to help out (and various not
so easy core fixes as well).

I'm all for merging utrace, and I wish we'd be much further into the
merging process already, but blocking mostly unrelated functionality for
it is more than dumb.


> ps. please fix your mailer to not emit those silly Mail-Followup-To 
> headers! It collapses To: and Cc: lines into one huge unnecessary To: 
> line.

This header is absolutely intentional, as far too many folks seem to randomly
drop To: or Cc: lines on mailing lists.  And of course it's almost essential
on lists with braindead reply-to-list policies (e.g. Debian).

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-01-18  7:31       ` Ingo Molnar
@ 2007-01-18 15:37         ` Alan Stern
  2007-01-18 22:33         ` Christoph Hellwig
  1 sibling, 0 replies; 70+ messages in thread
From: Alan Stern @ 2007-01-18 15:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, Andrew Morton, Prasanna S Panchamukhi,
	Kernel development list, Roland McGrath

On Thu, 18 Jan 2007, Ingo Molnar wrote:

> 
> * Christoph Hellwig <hch@infradead.org> wrote:
> 
> > > I'll be happy to move this over to the utrace setting, once it is 
> > > merged.  Do you think it would be better to include the current 
> > > version of kwatch now or to wait for utrace?
> > > 
> > > Roland, is there a schedule for when you plan to get utrace into 
> > > -mm?
> > 
> > Even if it goes into mainline soon we'll need a lot of time for all 
> > architectures to catch up, so I think kwatch should definitely come 
> > first.
> 
> i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the 
> /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many 
> years, precisely because it was done in the 'oh, let's get this feature 
> added first, think about it later' manner. Roland's work is a large 
> logistical undertaking and we should not make it more complex than it 
> is. Once it's in we can add debugging features on top of that. To me work 
> that cleans up existing mess takes precedence before work that adds to 
> the mess.

Interestingly, the current version of utrace makes no special provision
for watchpoints, either in kernel or user space.  Instead it relies on the
legacy ptrace mechanism for setting debug registers in the target
process's user area.  Perhaps an explicit watchpoint implementation should
be added to utrace, but that's beyond the scope of this discussion.

Furthermore, utrace is explicitly intended for tracing user programs, not
for tracing the kernel.  Kwatch, however, is just the opposite: It is
intended for setting up watchpoints in kernel space.  In that sense it is
pretty much orthogonal to utrace.  Although it would affect the utrace
patches, the changes would be basically transparent (i.e., move the new
code from one ptrace handler to another instead of moving the old code).

If Kwatch is to be subsumed anywhere, I think it should be under the
Kprobes/Systemtap project.  Again, that's a separate question -- so far 
they have avoided data watchpoints.

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-01-18  0:12     ` Christoph Hellwig
@ 2007-01-18  7:31       ` Ingo Molnar
  2007-01-18 15:37         ` Alan Stern
  2007-01-18 22:33         ` Christoph Hellwig
  0 siblings, 2 replies; 70+ messages in thread
From: Ingo Molnar @ 2007-01-18  7:31 UTC (permalink / raw)
  To: Christoph Hellwig, Alan Stern, Andrew Morton,
	Prasanna S Panchamukhi, Kernel development list, Roland McGrath


* Christoph Hellwig <hch@infradead.org> wrote:

> > I'll be happy to move this over to the utrace setting, once it is 
> > merged.  Do you think it would be better to include the current 
> > version of kwatch now or to wait for utrace?
> > 
> > Roland, is there a schedule for when you plan to get utrace into 
> > -mm?
> 
> Even if it goes into mainline soon we'll need a lot of time for all 
> architectures to catch up, so I think kwatch should definitely come 
> first.

i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the 
/huge/ ptrace mess. Ptrace has been a very large PITA, for many, many 
years, precisely because it was done in the 'oh, let's get this feature 
added first, think about it later' manner. Roland's work is a large 
logistical undertaking and we should not make it more complex than it 
is. Once it's in we can add debugging features on top of that. To me work 
that cleans up existing mess takes precedence before work that adds to 
the mess.

	Ingo

ps. please fix your mailer to not emit those silly Mail-Followup-To 
headers! It collapses To: and Cc: lines into one huge unnecessary To: 
line.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-01-17 16:17   ` Alan Stern
@ 2007-01-18  0:12     ` Christoph Hellwig
  2007-01-18  7:31       ` Ingo Molnar
  2007-02-06  4:25     ` Roland McGrath
  1 sibling, 1 reply; 70+ messages in thread
From: Christoph Hellwig @ 2007-01-18  0:12 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ingo Molnar, Andrew Morton, Prasanna S Panchamukhi,
	Kernel development list, Roland McGrath

On Wed, Jan 17, 2007 at 11:17:37AM -0500, Alan Stern wrote:
> I'll be happy to move this over to the utrace setting, once it is merged.  
> Do you think it would be better to include the current version of kwatch 
> now or to wait for utrace?
> 
> Roland, is there a schedule for when you plan to get utrace into -mm?

Even if it goes into mainline soon we'll need a lot of time for all
architectures to catch up, so I think kwatch should definitely come first.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-01-16 23:35 ` Christoph Hellwig
@ 2007-01-17 16:33   ` Alan Stern
  0 siblings, 0 replies; 70+ messages in thread
From: Alan Stern @ 2007-01-17 16:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Prasanna S Panchamukhi, Kernel development list

On Tue, 16 Jan 2007, Christoph Hellwig wrote:

> First, I'd say thanks a lot for forward-porting this; it's a really useful
> feature for all kinds of nasty debugging.
> 
> I think you should split this into two patches, one for the debugreg
> infrastructure, and one for the actual kwatch code.
> 
> Also I think you should provide one (or even a few) example watches for
> trivial things, say updates to i_ino for an inode given through debugfs.
> 
> Some comments on the code below:

Many thanks for your detailed comments and suggestions.  It probably was
obvious that most of the things you picked up on were inherited from the
original Kwatch patch.  I'll update my patch in accordance with your
suggestions.

Responses to just a couple of the comments:

> I suspect this should be replaced with a global and local variant
> to fix the above mentioned issue.  It's a tiny bit duplicated code,
> but seems much cleaner.

It would indeed be cleaner.  And in fact the local variant would have a
large amount of dead code, which could be left out entirely (at least from
the initial version).  That's because the only current user of local debug 
register allocations is ptrace.

> > +static void write_dr(int debugreg, unsigned long addr)
> > +{
> > +	switch (debugreg) {
> > +		case 0:	set_debugreg(addr, 0);	break;
> > +		case 1:	set_debugreg(addr, 1);	break;
> > +		case 2:	set_debugreg(addr, 2);	break;
> > +		case 3:	set_debugreg(addr, 3);	break;
> > +		case 6:	set_debugreg(addr, 6);	break;
> > +		case 7:	set_debugreg(addr, 7);	break;
> > +	}
> > +}
> 
> What's the point of this wrapper?

It is called from two different places, and it's better than including
the "switch" in each place.

> I think large parts of this header should go into a new linux/kwatch.h
> so that generic code can use kwatches.

In the long run that may well be true.  For now, I'm a little hesitant to
put something which works only on i386 under include/linux.

> > +config KWATCH
> > +	bool "Kwatch points (EXPERIMENTAL)"
> > +	depends on EXPERIMENTAL
> > +	help
> > +	  Kwatch enables kernel-space data watchpoints using the processor's
> > +	  debug registers.  It can be very useful for kernel debugging.
> > +	  If in doubt, say "N".
> 
> I think we want different options for debugregs and kwatch.  The debugreg
> one probably doesn't have to be actually user-visible, though.

It's easier to start out like this and then change it later when someone
comes up with another use for debugregs.  Or perhaps by then the whole
thing will have been moved over to utrace, making the issue academic.

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-01-17  9:44 ` Ingo Molnar
@ 2007-01-17 16:17   ` Alan Stern
  2007-01-18  0:12     ` Christoph Hellwig
  2007-02-06  4:25     ` Roland McGrath
  0 siblings, 2 replies; 70+ messages in thread
From: Alan Stern @ 2007-01-17 16:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Prasanna S Panchamukhi, Kernel development list,
	Roland McGrath

On Wed, 17 Jan 2007, Ingo Molnar wrote:

> * Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> > From: Alan Stern <stern@rowland.harvard.edu>
> > 
> > This patch (as839) implements the Kwatch (kernel-space hardware-based 
> > watchpoints) API for the i386 architecture.  The API is explained in 
> > the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c.
> 
> i think it would be nice to have this on top of Roland's utrace 
> infrastructure, which nicely modularizes all hardware debugging 
> capabilities and detaches it from ptrace.

I'll be happy to move this over to the utrace setting, once it is merged.  
Do you think it would be better to include the current version of kwatch 
now or to wait for utrace?

Roland, is there a schedule for when you plan to get utrace into -mm?

Alan Stern


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-01-16 16:55 Alan Stern
  2007-01-16 23:35 ` Christoph Hellwig
@ 2007-01-17  9:44 ` Ingo Molnar
  2007-01-17 16:17   ` Alan Stern
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2007-01-17  9:44 UTC (permalink / raw)
  To: Alan Stern
  Cc: Andrew Morton, Prasanna S Panchamukhi, Kernel development list,
	Roland McGrath


* Alan Stern <stern@rowland.harvard.edu> wrote:

> From: Alan Stern <stern@rowland.harvard.edu>
> 
> This patch (as839) implements the Kwatch (kernel-space hardware-based 
> watchpoints) API for the i386 architecture.  The API is explained in 
> the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c.

i think it would be nice to have this on top of Roland's utrace 
infrastructure, which nicely modularizes all hardware debugging 
capabilities and detaches it from ptrace.

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
  2007-01-16 16:55 Alan Stern
@ 2007-01-16 23:35 ` Christoph Hellwig
  2007-01-17 16:33   ` Alan Stern
  2007-01-17  9:44 ` Ingo Molnar
  1 sibling, 1 reply; 70+ messages in thread
From: Christoph Hellwig @ 2007-01-16 23:35 UTC (permalink / raw)
  To: Alan Stern; +Cc: Andrew Morton, Prasanna S Panchamukhi, Kernel development list

First, I'd say thanks a lot for forward-porting this; it's a really useful
feature for all kinds of nasty debugging.

I think you should split this into two patches, one for the debugreg
infrastructure, and one for the actual kwatch code.

Also I think you should provide one (or even a few) example watches for
trivial things, say updates to i_ino for an inode given through debugfs.

Some comments on the code below:

> --- /dev/null
> +++ usb-2.6/arch/i386/kernel/debugreg.c
> @@ -0,0 +1,182 @@
> +/*
> + *  Debug register
> + *  arch/i386/kernel/debugreg.c

Please don't put in comments that mention the name of the containing
file.  Also the "Debug register" comments seems rather useless.

> + * 2002-Oct	Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> and
> + *		Bharata Rao <bharata@in.ibm.com> to provide debug register
> + *		allocation mechanism.
> + * 2004-Oct	Updated by Prasanna S Panchamukhi <prasanna@in.ibm.com> with
> + *		idr_allocations mechanism as suggested by Andi Kleen.

I think these kinds of comments aren't in fashion anymore either; all
changelogs should be in git commit messages and initial credits go
into the first commit message.

> +struct debugreg dr_list[DR_MAX];
> +static spinlock_t dr_lock = SPIN_LOCK_UNLOCKED;

I think you're supposed to use the magic DEFINE_SPINLOCK macro these days.
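
That is, simply:

        static DEFINE_SPINLOCK(dr_lock);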

> +unsigned long dr7_global_mask = DR_CONTROL_RESERVED | DR_GLOBAL_SLOWDOWN |
> +		DR_GLOBAL_ENABLE_MASK;

I'd rather keep this static and make set_process_dr7 a non-inline
function.

> +
> +static unsigned long dr7_global_reg_mask(unsigned int regnum)
> +{
> +	return (0xf << (16 + regnum * 4)) | (0x1 << (regnum * 2));
> +}
> +
> +static int get_dr(int regnum, int flag)
> +{
> +	if (flag == DR_ALLOC_GLOBAL && !dr_list[regnum].flag) {
> +		dr_list[regnum].flag = flag;
> +		dr7_global_mask |= dr7_global_reg_mask(regnum);
> +		return regnum;
> +	}
> +	if (flag == DR_ALLOC_LOCAL &&
> +			dr_list[regnum].flag != DR_ALLOC_GLOBAL) {
> +		dr_list[regnum].flag = flag;
> +		dr_list[regnum].use_count++;
> +		return regnum;
> +	}
> +	return -1;

This looks rather poorly structured, as the function does completely
different things depending on the flags passed in.

> +static void free_dr(int regnum)
> +{
> +	if (dr_list[regnum].flag == DR_ALLOC_LOCAL) {
> +		if (!--dr_list[regnum].use_count)
> +			dr_list[regnum].flag = 0;
> +	} else {
> +		dr_list[regnum].flag = 0;
> +		dr_list[regnum].use_count = 0;
> +		dr7_global_mask &= ~(dr7_global_reg_mask(regnum));
> +	}
> +}

Same here.

> +int dr_alloc(int regnum, int flag)
> +{
> +	int ret = -1;
> +
> +	spin_lock(&dr_lock);
> +	if (regnum >= 0 && regnum < DR_MAX)
> +		ret = get_dr(regnum, flag);
> +	else if (regnum == DR_ANY) {
> +
> +		/* gdb allocates local debug registers starting from 0.
> +		 * To help avoid conflicts, we'll start from the other end.
> +		 */
> +		for (regnum = DR_MAX - 1; regnum >= 0; --regnum) {
> +			ret = get_dr(regnum, flag);
> +			if (ret >= 0)
> +				break;
> +		}
> +	} else
> +		printk(KERN_ERR "dr_alloc: "
> +				"Cannot allocate debug register %d\n", regnum);
> +	spin_unlock(&dr_lock);
> +	return ret;

I suspect this should be replaced with a global and local variant
to fix the above mentioned issue.  It's a tiny bit duplicated code,
but seems much cleaner.
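
Roughly like this (untested, just the existing logic split in two):

        static int get_dr_global(int regnum)
        {
                if (dr_list[regnum].flag)
                        return -1;
                dr_list[regnum].flag = DR_ALLOC_GLOBAL;
                dr7_global_mask |= dr7_global_reg_mask(regnum);
                return regnum;
        }

        static int get_dr_local(int regnum)
        {
                if (dr_list[regnum].flag == DR_ALLOC_GLOBAL)
                        return -1;
                dr_list[regnum].flag = DR_ALLOC_LOCAL;
                dr_list[regnum].use_count++;
                return regnum;
        }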

> +static int get_dr(int regnum, int flag)
> +{
> +	if (flag == DR_ALLOC_GLOBAL && !dr_list[regnum].flag) {
> +		dr_list[regnum].flag = flag;
> +		dr7_global_mask |= dr7_global_reg_mask(regnum);
> +		return regnum;
> +	}
> +	if (flag == DR_ALLOC_LOCAL &&
> +			dr_list[regnum].flag != DR_ALLOC_GLOBAL) {
> +		dr_list[regnum].flag = flag;
> +		dr_list[regnum].use_count++;
> +		return regnum;
> +	}
> +	return -1;

Same comments about global vs local here.

> +
> +EXPORT_SYMBOL(dr_alloc);
> +EXPORT_SYMBOL(dr_free);

I don't think we want these exported at all, and if a proper modular
user shows up they should be _GPL as they're fairly lowlevel.

Btw, the naming in the whole debugregs code should be consolidated to
be debugreg_ instead of all kinds of different variants.
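
E.g. dr_alloc/dr_free would become something like:

        int debugreg_alloc(int regnum, int flag);
        void debugreg_free(int regnum);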

> +#ifdef CONFIG_KWATCH
> +
> +/* Set the type, len and global flag in dr7 for a debug register */
> +#define SET_DR7(dr, regnum, type, len)	do {		\
> +		dr &= ~(0xf << (16 + (regnum) * 4));	\
> +		dr |= (((((len) - 1) << 2) | (type)) <<	\
> +				(16 + (regnum) * 4)) |	\
> +			(0x2 << ((regnum) * 2));	\
> +	} while (0)
> +
> +/* Disable a debug register by clearing the global/local flag in dr7 */
> +#define RESET_DR7(dr, regnum)	dr &= ~(0x3 << ((regnum) * 2))

I don't think there's any point in making these macros conditional.
Then again, is there a good reason to make these macros?
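
If they do stay, a static inline would at least give some type checking,
e.g. (name made up, untested):

        static inline unsigned long dr7_set_watch(unsigned long dr7,
                        int regnum, unsigned int type, unsigned int len)
        {
                dr7 &= ~(0xf << (16 + regnum * 4));
                dr7 |= ((((len - 1) << 2) | type) << (16 + regnum * 4)) |
                       (0x2 << (regnum * 2));
                return dr7;
        }

with callers doing dr7 = dr7_set_watch(dr7, regnum, type, len);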

> + *  Kernel Watchpoint interface.
> + *  arch/i386/kernel/kwatch.c
> + *
> + *
> + * 2002-Oct	Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> for
> + *		Kernel Watchpoint implementation.
> + * 2004-Oct	Updated by Prasanna S Panchamukhi <prasanna@in.ibm.com> to
> + *		to make use of notifiers.
> + */

Same comments about these comments apply as in debugreg.c

> +#include <linux/kprobes.h>
> +#include <linux/ptrace.h>
> +#include <linux/spinlock.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <asm/kwatch.h>
> +#include <asm/kdebug.h>
> +#include <asm/debugreg.h>
> +#include <asm/bitops.h>

I think this should be linux/bitops.h these days.

> +
> +#define RF_MASK	0x00010000
> +
> +static struct kwatch kwatch_list[DR_MAX];
> +static spinlock_t kwatch_lock = SPIN_LOCK_UNLOCKED;



> +static unsigned long kwatch_in_progress;	/* currently being handled */

Given that this is a bitmap, the comment is rather misleading; it should
probably be:

/*
 * Bitmap of registers being handled.
 */

> +static void write_dr(int debugreg, unsigned long addr)
> +{
> +	switch (debugreg) {
> +		case 0:	set_debugreg(addr, 0);	break;
> +		case 1:	set_debugreg(addr, 1);	break;
> +		case 2:	set_debugreg(addr, 2);	break;
> +		case 3:	set_debugreg(addr, 3);	break;
> +		case 6:	set_debugreg(addr, 6);	break;
> +		case 7:	set_debugreg(addr, 7);	break;
> +	}
> +}

What's the point of this wrapper?

> +
> +#define write_dr7(val)	set_debugreg((val), 7)
> +#define read_dr7(val)	get_debugreg((val), 7)

And these?

> +	if (kwatch_in_progress)
> +		goto recursed;
> +

I don't think there's any point in this goto; just handle it inside
the if block.

> +	set_bit(debugreg, &kwatch_in_progress);
> +
> +	spin_lock(&kwatch_lock);
> +	if ((unsigned long) kwatch_list[debugreg].addr != addr)
> +		goto out;
> +
> +	if (kwatch_list[debugreg].handler)
> +		kwatch_list[debugreg].handler(&kwatch_list[debugreg], regs);
> +
> +	if (kwatch_list[debugreg].type == DR_TYPE_EXECUTE)
> +		regs->eflags |= RF_MASK;
> +      out:

Again, I think the goto here could be avoided and actually make the code
cleaner.  Also a local variable for kwatch_list[debugreg] with a short
name would probably make this section of code a lot more readable.
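
E.g. something like (untested):

        struct kwatch *kw = &kwatch_list[debugreg];

        spin_lock(&kwatch_lock);
        if ((unsigned long) kw->addr == addr) {
                if (kw->handler)
                        kw->handler(kw, regs);
                if (kw->type == DR_TYPE_EXECUTE)
                        regs->eflags |= RF_MASK;
        }
        spin_unlock(&kwatch_lock);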

> +
> +static int __init init_kwatch(void)
> +{
> +	int err = 0;
> +
> +	err = register_die_notifier(&kwatch_exceptions_nb);
> +	return err;
> +}

Just remove the err local variable here.
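
I.e. just:

        static int __init init_kwatch(void)
        {
                return register_die_notifier(&kwatch_exceptions_nb);
        }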

> +EXPORT_SYMBOL_GPL(register_kwatch);
> +EXPORT_SYMBOL_GPL(unregister_kwatch);

Please move these exports close to the actual function definition.

> --- /dev/null
> +++ usb-2.6/include/asm-i386/kwatch.h
> @@ -0,0 +1,60 @@
> +#ifndef _ASM_KWATCH_H
> +#define _ASM_KWATCH_H
> +/*
> + *  Kernel Watchpoint interface.
> + *  include/asm-i386/kwatch.h

> + * 2002-Oct	Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> for
> + *		Kernel Watchpoint implementation.
> + */

Same comments once again.

> +#include <linux/types.h>
> +#include <linux/ptrace.h>
> +
> +struct kwatch;
> +typedef void (*kwatch_handler_t) (struct kwatch *, struct pt_regs *);
> +
> +struct kwatch {
> +	void *addr;		/* location of watchpoint */
> +	u8 length;		/* range of address */
> +	u8 type;		/* type of watchpoint */
> +	kwatch_handler_t handler;
> +};
> +
> +#define DR_TYPE_EXECUTE 	0x0	/* Watchpoint types */
> +#define DR_TYPE_WRITE		0x1
> +#define DR_TYPE_IO		0x2
> +#define DR_TYPE_RW		0x3

I think large parts of this header should go into a new linux/kwatch.h
so that generic code can use kwatches.

> +config KWATCH
> +	bool "Kwatch points (EXPERIMENTAL)"
> +	depends on EXPERIMENTAL
> +	help
> +	  Kwatch enables kernel-space data watchpoints using the processor's
> +	  debug registers.  It can be very useful for kernel debugging.
> +	  If in doubt, say "N".

I think we want different options for debugregs and kwatch.  The debugreg
one probably doesn't have to be actually user-visible, though.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH] Kwatch: kernel watchpoints using CPU debug registers
@ 2007-01-16 16:55 Alan Stern
  2007-01-16 23:35 ` Christoph Hellwig
  2007-01-17  9:44 ` Ingo Molnar
  0 siblings, 2 replies; 70+ messages in thread
From: Alan Stern @ 2007-01-16 16:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Prasanna S Panchamukhi, Kernel development list

From: Alan Stern <stern@rowland.harvard.edu>

This patch (as839) implements the Kwatch (kernel-space hardware-based
watchpoints) API for the i386 architecture.  The API is explained in
the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c.

The original version of the patch was written by Vamsi Krishna S and
Bharata Rao.  It was later updated by Prasanna S Panchamukhi for 2.6.13
and then again by me for 2.6.20.

Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

---

Hardware-based watchpoints can sometimes be indispensable for finding the 
source of problems.  Although this patch is only for the x86 architecture, 
it should still be useful.  And there's no downside to adopting it, since 
it has virtually no overhead when CONFIG_KWATCH isn't selected.


Index: usb-2.6/arch/i386/kernel/debugreg.c
===================================================================
--- /dev/null
+++ usb-2.6/arch/i386/kernel/debugreg.c
@@ -0,0 +1,182 @@
+/*
+ *  Debug register
+ *  arch/i386/kernel/debugreg.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004
+ *
+ * 2002-Oct	Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> and
+ *		Bharata Rao <bharata@in.ibm.com> to provide debug register
+ *		allocation mechanism.
+ * 2004-Oct	Updated by Prasanna S Panchamukhi <prasanna@in.ibm.com> with
+ *		idr_allocations mechanism as suggested by Andi Kleen.
+ */
+
+/*
+ * These routines provide a debug register allocation mechanism.
+ */
+
+#include <linux/kernel.h>
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <asm/system.h>
+#include <asm/debugreg.h>
+
+struct debugreg {
+	int flag;
+	int use_count;
+};
+
+struct debugreg dr_list[DR_MAX];
+static spinlock_t dr_lock = SPIN_LOCK_UNLOCKED;
+unsigned long dr7_global_mask = DR_CONTROL_RESERVED | DR_GLOBAL_SLOWDOWN |
+		DR_GLOBAL_ENABLE_MASK;
+
+static unsigned long dr7_global_reg_mask(unsigned int regnum)
+{
+	return (0xf << (16 + regnum * 4)) | (0x1 << (regnum * 2));
+}
+
+static int get_dr(int regnum, int flag)
+{
+	if (flag == DR_ALLOC_GLOBAL && !dr_list[regnum].flag) {
+		dr_list[regnum].flag = flag;
+		dr7_global_mask |= dr7_global_reg_mask(regnum);
+		return regnum;
+	}
+	if (flag == DR_ALLOC_LOCAL &&
+			dr_list[regnum].flag != DR_ALLOC_GLOBAL) {
+		dr_list[regnum].flag = flag;
+		dr_list[regnum].use_count++;
+		return regnum;
+	}
+	return -1;
+}
+
+static void free_dr(int regnum)
+{
+	if (dr_list[regnum].flag == DR_ALLOC_LOCAL) {
+		if (!--dr_list[regnum].use_count)
+			dr_list[regnum].flag = 0;
+	} else {
+		dr_list[regnum].flag = 0;
+		dr_list[regnum].use_count = 0;
+		dr7_global_mask &= ~(dr7_global_reg_mask(regnum));
+	}
+}
+
+int dr_alloc(int regnum, int flag)
+{
+	int ret = -1;
+
+	spin_lock(&dr_lock);
+	if (regnum >= 0 && regnum < DR_MAX)
+		ret = get_dr(regnum, flag);
+	else if (regnum == DR_ANY) {
+
+		/* gdb allocates local debug registers starting from 0.
+		 * To help avoid conflicts, we'll start from the other end.
+		 */
+		for (regnum = DR_MAX - 1; regnum >= 0; --regnum) {
+			ret = get_dr(regnum, flag);
+			if (ret >= 0)
+				break;
+		}
+	} else
+		printk(KERN_ERR "dr_alloc: "
+				"Cannot allocate debug register %d\n", regnum);
+	spin_unlock(&dr_lock);
+	return ret;
+}
+
+void dr_free(int regnum)
+{
+	spin_lock(&dr_lock);
+	if (regnum < 0 || regnum >= DR_MAX || !dr_list[regnum].flag)
+		printk(KERN_ERR "dr_free: "
+				"Cannot free debug register %d\n", regnum);
+	else
+		free_dr(regnum);
+	spin_unlock(&dr_lock);
+}
+
+void dr_inc_use_count(unsigned long mask)
+{
+	int i;
+	int dr_local_enable = 1 << DR_LOCAL_ENABLE_SHIFT;
+
+	spin_lock(&dr_lock);
+	for (i = 0; i < DR_MAX; (++i, dr_local_enable <<= DR_ENABLE_SIZE)) {
+		if (mask & dr_local_enable)
+			dr_list[i].use_count++;
+	}
+	spin_unlock(&dr_lock);
+}
+
+void dr_dec_use_count(unsigned long mask)
+{
+	int i;
+	int dr_local_enable = 1 << DR_LOCAL_ENABLE_SHIFT;
+
+	spin_lock(&dr_lock);
+	for (i = 0; i < DR_MAX; (++i, dr_local_enable <<= DR_ENABLE_SIZE)) {
+		if (mask & dr_local_enable)
+			free_dr(i);
+	}
+	spin_unlock(&dr_lock);
+}
+
+int dr_is_global(int regnum)
+{
+	return (dr_list[regnum].flag == DR_ALLOC_GLOBAL);
+}
+
+/*
+ * This routine decides if a ptrace request is for enabling or disabling
+ * a debug reg, and accordingly calls dr_alloc() or dr_free().
+ *
+ * gdb uses ptrace to write to debug registers.  It assumes that writing to
+ * a debug register always succeds and it doesn't check the return value of
+ * ptrace.  Now with this new global debug register allocation/freeing,
+ * ptrace request for a local debug register will fail if the required debug
+ * register is already globally allocated.  Since gdb doesn't notice this
+ * failure, it sometimes tries to free a debug register which it does not
+ * own.
+ *
+ * Returns -1 if the ptrace request tries to locally allocate a debug register
+ * that is already globally allocated.  Otherwise returns >0 or 0 according
+ * as any debug registers are or are not locally allocated in the new setting.
+ */
+int enable_debugreg(unsigned long old_dr7, unsigned long new_dr7)
+{
+	int i;
+	int dr_local_enable = 1 << DR_LOCAL_ENABLE_SHIFT;
+
+	if (new_dr7 & DR_LOCAL_ENABLE_MASK & dr7_global_mask)
+		return -1;
+	for (i = 0; i < DR_MAX; (++i, dr_local_enable <<= DR_ENABLE_SIZE)) {
+		if ((old_dr7 ^ new_dr7) & dr_local_enable) {
+			if (new_dr7 & dr_local_enable)
+				dr_alloc(i, DR_ALLOC_LOCAL);
+			else
+				dr_free(i);
+		}
+	}
+	return new_dr7 & DR_LOCAL_ENABLE_MASK;
+}
+
+EXPORT_SYMBOL(dr_alloc);
+EXPORT_SYMBOL(dr_free);
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -51,6 +51,7 @@
 #ifdef CONFIG_MATH_EMULATION
 #include <asm/math_emu.h>
 #endif
+#include <asm/debugreg.h>
 
 #include <linux/err.h>
 
@@ -356,9 +357,10 @@ EXPORT_SYMBOL(kernel_thread);
  */
 void exit_thread(void)
 {
+	struct task_struct *tsk = current;
+
 	/* The process may have allocated an io port bitmap... nuke it. */
 	if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
-		struct task_struct *tsk = current;
 		struct thread_struct *t = &tsk->thread;
 		int cpu = get_cpu();
 		struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -376,12 +378,16 @@ void exit_thread(void)
 		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
 		put_cpu();
 	}
+ 	if (unlikely(tsk->thread.debugreg[7]))
+ 		dr_dec_use_count(tsk->thread.debugreg[7]);
 }
 
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
 
+	if (unlikely(tsk->thread.debugreg[7]))
+		dr_dec_use_count(tsk->thread.debugreg[7]);
 	memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));	
 	clear_tsk_thread_flag(tsk, TIF_DEBUG);
@@ -462,6 +468,9 @@ int copy_thread(int nr, unsigned long cl
 		desc->b = LDT_entry_b(&info);
 	}
 
+	if (unlikely(tsk->thread.debugreg[7]))
+		dr_inc_use_count(tsk->thread.debugreg[7]);
+
 	err = 0;
  out:
 	if (err && p->thread.io_bitmap_ptr) {
@@ -537,14 +546,22 @@ static noinline void __switch_to_xtra(st
 
 	next = &next_p->thread;
 
+	/*
+	 * Don't reload global debug registers. Don't touch the global debug
+	 * register settings in dr7.
+	 */
 	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		set_debugreg(next->debugreg[0], 0);
-		set_debugreg(next->debugreg[1], 1);
-		set_debugreg(next->debugreg[2], 2);
-		set_debugreg(next->debugreg[3], 3);
+		if (!dr_is_global(0))
+			set_debugreg(next->debugreg[0], 0);
+		if (!dr_is_global(1))
+			set_debugreg(next->debugreg[1], 1);
+		if (!dr_is_global(2))
+			set_debugreg(next->debugreg[2], 2);
+		if (!dr_is_global(3))
+			set_debugreg(next->debugreg[3], 3);
 		/* no 4 and 5 */
 		set_debugreg(next->debugreg[6], 6);
-		set_debugreg(next->debugreg[7], 7);
+		set_process_dr7(next->debugreg[7]);
 	}
 
 	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
Index: usb-2.6/arch/i386/kernel/ptrace.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/ptrace.c
+++ usb-2.6/arch/i386/kernel/ptrace.c
@@ -412,6 +412,7 @@ long arch_ptrace(struct task_struct *chi
 			ret = putreg(child, addr, data);
 			break;
 		}
+
 		/* We need to be very careful here.  We implicitly
 		   want to modify a portion of the task_struct, and we
 		   have to be selective about what portions we allow someone
@@ -421,10 +422,18 @@ long arch_ptrace(struct task_struct *chi
 		if (addr >= (long) &dummy->u_debugreg[0] &&
 		    addr <= (long) &dummy->u_debugreg[7]) {
 
-			if (addr == (long) &dummy->u_debugreg[4]) break;
-			if (addr == (long) &dummy->u_debugreg[5]) break;
-			if (addr < (long) &dummy->u_debugreg[4] &&
-			    ((unsigned long) data) >= TASK_SIZE-3) break;
+			addr -= (long) &dummy->u_debugreg;
+			addr = addr >> 2;
+			if (addr < 4) {
+				if ((unsigned long) data >= TASK_SIZE-3)
+					break;
+				if (dr_is_global(addr)) {
+					ret = -EBUSY;
+					break;
+				}
+			}
+			else if (addr == 4 || addr == 5)
+				break;
 
 			/* Sanity-check data. Take one half-byte at once with
 			 * check = (val >> (16 + 4*i)) & 0xf. It contains the
@@ -456,18 +465,21 @@ long arch_ptrace(struct task_struct *chi
 			 * See the AMD manual no. 24593 (AMD64 System
 			 * Programming) */
 
-			if (addr == (long) &dummy->u_debugreg[7]) {
+			else if (addr == 7) {
 				data &= ~DR_CONTROL_RESERVED;
 				for (i = 0; i < 4; i++)
 					if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
 						goto out_tsk;
-				if (data)
+				i = enable_debugreg(child->thread.debugreg[7], data);
+				if (i < 0) {
+					ret = -EBUSY;
+					break;
+				}
+				if (i)
 					set_tsk_thread_flag(child, TIF_DEBUG);
 				else
 					clear_tsk_thread_flag(child, TIF_DEBUG);
 			}
-			addr -= (long) &dummy->u_debugreg;
-			addr = addr >> 2;
 			child->thread.debugreg[addr] = data;
 			ret = 0;
 		}
Index: usb-2.6/arch/i386/kernel/signal.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/signal.c
+++ usb-2.6/arch/i386/kernel/signal.c
@@ -25,6 +25,7 @@
 #include <asm/ucontext.h>
 #include <asm/uaccess.h>
 #include <asm/i387.h>
+#include <asm/debugreg.h>
 #include "sigframe.h"
 
 #define DEBUG_SIG 0
@@ -594,7 +595,7 @@ static void fastcall do_signal(struct pt
 		 * inside the kernel.
 		 */
 		if (unlikely(current->thread.debugreg[7]))
-			set_debugreg(current->thread.debugreg[7], 7);
+			set_process_dr7(current->thread.debugreg[7]);
 
 		/* Whee!  Actually deliver the signal.  */
 		if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -808,6 +808,7 @@ fastcall void __kprobes do_debug(struct 
 	struct task_struct *tsk = current;
 
 	get_debugreg(condition, 6);
+	set_debugreg(0, 6);	/* DR6 is never cleared by the CPU */
 
 	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
 					SIGTRAP) == NOTIFY_STOP)
@@ -849,7 +850,7 @@ fastcall void __kprobes do_debug(struct 
 	 * the signal is delivered.
 	 */
 clear_dr7:
-	set_debugreg(0, 7);
+	set_process_dr7(0);
 	return;
 
 debug_vm86:
Index: usb-2.6/include/asm-i386/debugreg.h
===================================================================
--- usb-2.6.orig/include/asm-i386/debugreg.h
+++ usb-2.6/include/asm-i386/debugreg.h
@@ -33,6 +33,7 @@
 
 #define DR_RW_EXECUTE (0x0)   /* Settings for the access types to trap on */
 #define DR_RW_WRITE (0x1)
+#define DR_RW_IO (0x2)
 #define DR_RW_READ (0x3)
 
 #define DR_LEN_1 (0x0) /* Settings for data length to trap on */
@@ -61,4 +62,63 @@
 #define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
 #define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
 
+#define DR_MAX	4
+#define DR_ANY	(DR_MAX + 1)
+
+/* global or local allocation requests */
+#define DR_ALLOC_GLOBAL		1
+#define DR_ALLOC_LOCAL		2
+
+#ifdef CONFIG_KWATCH
+
+/* Set the type, len and global flag in dr7 for a debug register */
+#define SET_DR7(dr, regnum, type, len)	do {		\
+		dr &= ~(0xf << (16 + (regnum) * 4));	\
+		dr |= (((((len) - 1) << 2) | (type)) <<	\
+				(16 + (regnum) * 4)) |	\
+			(0x2 << ((regnum) * 2));	\
+	} while (0)
+
+/* Disable a debug register by clearing the global/local flag in dr7 */
+#define RESET_DR7(dr, regnum)	dr &= ~(0x3 << ((regnum) * 2))
+
+extern int dr_alloc(int regnum, int flag);
+extern void dr_free(int regnum);
+extern void dr_inc_use_count(unsigned long mask);
+extern void dr_dec_use_count(unsigned long mask);
+extern int dr_is_global(int regnum);
+extern unsigned long dr7_global_mask;
+extern int enable_debugreg(unsigned long old_dr7, unsigned long new_dr7);
+
+static inline void set_process_dr7(unsigned long new_dr7)
+{
+	unsigned long dr7;
+
+	get_debugreg(dr7, 7);
+	dr7 = (dr7 & dr7_global_mask) | (new_dr7 & ~dr7_global_mask);
+	set_debugreg(dr7, 7);
+}
+
+#else
+
+static inline void dr_inc_use_count(unsigned long mask)
+{
+}
+static inline void dr_dec_use_count(unsigned long mask)
+{
+}
+static inline int dr_is_global(int regnum)
+{
+	return 0;
+}
+static inline int enable_debugreg(unsigned long old_dr7, unsigned long new_dr7)
+{
+	return (new_dr7 != 0);
+}
+static inline void set_process_dr7(unsigned long new_dr7)
+{
+	set_debugreg(new_dr7, 7);
+}
+
+#endif				/* CONFIG_KWATCH */
 #endif
Index: usb-2.6/arch/i386/kernel/kwatch.c
===================================================================
--- /dev/null
+++ usb-2.6/arch/i386/kernel/kwatch.c
@@ -0,0 +1,281 @@
+/*
+ *  Kernel Watchpoint interface.
+ *  arch/i386/kernel/kwatch.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004
+ *
+ * 2002-Oct	Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> for
+ *		Kernel Watchpoint implementation.
+ * 2004-Oct	Updated by Prasanna S Panchamukhi <prasanna@in.ibm.com> to
+ *		make use of notifiers.
+ */
+#include <linux/kprobes.h>
+#include <linux/ptrace.h>
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <asm/kwatch.h>
+#include <asm/kdebug.h>
+#include <asm/debugreg.h>
+#include <asm/bitops.h>
+
+#define RF_MASK	0x00010000
+
+static struct kwatch kwatch_list[DR_MAX];
+static DEFINE_SPINLOCK(kwatch_lock);
+static unsigned long kwatch_in_progress;	/* currently being handled */
+
+struct dr_info {
+	int debugreg;
+	unsigned long addr;
+	int type;
+};
+
+static void write_dr(int debugreg, unsigned long addr)
+{
+	switch (debugreg) {
+		case 0:	set_debugreg(addr, 0);	break;
+		case 1:	set_debugreg(addr, 1);	break;
+		case 2:	set_debugreg(addr, 2);	break;
+		case 3:	set_debugreg(addr, 3);	break;
+		case 6:	set_debugreg(addr, 6);	break;
+		case 7:	set_debugreg(addr, 7);	break;
+	}
+}
+
+#define write_dr7(val)	set_debugreg((val), 7)
+#define read_dr7(val)	get_debugreg((val), 7)
+
+static void write_smp_dr(void *info)
+{
+	struct dr_info *dr = (struct dr_info *)info;
+
+	if (cpu_has_de && dr->type == DR_TYPE_IO)
+		set_in_cr4(X86_CR4_DE);
+	write_dr(dr->debugreg, dr->addr);
+}
+
+/* Update the debug register on all CPUs */
+static void sync_dr(int debugreg, unsigned long addr, int type)
+{
+	struct dr_info dr;
+	dr.debugreg = debugreg;
+	dr.addr = addr;
+	dr.type = type;
+	smp_call_function(write_smp_dr, &dr, 0, 0);
+}
+
+/*
+ * Interrupts are disabled on entry as trap1 is an interrupt gate and they
+ * remain disabled throughout this function.
+ */
+int kwatch_handler(unsigned long condition, struct pt_regs *regs)
+{
+	unsigned int debugreg;
+	unsigned long addr;
+
+	/* Using the debug status register value, find the debug register
+	 * number and the address for which the trap occurred. */
+	if (condition & DR_TRAP0) {
+		debugreg = 0;
+		get_debugreg(addr, 0);
+	} else if (condition & DR_TRAP1) {
+		debugreg = 1;
+		get_debugreg(addr, 1);
+	} else if (condition & DR_TRAP2) {
+		debugreg = 2;
+		get_debugreg(addr, 2);
+	} else if (condition & DR_TRAP3) {
+		debugreg = 3;
+		get_debugreg(addr, 3);
+	} else
+		return 0;
+
+	/* We're in an interrupt, but this is clear and BUG()-safe. */
+	preempt_disable();
+
+	/* If we are recursing, we already hold the lock. */
+	if (kwatch_in_progress)
+		goto recursed;
+
+	set_bit(debugreg, &kwatch_in_progress);
+
+	spin_lock(&kwatch_lock);
+	if ((unsigned long) kwatch_list[debugreg].addr != addr)
+		goto out;
+
+	if (kwatch_list[debugreg].handler)
+		kwatch_list[debugreg].handler(&kwatch_list[debugreg], regs);
+
+	if (kwatch_list[debugreg].type == DR_TYPE_EXECUTE)
+		regs->eflags |= RF_MASK;
+      out:
+	clear_bit(debugreg, &kwatch_in_progress);
+	spin_unlock(&kwatch_lock);
+	preempt_enable_no_resched();
+	return 0;
+
+      recursed:
+	if (kwatch_list[debugreg].type == DR_TYPE_EXECUTE)
+		regs->eflags |= RF_MASK;
+	preempt_enable_no_resched();
+	return 1;
+}
+
+/**
+ * register_kwatch - register a hardware watchpoint
+ * @addr: address of the watchpoint
+ * @length: extent of the watchpoint (1, 2, or 4 bytes)
+ * @type: type of access to trap (read, write, I/O, or execute)
+ * @handler: callback routine to invoke when a trap occurs
+ *
+ * Allocates and returns a debug register and installs the requested
+ * watchpoint.
+ *
+ * @length must be 1, 2, or 4, and @type must be one of %DR_TYPE_RW
+ * (read or write), %DR_TYPE_WRITE (write only), %DR_TYPE_IO (I/O space
+ * access), or %DR_TYPE_EXECUTE.  Note that %DR_TYPE_IO is available only
+ * on processors with Debugging Extensions, and @length must be 1 for
+ * %DR_TYPE_EXECUTE.
+ *
+ * When a trap occurs, @handler is invoked in interrupt context with a pointer
+ * to a struct kwatch containing the watchpoint information and a pointer
+ * to the CPU register values at the time of the trap.  %DR_TYPE_EXECUTE
+ * traps occur before the watch-pointed instruction executes; all other
+ * types occur after the memory or I/O access has taken place.
+ *
+ * Returns a debug register number or a negative error code.
+ */
+int register_kwatch(void *addr, u8 length, u8 type, kwatch_handler_t handler)
+{
+	int debugreg;
+	unsigned long dr7, flags;
+
+	switch (length) {
+	case 1:
+	case 2:
+	case 4:
+		break;
+	default:
+		return -EINVAL;
+	}
+	switch (type) {
+	case DR_TYPE_WRITE:
+	case DR_TYPE_RW:
+		break;
+	case DR_TYPE_IO:
+		if (cpu_has_de)
+			break;
+		return -EINVAL;
+	case DR_TYPE_EXECUTE:
+		if (length == 1)
+			break;
+		/* FALL THROUGH */
+	default:
+		return -EINVAL;
+	}
+	if (!handler)
+		return -EINVAL;
+
+	debugreg = dr_alloc(DR_ANY, DR_ALLOC_GLOBAL);
+	if (debugreg < 0)
+		return -EBUSY;
+
+	spin_lock_irqsave(&kwatch_lock, flags);
+	kwatch_list[debugreg].addr = addr;
+	kwatch_list[debugreg].length = length;
+	kwatch_list[debugreg].type = type;
+	kwatch_list[debugreg].handler = handler;
+	spin_unlock_irqrestore(&kwatch_lock, flags);
+
+	if (type == DR_TYPE_IO)
+		set_in_cr4(X86_CR4_DE);
+	write_dr(debugreg, (unsigned long) addr);
+	sync_dr(debugreg, (unsigned long) addr, type);
+
+	read_dr7(dr7);
+	SET_DR7(dr7, debugreg, type, length);
+	write_dr7(dr7);
+	sync_dr(7, dr7, 0);
+	return debugreg;
+}
+
+/**
+ * unregister_kwatch - free a previously-allocated debugging watchpoint
+ * @debugreg: the debugging register to deallocate
+ *
+ * Removes a hardware watchpoint and deallocates the corresponding
+ * debugging register.  @debugreg must previously have been allocated
+ * by register_kwatch().
+ */
+void unregister_kwatch(int debugreg)
+{
+	unsigned long flags;
+	unsigned long dr7;
+
+	if (debugreg < 0 || debugreg >= DR_MAX ||
+			!kwatch_list[debugreg].handler)
+		return;
+
+	read_dr7(dr7);
+	RESET_DR7(dr7, debugreg);
+	write_dr7(dr7);
+	sync_dr(7, dr7, 0);
+
+	spin_lock_irqsave(&kwatch_lock, flags);
+	kwatch_list[debugreg].addr = 0;
+	kwatch_list[debugreg].handler = NULL;
+	spin_unlock_irqrestore(&kwatch_lock, flags);
+
+	dr_free(debugreg);
+}
+
+/*
+ * Wrapper routine for handling debug exceptions.
+ */
+int kwatch_exceptions_notify(struct notifier_block *self, unsigned long val,
+			     void *data)
+{
+	struct die_args *args = (struct die_args *)data;
+	switch (val) {
+	case DIE_DEBUG:
+		if (kwatch_handler(args->err, args->regs))
+			return NOTIFY_STOP;
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block kwatch_exceptions_nb = {
+	.notifier_call = kwatch_exceptions_notify,
+	.priority = 0x7ffffffe	/* we need to be notified second */
+};
+
+static int __init init_kwatch(void)
+{
+	int err = 0;
+
+	err = register_die_notifier(&kwatch_exceptions_nb);
+	return err;
+}
+
+__initcall(init_kwatch);
+
+EXPORT_SYMBOL_GPL(register_kwatch);
+EXPORT_SYMBOL_GPL(unregister_kwatch);
Index: usb-2.6/arch/i386/kernel/Makefile
===================================================================
--- usb-2.6.orig/arch/i386/kernel/Makefile
+++ usb-2.6/arch/i386/kernel/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_VM86)		+= vm86.o
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
 obj-$(CONFIG_HPET_TIMER) 	+= hpet.o
 obj-$(CONFIG_K8_NB)		+= k8.o
+obj-$(CONFIG_KWATCH)		+= debugreg.o kwatch.o
 
 # Make sure this is linked after any other paravirt_ops structs: see head.S
 obj-$(CONFIG_PARAVIRT)		+= paravirt.o
Index: usb-2.6/include/asm-i386/kwatch.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/kwatch.h
@@ -0,0 +1,60 @@
+#ifndef _ASM_KWATCH_H
+#define _ASM_KWATCH_H
+/*
+ *  Kernel Watchpoint interface.
+ *  include/asm-i386/kwatch.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004
+ *
+ * 2002-Oct	Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> for
+ *		Kernel Watchpoint implementation.
+ */
+#include <linux/types.h>
+#include <linux/ptrace.h>
+
+struct kwatch;
+typedef void (*kwatch_handler_t) (struct kwatch *, struct pt_regs *);
+
+struct kwatch {
+	void *addr;		/* location of watchpoint */
+	u8 length;		/* range of address */
+	u8 type;		/* type of watchpoint */
+	kwatch_handler_t handler;
+};
+
+#define DR_TYPE_EXECUTE 	0x0	/* Watchpoint types */
+#define DR_TYPE_WRITE		0x1
+#define DR_TYPE_IO		0x2
+#define DR_TYPE_RW		0x3
+
+#ifdef CONFIG_KWATCH
+extern int register_kwatch(void *addr, u8 length, u8 type,
+		kwatch_handler_t handler);
+extern void unregister_kwatch(int debugreg);
+
+#else
+
+static inline int register_kwatch(void *addr, u8 length, u8 type,
+		kwatch_handler_t handler)
+{
+	return -ENOSYS;
+}
+static inline void unregister_kwatch(int debugreg)
+{
+}
+#endif
+#endif				/* _ASM_KWATCH_H */
Index: usb-2.6/arch/i386/Kconfig
===================================================================
--- usb-2.6.orig/arch/i386/Kconfig
+++ usb-2.6/arch/i386/Kconfig
@@ -1210,6 +1210,14 @@ config KPROBES
 	  a probepoint and specifies the callback.  Kprobes is useful
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
+
+config KWATCH
+	bool "Kwatch points (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	help
+	  Kwatch enables kernel-space data watchpoints using the processor's
+	  debug registers.  It can be very useful for kernel debugging.
+	  If in doubt, say "N".
 endmenu
 
 source "arch/i386/Kconfig.debug"


^ permalink raw reply	[flat|nested] 70+ messages in thread
