linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch 00/11] Hardware Breakpoint interfaces
@ 2009-03-24 15:24 K.Prasad
  2009-03-25 19:48 ` Alan Stern
  0 siblings, 1 reply; 27+ messages in thread
From: K.Prasad @ 2009-03-24 15:24 UTC (permalink / raw)
  To: Ingo Molnar, Linux Kernel Mailing List
  Cc: Alan Stern, Andrew Morton, Benjamin Herrenschmidt,
	Frederic Weisbecker, maneesh, Roland McGrath, Steven Rostedt

Hi Ingo and All,
		Please find the new patchset that addresses the comments to the
patch sent here: http://lkml.org/lkml/2009/3/19/430.

The patches are based on -tip tree commit
7ddd16604eb5fc54f8ac61d13be31e0f2c0a791e.

Changelog
---------
- Changes to load_debug_register(), unregister_kernel_hw_breakpoint() to address
  comments from the community.
- Re-arrangement of code in arch_register_kernel_hw_breakpoint(),
  arch_unregister_kernel_hw_breakpoint() and a new arch_load_debug_registers().
- Additional sanity checks in arch_check_va_in_<kernel/user>space().
- Removal of hbkpt_user_max variable and changes to variable names
- Address coding style issues and other minor fixes.

Kindly accept the patches to be a part of -tip tree.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-03-24 15:24 [Patch 00/11] Hardware Breakpoint interfaces K.Prasad
@ 2009-03-25 19:48 ` Alan Stern
  2009-03-27 22:06   ` K.Prasad
  2009-03-28  8:46   ` K.Prasad
  0 siblings, 2 replies; 27+ messages in thread
From: Alan Stern @ 2009-03-25 19:48 UTC (permalink / raw)
  To: K.Prasad
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

There are some serious issues involving userspace breakpoints and the
legacy ptrace interface.  It all comes down to this: At what point
is a breakpoint registered for a ptrace caller?

Remember, to set up a breakpoint a debugger needs to call ptrace
twice: once to put the address in one of the DR0-DR3 registers and
once to set up DR7.  So when does the task own the breakpoint?

Logically, we should wait until DR7 gets set, because until then the
breakpoint is not active.  But then how do we let the caller know that
one of his breakpoints conflicts with a kernel breakpoint?

If we report an error during an attempt to set DR0-DR3 then at least
it's unambiguous.  But then how do we know when the task is _finished_
using the breakpoint?  It's under no obligation to set the register
back to 0.

Related to this is the question of how to store the task's versions of
DR0-DR3 when there is no associated active breakpoint.  Maybe it would
be best to keep the existing registers in the thread structure.


> +++ linux-2.6-tip/kernel/hw_breakpoint.c
> @@ -0,0 +1,367 @@
...
> +struct task_struct *last_debugged_task;

Is this variable provided only for use by the hw_breakpoint_handler()
routine, for detecting lazy debug-register switching?  It won't work
right on SMP systems.  You need to use a per-CPU variable instead.

> +/*
> + * Install the debug register values for just the kernel, no thread.
> + */
> +void switch_to_none_hw_breakpoint(void)
> +{
> +	arch_install_none();
> +}

Even though "arch_install_none" was my own name, I don't like it very
much.  "arch_remove_user_hw_breakpoints" would be better.

> +/*
> + * Erase all the hardware breakpoint info associated with a thread.
> + *
> + * If tsk != current then tsk must not be usable (for example, a
> + * child being cleaned up from a failed fork).
> + */
> +void flush_thread_hw_breakpoint(struct task_struct *tsk)
> +{
> +	int i;
> +	struct thread_struct *thread = &(tsk->thread);
> +
> +	mutex_lock(&hw_breakpoint_mutex);
> +
> +	/* The thread no longer has any breakpoints associated with it */
> +	clear_tsk_thread_flag(tsk, TIF_DEBUG);
> +	for (i = 0; i < HB_NUM; i++) {
> +		if (thread->hbp[i]) {
> +			hbp_user_refcount[i]--;
> +			kfree(thread->hbp[i]);

Ugh!  In general you shouldn't deallocate memory you didn't allocate
originally.  What will happen when there is a utrace interface in
addition to the ptrace interface?

> +			thread->hbp[i] = NULL;
> +		}
> +	}
> +	thread->hbp_num_installed = 0;

This variable doesn't seem to serve any particularly useful purpose.
Eliminate it.

> +/*
> + * Validate the settings in a hw_breakpoint structure.
> + */
> +static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
> +{
> +	int ret;
> +	unsigned int align;
> +
> +	if (!bp)
> +		return -EINVAL;
> +
> +	ret = arch_validate_hwbkpt_settings(bp, &align, tsk);
> +	if (ret < 0)
> +		goto err;
> +
> +	/*
> +	 * Check that the low-order bits of the address are appropriate
> +	 * for the alignment implied by len.
> +	 */
> +	if (bp->info.address & align)
> +		return -EINVAL;

I sort of think this test belongs in the arch-specific code also.
After all, some types of CPU might not have alignment constraints.

> +/*
> + * Actual implementation of unregister_user_hw_breakpoint.
> + */
> +void __unregister_user_hw_breakpoint(int pos, struct task_struct *tsk,
> +						struct hw_breakpoint *bp)

What happened to unregister_user_hw_breakpoint?  It doesn't seem to
exist any more.

In general, will the caller know the value of pos?  Probably not,
unless the caller is ptrace.  It shouldn't be one of the parameters.

> +{
> +	struct thread_struct *thread = &(tsk->thread);
> +
> +	if (!bp)
> +		return;
> +
> +	hbp_user_refcount[pos]--;
> +	thread->hbp_num_installed--;
> +
> +	arch_unregister_user_hw_breakpoint(pos, bp, tsk);
> +
> +	if (tsk == current)
> +		switch_to_thread_hw_breakpoint(tsk);
> +	kfree(tsk->thread.hbp[pos]);

Once again, memory should be deallocated by the same module that
allocated it.

> +/**
> + * unregister_kernel_hw_breakpoint - unregister a HW breakpoint for kernel space
> + * @bp: the breakpoint structure to unregister
> + *
> + * Uninstalls and unregisters @bp.
> + */
> +void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> +{
> +	int i, j;
> +
> +	mutex_lock(&hw_breakpoint_mutex);
> +
> +	/* Find the 'bp' in our list of breakpoints for kernel */
> +	for (i = hbp_kernel_pos; i < HB_NUM; i++)
> +		if (bp == hbp_kernel[i])
> +			break;

If you would store the register number in the arch-specific part of
struct hw_breakpoint then this loop wouldn't be needed.

> +	/*
> +	 * We'll shift the breakpoints one-level above to compact if
> +	 * unregistration creates a hole
> +	 */
> +	if (i > hbp_kernel_pos)
> +		for (j = i; j == hbp_kernel_pos; j--)
> +			hbp_kernel[j] = hbp_kernel[j-1];

The loop condition "j == hbp_kernel_pos" is wrong.  It should be
"j > hbp_kernel_pos".  And making this change shows that the preceding
"if" statement is redundant.

> +
> +	/*
> +	 * Delete the last kernel breakpoint entry after compaction and update
> +	 * the pointer to the lowest numbered kernel entry after updating its
> +	 * corresponding value in kdr7
> +	 */
> +	hbp_kernel[hbp_kernel_pos] = 0;
> +	arch_unregister_kernel_hw_breakpoint();

Even though it was part of my original design, there's no good reason
for making arch_register_kernel_hw_breakpoint and
arch_unregister_kernel_hw_breakpoint be separate functions.  There
should just be a single function: arch_update_kernel_hw_breakpoints.
The same is true for arch_update_user_hw_breakpoints.  In each case,
all that is needed is to recalculate the DR7 mask and value.

> +	hbp_kernel_pos++;

And this should be moved up one line, so that the arch-specific code
knows how many kernel breakpoints to register.

> +static int __init init_hw_breakpoint(void)
> +{
> +	int i;
> +
> +	hbp_kernel_pos = HB_NUM;
> +	for (i = 0; i < HB_NUM; i++)
> +		hbp_user_refcount[i] = 0;

This loop is unnecessary, since uninitialized static values are set to
0 in any case.

> +	load_debug_registers();

Hmm, I suspect this line can safely be omitted.

> +
> +	return register_die_notifier(&hw_breakpoint_exceptions_nb);
> +}


> +/*
> + * Install the kernel breakpoints in their debug registers.
> + * If 0 <= pos < HB_NUM, then set the debug register corresponding to that number
> + * If 'pos' is negative, then all debug registers are updated
> + */
> +void arch_install_kernel_hw_breakpoint(void *idx)

I don't like this design decision.  Why not simply install all the
kernel breakpoints every time?  The extra effort would be invisible
compared to the overhead of an IPI.

> +{
> +	int pos = *(int *)idx;
> +	unsigned long dr7;
> +	int i;
> +
> +	get_debugreg(dr7, 7);
> +
> +	/* Don't allow debug exceptions while we update the registers */
> +	set_debugreg(0UL, 7);
> +
> +	for (i = hbp_kernel_pos; i < HB_NUM; i++) {
> +		if ((pos >= 0) && (i != pos))
> +			continue;
> +		dr7 &= ~(dr7_masks[i]);
> +		if (hbp_kernel[i])
> +			set_debugreg(hbp_kernel[i]->info.address, i);
> +	}

For example, this loop could be written more simply as follows:

	switch (hbp_kernel_pos) {
	case 0:
		set_debugreg(hbp_kernel[0]->info.address, 0);
	case 1:
		set_debugreg(hbp_kernel[1]->info.address, 1);
		...
	}

> +
> +	dr7 |= kdr7;

Of course, you would also have to mask out the user bits from DR7.
You could do something like this:

	dr7 = (dr7 & dr7_user_mask[hbp_kernel_pos]) | kdr7;

where dr7_user_mask is a static array containing the five appropriate
mask values.

> +	/* No need to set DR6 */
> +	set_debugreg(dr7, 7);
> +}
> +
> +void arch_load_debug_registers()
> +{
> +	int pos = -1;
> +	/*
> +	 * We want all debug registers to be initialised for this
> +	 * CPU so pos = -1
> +	 */
> +	arch_install_kernel_hw_breakpoint((void *)&pos);
> +}

If you follow my suggestion above then this routine isn't needed at
all.  Callers can invoke arch_install_kernel_hw_breakpoints instead.

> +/*
> + * Install the thread breakpoints in their debug registers.
> + */
> +void arch_install_thread_hw_breakpoint(struct task_struct *tsk)
> +{
> +	int i;
> +	struct thread_struct *thread = &(tsk->thread);
> +
> +	for (i = 0;  (i < hbp_kernel_pos) && hbp_user_refcount[i]; i++)
> +		if (thread->hbp[i])
> +			set_debugreg(thread->hbp[i]->info.address, i);

The loop condition is wrong.  But since this routine is on the hot
path we should avoid using a loop at all.  In fact, if the DR0-DR3
register values are added back into the thread structure, we could
simply do this:

	switch (hbp_kernel_pos) {
	case 4:
		set_debugreg(thread->dr3, 3);
	case 3:
		set_debugreg(thread->dr2, 2);
		...
	}

> +
> +	/* No need to set DR6 */
> +
> +	set_debugreg((kdr7 | thread->dr7), 7);
> +}

> +/*
> + * Check for virtual address in kernel space.
> + */
> +int arch_check_va_in_kernelspace(unsigned long va, u8 hbp_len)
> +{
> +	unsigned int len;
> +
> +	len = get_hbp_len(hbp_len);
> +
> +	return ((va >= TASK_SIZE) && ((va + len) >= TASK_SIZE));

In theory this should be (va + len - 1).

> +}
> +
> +/*
> + * Store a breakpoint's encoded address, length, and type.
> + */
> +void arch_store_info(struct hw_breakpoint *bp)
> +{
> +	/*
> +	 * User-space requests will always have the address field populated
> +	 * For kernel-addresses, either the address or symbol name can be
> +	 * specified.
> +	 */
> +	if (bp->info.address)
> +		return;
> +	if (bp->info.name)
> +		bp->info.address = (unsigned long)
> +					kallsyms_lookup_name(bp->info.name);
> +}

I still think the address and name fields shouldn't be arch-specific.
After all, won't _every_ arch need to have a copy of exactly this same
function?

> +/*
> + * Modify an existing user breakpoint structure.
> + */
> +int arch_modify_user_hw_breakpoint(int pos, struct hw_breakpoint *bp,
> +		struct task_struct *tsk)
> +{
> +	struct thread_struct *thread = &(tsk->thread);
> +
> +	/* Check if the register to be modified was enabled by the thread */
> +	if (!(thread->dr7 & (1 << (pos * DR_ENABLE_SIZE))))
> +		return -EINVAL;
> +
> +	thread->dr7 &= ~dr7_masks[pos];
> +	thread->dr7 |= encode_dr7(pos, bp->info.len, bp->info.type);
> +
> +	return 0;
> +}

It might be possible to eliminate this rather awkward code, once the
DR0-DR3 values are added back into the thread structure.

> +/*
> + * Copy out the debug register information for a core dump.
> + *
> + * tsk must be equal to current.
> + */
> +void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
> +{
> +	struct thread_struct *thread = &(tsk->thread);
> +	int i;
> +
> +	memset(u_debugreg, 0, sizeof u_debugreg);
> +	for (i = 0; i < thread->hbp_num_installed && thread->hbp[i]; ++i)
> +		u_debugreg[i] = thread->hbp[i]->info.address;

The loop condition is wrong, since you don't compact userspace
breakpoints.  But it could be unrolled into:

	u_debugreg[0] = thread->dr0;
	...
	u_debugreg[3] = thread->dr3;

> +	u_debugreg[7] = thread->dr7;
> +	u_debugreg[6] = thread->dr6;
> +}
> +
> +/*
> + * Handle debug exception notifications.
> + */
> +int __kprobes hw_breakpoint_handler(struct die_args *args)
> +{
> +	int i, rc = NOTIFY_DONE;
> +	struct hw_breakpoint *bp;
> +	/* The DR6 value is stored in args->err */
> +	unsigned long dr7, dr6 = args->err;

Please change this.  (I should have changed it long ago, but I never
got around to it.)  Instead of passing the DR6 value in args->err,
pass a pointer to the dr6 variable in do_debug().  That way the
handler routines can turn off bits in that variable and do_debug() can
see which bits remain set at the end.

Of course, this will require a corresponding change to the
post_kprobe_handler() routine.

> +
> +	if (dr6 & DR_STEP)
> +		return NOTIFY_DONE;

This test is wrong.  Why did you change it?  It should be:

	if (!(dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))

In theory it's possible to have both the Single-Step bit and a Debug-Trap
bit set at the same time.

> +
> +	get_debugreg(dr7, 7);
> +
> +	/* Disable breakpoints during exception handling */
> +	set_debugreg(0UL, 7);
> +
> +	/*
> +	 * Assert that local interrupts are disabled
> +	 * Reset the DRn bits in the virtualized register value.
> +	 * The ptrace trigger routine will add in whatever is needed.
> +	 */
> +	current->thread.dr6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
> +
> +	/* Lazy debug register switching */
> +	if (last_debugged_task != current)
> +		switch_to_none_hw_breakpoint();
> +
> +	/* Handle all the breakpoints that were triggered */
> +	for (i = 0; i < HB_NUM; ++i) {
> +		if (likely(!(dr6 & (DR_TRAP0 << i))))
> +			continue;
> +		/*
> +		 * Find the corresponding hw_breakpoint structure and
> +		 * invoke its triggered callback.
> +		 */
> +		if (hbp_user_refcount[i])
> +			bp = current->thread.hbp[i];
> +		else if (i >= hbp_kernel_pos)
> +			bp = hbp_kernel[i];
> +		else	/* False alarm due to lazy DR switching */
> +			continue;
> +
> +		if (!bp)
> +			break;

This logic is wrong.  It should go like this:

		if (i >= hbp_kernel_pos)
			bp = hbp_kernel[i];
		else {
			bp = current->thread.hbp[i];
			if (!bp) {
				/* False alarm due to lazy DR switching */
				continue;
			}
		}

> +
> +		switch (bp->info.type) {
> +		case HW_BREAKPOINT_WRITE:
> +		case HW_BREAKPOINT_RW:
> +			if (bp->triggered)

Do you really need to test bp->triggered?

> +				(bp->triggered)(bp, args->regs);
> +
> +			if (arch_check_va_in_userspace(bp->info.address,
> +							bp->info.len))
> +				rc = NOTIFY_DONE;
> +			else
> +				rc = NOTIFY_STOP;;
> +			goto exit;

What on Earth is the reason for all this?  What happens if two
breakpoints get triggered at the same time?

> +		case HW_BREAKPOINT_EXECUTE:
> +			/*
> +			 * Presently we allow instruction breakpoints only in
> +			 * user-space when requested through ptrace.
> +			 */
> +			if (arch_check_va_in_userspace(bp->info.address,
> +							bp->info.len)) {
> +				(bp->triggered)(bp, args->regs);

Why do you need this test?

> +				/*
> +				 * do_debug will notify user through a SIGTRAP
> +				 * signal So we are not requesting a
> +				 * NOTIFY_STOP here
> +				 */
> +				rc = NOTIFY_DONE;
> +				goto exit;
> +			}
> +		}

In fact, why do you distinguish between data breakpoints and code
breakpoints in the first place?  Shouldn't they be treated the same?

> +	}
> +
> +	/* Stop processing further if the exception is a stray one */

That comment is wrong.  It should say something like this:

	/* Stop processing if there's nothing more to do */

> +	if (!(dr6 & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
> +		rc = NOTIFY_STOP;

On the other hand, I'm not sure that this NOTIFY_STOP will help much
anyway.  All it does is provide an early exit from the notifier chain
when a hardware breakpoint occurs.  But if there wasn't also a
Single-Step exception, the kprobes handler shouldn't take long to
run.  Hence an early exit doesn't provide much advantage.

> +exit:
> +	set_debugreg(dr7, 7);
> +	return rc;
> +}


> @@ -530,13 +530,14 @@ asmlinkage __kprobes struct pt_regs *syn
>  dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
>  {
>  	struct task_struct *tsk = current;
> -	unsigned long condition;
> +	unsigned long dr6;
>  	int si_code;
>  
> -	get_debugreg(condition, 6);
> +	get_debugreg(dr6, 6);
> +	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
>  
>  	/* Catch kmemcheck conditions first of all! */
> -	if (condition & DR_STEP && kmemcheck_trap(regs))
> +	if (dr6 & DR_STEP && kmemcheck_trap(regs))
>  		return;

Are you sure this is right?  Is it possible for any of the DR_TRAPn bits
to be set as well when this happens?


> @@ -83,6 +85,8 @@ void exit_thread(void)
>  		put_cpu();
>  		kfree(bp);
>  	}
> +	if (unlikely(t->dr7))
> +		flush_thread_hw_breakpoint(me);

Shouldn't you test the TIF_DEBUG flag instead?  After all, the thread
might very well have some hw_breakpoint structures allocated even though
t->dr7 is 0.

>  
>  	ds_exit_thread(current);
>  }
> @@ -103,14 +107,9 @@ void flush_thread(void)
>  	}
>  #endif
>  
> -	clear_tsk_thread_flag(tsk, TIF_DEBUG);
> +	if (unlikely(tsk->thread.dr7))
> +		flush_thread_hw_breakpoint(tsk);

Same thing here.

> @@ -265,7 +267,14 @@ int copy_thread(int nr, unsigned long cl
>  
>  	task_user_gs(p) = get_user_gs(regs);
>  
> +	p->thread.io_bitmap_ptr = NULL;
> +
>  	tsk = current;
> +	err = -ENOMEM;
> +	if (unlikely(tsk->thread.dr7)) {
> +		if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
> +			goto out;
> +	}

And here.

> @@ -426,6 +438,25 @@ __switch_to(struct task_struct *prev_p, 
>  		lazy_load_gs(next->gs);
>  
>  	percpu_write(current_task, next_p);
> +	/*
> +	 * There's a problem with moving the switch_to_thread_hw_breakpoint()
> +	 * call before current is updated.  Suppose a kernel breakpoint is
> +	 * triggered in between the two.  The hw-breakpoint handler will see
> +	 * that current is different from the task pointer stored in the chbi
> +	 * area, so it will think the task pointer is leftover from an old task
> +	 * (lazy switching) and will erase it.  Then until the next context
> +	 * switch, no user-breakpoints will be installed.
> +	 *
> +	 * The real problem is that it's impossible to update both current and
> +	 * chbi->bp_task at the same instant, so there will always be a window
> +	 * in which they disagree and a breakpoint might get triggered.  Since
> +	 * we use lazy switching, we are forced to assume that a disagreement
> +	 * means that current is correct and chbi->bp_task is old.  But if you
> +	 * move the code above then you'll create a window in which current is
> +	 * old and chbi->bp_task is correct.
> +	 */

Don't you think this comment should be updated to match the changes
you have made in the code?  There no longer is a chbi area, for example.


Alan Stern


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-03-25 19:48 ` Alan Stern
@ 2009-03-27 22:06   ` K.Prasad
  2009-04-01 16:16     ` Alan Stern
  2009-03-28  8:46   ` K.Prasad
  1 sibling, 1 reply; 27+ messages in thread
From: K.Prasad @ 2009-03-27 22:06 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Wed, Mar 25, 2009 at 03:48:31PM -0400, Alan Stern wrote:

Hi Alan,
Thanks for the thorough review. It has helped uncover bugs that might
have been discovered after much delay. Please find my responses inline.

> There are some serious issues involving userspace breakpoints and the
> legacy ptrace interface.  It all comes down to this: At what point
> is a breakpoint registered for a ptrace caller?
> 
> Remember, to set up a breakpoint a debugger needs to call ptrace
> twice: once to put the address in one of the DR0-DR3 registers and
> once to set up DR7.  So when does the task own the breakpoint?
> 
> Logically, we should wait until DR7 gets set, because until then the
> breakpoint is not active.  But then how do we let the caller know that
> one of his breakpoints conflicts with a kernel breakpoint?
> 
> If we report an error during an attempt to set DR0-DR3 then at least
> it's unambiguous.  But then how do we know when the task is _finished_
> using the breakpoint?  It's under no obligation to set the register
> back to 0.
> 
> Related to this is the question of how to store the task's versions of
> DR0-DR3 when there is no associated active breakpoint.  Maybe it would
> be best to keep the existing registers in the thread structure.
> 

These are profound questions and I believe that it is upto the process in
user-space to answer them.

What we could ensure from the kernel-space is to retain the
existing behaviour of ptrace i.e. return success when a write is done on
DR0-DR3 (almost unconditionally) and reserve error returns only when DR7
is written into.

The patch in question could possibly return an -ENOMEM (even when write
is done on DR0-DR3) but I will change the behaviour as stated above.


A DR0 - DR3 return will do a:
	thread->debugreg[n] = val;
	return 0;

while all error returns are reserved for:
	rc = ptrace_write_dr7(tsk, val);

> > +++ linux-2.6-tip/kernel/hw_breakpoint.c
> > @@ -0,0 +1,367 @@
> ...
> > +struct task_struct *last_debugged_task;
> 
> Is this variable provided only for use by the hw_breakpoint_handler()
> routine, for detecting lazy debug-register switching?  It won't work
> right on SMP systems.  You need to use a per-CPU variable instead.
> 

Thanks for pointing it out. Here's what it will be made:
DEFINE_PER_CPU(struct task_struct *, last_debugged_task);

That also re-introduces the put_cpu_no_sched() into
switch_to_thread_hw_breakpoint() function.

void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
{
	/* Set the debug register */
	arch_install_thread_hw_breakpoint(tsk);
	per_cpu(last_debugged_task, get_cpu()) = current;
	put_cpu_no_resched();
}

> > +/*
> > + * Install the debug register values for just the kernel, no thread.
> > + */
> > +void switch_to_none_hw_breakpoint(void)
> > +{
> > +	arch_install_none();
> > +}
> 
> Even though "arch_install_none" was my own name, I don't like it very
> much.  "arch_remove_user_hw_breakpoints" would be better.
> 

How about arch_uninstall_thread_hw_breakpoint()? (Being the opposite
of arch_install_thread_hw_breakpoint()).

> > +/*
> > + * Erase all the hardware breakpoint info associated with a thread.
> > + *
> > + * If tsk != current then tsk must not be usable (for example, a
> > + * child being cleaned up from a failed fork).
> > + */
> > +void flush_thread_hw_breakpoint(struct task_struct *tsk)
> > +{
> > +	int i;
> > +	struct thread_struct *thread = &(tsk->thread);
> > +
> > +	mutex_lock(&hw_breakpoint_mutex);
> > +
> > +	/* The thread no longer has any breakpoints associated with it */
> > +	clear_tsk_thread_flag(tsk, TIF_DEBUG);
> > +	for (i = 0; i < HB_NUM; i++) {
> > +		if (thread->hbp[i]) {
> > +			hbp_user_refcount[i]--;
> > +			kfree(thread->hbp[i]);
> 
> Ugh!  In general you shouldn't deallocate memory you didn't allocate
> originally.  What will happen when there is a utrace interface in
> addition to the ptrace interface?
> 

I can't see how I can invoke ptrace related code here to free memory
here, although I agree that __unregister_user_hw_breakpoint() code need not
mess with it.
I will retain the kfree() in flush_thread_hw_breakpoint(), but remove it move 
it from the latter to ptrace related code.

> > +			thread->hbp[i] = NULL;
> > +		}
> > +	}
> > +	thread->hbp_num_installed = 0;
> 
> This variable doesn't seem to serve any particularly useful purpose.
> Eliminate it.
> 

Done.

> > +/*
> > + * Validate the settings in a hw_breakpoint structure.
> > + */
> > +static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
> > +{
> > +	int ret;
> > +	unsigned int align;
> > +
> > +	if (!bp)
> > +		return -EINVAL;
> > +
> > +	ret = arch_validate_hwbkpt_settings(bp, &align, tsk);
> > +	if (ret < 0)
> > +		goto err;
> > +
> > +	/*
> > +	 * Check that the low-order bits of the address are appropriate
> > +	 * for the alignment implied by len.
> > +	 */
> > +	if (bp->info.address & align)
> > +		return -EINVAL;
> 
> I sort of think this test belongs in the arch-specific code also.
> After all, some types of CPU might not have alignment constraints.
> 

Moved. It also helps eliminate passing a parameter back and forth.

> > +/*
> > + * Actual implementation of unregister_user_hw_breakpoint.
> > + */
> > +void __unregister_user_hw_breakpoint(int pos, struct task_struct *tsk,
> > +						struct hw_breakpoint *bp)
> 
> What happened to unregister_user_hw_breakpoint?  It doesn't seem to
> exist any more.
> 

I thought it would be more convenient to introduce them after we have
virtualised debug registers. But thinking further, I think we can adopt
a simple 'first-fit' approach can be used to allocate debug registers
for user-space too. I will include a
(un)register_user_hw_breakpoint(task, hw_breakpoint)
in the next iteration and export them too.

> In general, will the caller know the value of pos?  Probably not,
> unless the caller is ptrace.  It shouldn't be one of the parameters.
> 

Given that ptrace contains the debugreg number, I chose to use it. The
proposed interfaces (as discussed above) will help users who cannot
provide the info.

> > +{
> > +	struct thread_struct *thread = &(tsk->thread);
> > +
> > +	if (!bp)
> > +		return;
> > +
> > +	hbp_user_refcount[pos]--;
> > +	thread->hbp_num_installed--;
> > +
> > +	arch_unregister_user_hw_breakpoint(pos, bp, tsk);
> > +
> > +	if (tsk == current)
> > +		switch_to_thread_hw_breakpoint(tsk);
> > +	kfree(tsk->thread.hbp[pos]);
> 
> Once again, memory should be deallocated by the same module that
> allocated it.
> 

Moved to ptrace_write_dr7() where __unregister_user_hw_breakpoint() is
invoked.

> > +/**
> > + * unregister_kernel_hw_breakpoint - unregister a HW breakpoint for kernel space
> > + * @bp: the breakpoint structure to unregister
> > + *
> > + * Uninstalls and unregisters @bp.
> > + */
> > +void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> > +{
> > +	int i, j;
> > +
> > +	mutex_lock(&hw_breakpoint_mutex);
> > +
> > +	/* Find the 'bp' in our list of breakpoints for kernel */
> > +	for (i = hbp_kernel_pos; i < HB_NUM; i++)
> > +		if (bp == hbp_kernel[i])
> > +			break;
> 
> If you would store the register number in the arch-specific part of
> struct hw_breakpoint then this loop wouldn't be needed.
> 

It's just a tiny loop (theoretical max is HB_NUM). It could save a member in 
'struct hw_breakpoint' - a significant saving if there are
multiple number of threads that use debug registers.

The code is already guilty of storing the address of the breakpoint
twice i.e. once in thread->hbp->info.address and again in
thread->debugreg[n]. Adding this would be another. What do you say?

> > +	/*
> > +	 * We'll shift the breakpoints one-level above to compact if
> > +	 * unregistration creates a hole
> > +	 */
> > +	if (i > hbp_kernel_pos)
> > +		for (j = i; j == hbp_kernel_pos; j--)
> > +			hbp_kernel[j] = hbp_kernel[j-1];
> 
> The loop condition "j == hbp_kernel_pos" is wrong.  It should be
> "j > hbp_kernel_pos".  And making this change shows that the preceding
> "if" statement is redundant.
> 

I missed it, will correct.

> > +
> > +	/*
> > +	 * Delete the last kernel breakpoint entry after compaction and update
> > +	 * the pointer to the lowest numbered kernel entry after updating its
> > +	 * corresponding value in kdr7
> > +	 */
> > +	hbp_kernel[hbp_kernel_pos] = 0;
> > +	arch_unregister_kernel_hw_breakpoint();
> 
> Even though it was part of my original design, there's no good reason
> for making arch_register_kernel_hw_breakpoint and
> arch_unregister_kernel_hw_breakpoint be separate functions.  There
> should just be a single function: arch_update_kernel_hw_breakpoints.
> The same is true for arch_update_user_hw_breakpoints.  In each case,
> all that is needed is to recalculate the DR7 mask and value.
> 

This and a few other suggestions below can be taken only if we chose to
update all kernel related breakpoint registers, irrespective of the
change. In return for saving a few lines of code (+simpler code +
increased readability) we should take some runtime overhead during
(un)register_<> calls.

I'm not sure about the overhead of processing an IPI (which you've cited
as being much larger than the actual code being executed), but a little
reluctant to remove code that is tuned for more specific tasks. Consider
a large system where the number of CPUs is huge (say three digits or so),
and we want to install a breakpoint for the last register hb_num. It would
invoke a  write on all hb_num-1 registers for 'n' CPUs. I'm not sure if it's
worthwhile for saving a few lines of code.

I can change it if you insist. Let me know what you think.

> > +	hbp_kernel_pos++;
> 
> And this should be moved up one line, so that the arch-specific code
> knows how many kernel breakpoints to register.
> 

This is done after invoking arch_unregister_kernel_hw_breakpoint() just
so that the corresponding values in kdr7 are updated.

> > +static int __init init_hw_breakpoint(void)
> > +{
> > +	int i;
> > +
> > +	hbp_kernel_pos = HB_NUM;
> > +	for (i = 0; i < HB_NUM; i++)
> > +		hbp_user_refcount[i] = 0;
> 
> This loop is unnecessary, since uninitialized static values are set to
> 0 in any case.
> 
> > +	load_debug_registers();
> 
> Hmm, I suspect this line can safely be omitted.
> 

Done.

> > +
> > +	return register_die_notifier(&hw_breakpoint_exceptions_nb);
> > +}
> 
> 
> > +/*
> > + * Install the kernel breakpoints in their debug registers.
> > + * If 0 <= pos < HB_NUM, then set the debug register corresponding to that number
> > + * If 'pos' is negative, then all debug registers are updated
> > + */
> > +void arch_install_kernel_hw_breakpoint(void *idx)
> 
> I don't like this design decision.  Why not simply install all the
> kernel breakpoints every time?  The extra effort would be invisible
> compared to the overhead of an IPI.
> 

Response as above.

> > +{
> > +	int pos = *(int *)idx;
> > +	unsigned long dr7;
> > +	int i;
> > +
> > +	get_debugreg(dr7, 7);
> > +
> > +	/* Don't allow debug exceptions while we update the registers */
> > +	set_debugreg(0UL, 7);
> > +
> > +	for (i = hbp_kernel_pos; i < HB_NUM; i++) {
> > +		if ((pos >= 0) && (i != pos))
> > +			continue;
> > +		dr7 &= ~(dr7_masks[i]);
> > +		if (hbp_kernel[i])
> > +			set_debugreg(hbp_kernel[i]->info.address, i);
> > +	}
> 
> For example, this loop could be written more simply as follows:
> 
> 	switch (hbp_kernel_pos) {
> 	case 0:
> 		set_debugreg(hbp_kernel[0]->info.address, 0);
> 	case 1:
> 		set_debugreg(hbp_kernel[1]->info.address, 1);
> 		...
> 	}
> 
With above code
i)it uses fewer lines of code ii)Although when coding is completely
done, hbp_kernel[i] cannot be NULL here, we have a check for the same
just in case. It helped me several times during the course of development
to have the check as above and prevent crashes.

> > +
> > +	dr7 |= kdr7;
> 
> Of course, you would also have to mask out the user bits from DR7.
> You could do something like this:
> 
> 	dr7 = (dr7 & dr7_user_mask[hbp_kernel_pos]) | kdr7;
> 
> where dr7_user_mask is a static array containing the five appropriate
> mask values.
> 

These are functionally equivalent changes and hope you wouldn't mind if
I choose to skip them.

> > +	/* No need to set DR6 */
> > +	set_debugreg(dr7, 7);
> > +}
> > +
> > +void arch_load_debug_registers()
> > +{
> > +	int pos = -1;
> > +	/*
> > +	 * We want all debug registers to be initialised for this
> > +	 * CPU so pos = -1
> > +	 */
> > +	arch_install_kernel_hw_breakpoint((void *)&pos);
> > +}
> 
> If you follow my suggestion above then this routine isn't needed at
> all.  Callers can invoke arch_install_kernel_hw_breakpoints instead.
> 

Response as earlier.

> > +/*
> > + * Install the thread breakpoints in their debug registers.
> > + */
> > +void arch_install_thread_hw_breakpoint(struct task_struct *tsk)
> > +{
> > +	int i;
> > +	struct thread_struct *thread = &(tsk->thread);
> > +
> > +	for (i = 0;  (i < hbp_kernel_pos) && hbp_user_refcount[i]; i++)
> > +		if (thread->hbp[i])
> > +			set_debugreg(thread->hbp[i]->info.address, i);
> 
> The loop condition is wrong.  But since this routine is on the hot
> path we should avoid using a loop at all.  In fact, if the DR0-DR3
> register values are added back into the thread structure, we could
> simply do this:
> 
> 	switch (hbp_kernel_pos) {
> 	case 4:
> 		set_debugreg(thread->dr3, 3);
> 	case 3:
> 		set_debugreg(thread->dr2, 2);
> 		...
> 	}
> 

The above loop will now become (after inclusion of debug registers in
thread_struct), with fewer indirections.

	for (i = 0;  i < hbp_kernel_pos; i++)
		if (thread->hbp[i])
			set_debugreg(thread->debugreg[i], i);

It is better because i)contains fewer lines of code compared to a switch case
ii)doesn't write onto 'dont-care' debug registers iii)If considered an
overhead, the compiler can always unroll the loop for optimisation.

Will change if you insist.

> > +
> > +	/* No need to set DR6 */
> > +
> > +	set_debugreg((kdr7 | thread->dr7), 7);
> > +}
> 
> > +/*
> > + * Check for virtual address in kernel space.
> > + */
> > +int arch_check_va_in_kernelspace(unsigned long va, u8 hbp_len)
> > +{
> > +	unsigned int len;
> > +
> > +	len = get_hbp_len(hbp_len);
> > +
> > +	return ((va >= TASK_SIZE) && ((va + len) >= TASK_SIZE));
> 
> In theory this should be (va + len - 1).
>

You mean check for?
return ((va >= TASK_SIZE) && ((va + (len - 1)) >= TASK_SIZE));

> > +}
> > +
> > +/*
> > + * Store a breakpoint's encoded address, length, and type.
> > + */
> > +void arch_store_info(struct hw_breakpoint *bp)
> > +{
> > +	/*
> > +	 * User-space requests will always have the address field populated
> > +	 * For kernel-addresses, either the address or symbol name can be
> > +	 * specified.
> > +	 */
> > +	if (bp->info.address)
> > +		return;
> > +	if (bp->info.name)
> > +		bp->info.address = (unsigned long)
> > +					kallsyms_lookup_name(bp->info.name);
> > +}
> 
> I still think the address and name fields shouldn't be arch-specific.
> After all, won't _every_ arch need to have a copy of exactly this same
> function?
> 

It is the thought of having breakpoints for I/O (which cannot possibly
have a name) and breakpoints over a range of addresses (unlike having
just one address field) that makes me think that these are best kept in
arch-specific structure.

If at a later point in time, they appear on all arch-specific structures
(that have an implementation), I will move the fields into generic
structure.

> > +/*
> > + * Modify an existing user breakpoint structure.
> > + */
> > +int arch_modify_user_hw_breakpoint(int pos, struct hw_breakpoint *bp,
> > +		struct task_struct *tsk)
> > +{
> > +	struct thread_struct *thread = &(tsk->thread);
> > +
> > +	/* Check if the register to be modified was enabled by the thread */
> > +	if (!(thread->dr7 & (1 << (pos * DR_ENABLE_SIZE))))
> > +		return -EINVAL;
> > +
> > +	thread->dr7 &= ~dr7_masks[pos];
> > +	thread->dr7 |= encode_dr7(pos, bp->info.len, bp->info.type);
> > +
> > +	return 0;
> > +}
> 
> It might be possible to eliminate this rather awkward code, once the
> DR0-DR3 values are added back into the thread structure.
> 

I'm sorry I don't see how. Can you explain?

> > +/*
> > + * Copy out the debug register information for a core dump.
> > + *
> > + * tsk must be equal to current.
> > + */
> > +void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
> > +{
> > +	struct thread_struct *thread = &(tsk->thread);
> > +	int i;
> > +
> > +	memset(u_debugreg, 0, sizeof u_debugreg);
> > +	for (i = 0; i < thread->hbp_num_installed && thread->hbp[i]; ++i)
> > +		u_debugreg[i] = thread->hbp[i]->info.address;
> 
> The loop condition is wrong, since you don't compact userspace
> breakpoints.  But it could be unrolled into:
> 
> 	u_debugreg[0] = thread->dr0;
> 	...
> 	u_debugreg[3] = thread->dr3;
> 

I agree that some of the code in the patch were based on the assumption
that the registers by user-space users would be consumed in an
increasing fashion, but it should be changed.

The above code will become:
	for (i = 0; i < HB_NUM; ++i)
		if (thread->hbp[i])
			u_debugreg[i] = thread->debugreg[i];

Also note that "unsinged long debugreg[HB_NUM]" is embedded in
thread_struct and not as shown below for using them in loops
conveniently.

unsigned long debugreg0;
unsigned long debugreg1;
...

> > +	u_debugreg[7] = thread->dr7;
> > +	u_debugreg[6] = thread->dr6;
> > +}
> > +
> > +/*
> > + * Handle debug exception notifications.
> > + */
> > +int __kprobes hw_breakpoint_handler(struct die_args *args)
> > +{
> > +	int i, rc = NOTIFY_DONE;
> > +	struct hw_breakpoint *bp;
> > +	/* The DR6 value is stored in args->err */
> > +	unsigned long dr7, dr6 = args->err;
> 
> Please change this.  (I should have changed it long ago, but I never
> got around to it.)  Instead of passing the DR6 value in args->err,
> pass a pointer to the dr6 variable in do_debug().  That way the
> handler routines can turn off bits in that variable and do_debug() can
> see which bits remain set at the end.
> 
> Of course, this will require a corresponding change to the
> post_kprobe_handler() routine.
> 

Ok.

> > +
> > +	if (dr6 & DR_STEP)
> > +		return NOTIFY_DONE;
> 
> This test is wrong.  Why did you change it?  It should be:
> 
> 	if (!(dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
> 
> In theory it's possible to have both the Single-Step bit and a Debug-Trap
> bit set at the same time.
> 

This code is in hw_breakpoint_handler() which we don't intend to enter 
if single-stepping bit is set (through kprobes) and hence the
NOTIFY_DONE. Given that HW_BREAKPOINT_EXECUTE is not
allowed over kernel-space addresses, we cannot have a kprobe and a HW
breakpoint over the same address causing simultaneous exceptions.

However when the patch once had support for Instructions breakpoints + 
post_handler(), it was a different case then.

Is there a reason why you think this check and/or return condition
should be different?

> > +
> > +	get_debugreg(dr7, 7);
> > +
> > +	/* Disable breakpoints during exception handling */
> > +	set_debugreg(0UL, 7);
> > +
> > +	/*
> > +	 * Assert that local interrupts are disabled
> > +	 * Reset the DRn bits in the virtualized register value.
> > +	 * The ptrace trigger routine will add in whatever is needed.
> > +	 */
> > +	current->thread.dr6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
> > +
> > +	/* Lazy debug register switching */
> > +	if (last_debugged_task != current)
> > +		switch_to_none_hw_breakpoint();
> > +
> > +	/* Handle all the breakpoints that were triggered */
> > +	for (i = 0; i < HB_NUM; ++i) {
> > +		if (likely(!(dr6 & (DR_TRAP0 << i))))
> > +			continue;
> > +		/*
> > +		 * Find the corresponding hw_breakpoint structure and
> > +		 * invoke its triggered callback.
> > +		 */
> > +		if (hbp_user_refcount[i])
> > +			bp = current->thread.hbp[i];
> > +		else if (i >= hbp_kernel_pos)
> > +			bp = hbp_kernel[i];
> > +		else	/* False alarm due to lazy DR switching */
> > +			continue;
> > +
> > +		if (!bp)
> > +			break;
> 
> This logic is wrong.  It should go like this:
> 
> 		if (i >= hbp_kernel_pos)
> 			bp = hbp_kernel[i];
> 		else {
> 			bp = current->thread.hbp[i];
> 			if (!bp) {
> 				/* False alarm due to lazy DR switching */
> 				continue;
> 			}
> 		}
> 

I agree. The 'break' following the "if (!bp)" in my patch would have ignored a
few genuine exceptions and it should have been:
		if (!bp)
			continue;

Your suggested code is elegant and I'll adopt it.

> > +
> > +		switch (bp->info.type) {
> > +		case HW_BREAKPOINT_WRITE:
> > +		case HW_BREAKPOINT_RW:
> > +			if (bp->triggered)
> 
> Do you really need to test bp->triggered?
> 

It's a carriage from old code. Will remove.

> > +				(bp->triggered)(bp, args->regs);
> > +
> > +			if (arch_check_va_in_userspace(bp->info.address,
> > +							bp->info.len))
> > +				rc = NOTIFY_DONE;
> > +			else
> > +				rc = NOTIFY_STOP;;
> > +			goto exit;
> 
> What on Earth is the reason for all this?  What happens if two
> breakpoints get triggered at the same time?
> 

The hw_breakpoint_handler() will be modified like this:
(without the modifications to dr6). Note that the 'goto exit' has
changed to 'continue' to allow handling of multiple exceptions.

int __kprobes hw_breakpoint_handler(struct die_args *args)
{
	int i, rc = NOTIFY_DONE;
	struct hw_breakpoint *bp;
	/* The DR6 value is stored in args->err */
	unsigned long dr7, dr6 = args->err;

	if (dr6 & DR_STEP)
		return NOTIFY_DONE;

	get_debugreg(dr7, 7);

	/* Disable breakpoints during exception handling */
	set_debugreg(0UL, 7);

	/*
	 * Assert that local interrupts are disabled
	 * Reset the DRn bits in the virtualized register value.
	 * The ptrace trigger routine will add in whatever is needed.
	 */
	current->thread.debugreg6 &= \
			~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);

	/* Lazy debug register switching */
	if (per_cpu(last_debugged_task, get_cpu()) != current) {
		switch_to_none_hw_breakpoint();
		put_cpu_no_resched();
	}

	/* Handle all the breakpoints that were triggered */
	for (i = 0; i < HB_NUM; ++i) {
		if (likely(!(dr6 & (DR_TRAP0 << i))))
			continue;
		/*
		 * Find the corresponding hw_breakpoint structure and
		 * invoke its triggered callback.
		 */
		if (i >= hbp_kernel_pos)
			bp = hbp_kernel[i];
		else {
			bp = current->thread.hbp[i];
			if (!bp) {
				/* False alarm due to lazy DR switching */
				continue;
			}
		}

		switch (bp->info.type) {
		case HW_BREAKPOINT_WRITE:
		case HW_BREAKPOINT_RW:
			(bp->triggered)(bp, args->regs);

			if (arch_check_va_in_userspace(bp->info.address,
							bp->info.len))
				rc = NOTIFY_DONE;
			else
				rc = NOTIFY_STOP;;
			continue;
		case HW_BREAKPOINT_EXECUTE:
			/*
			 * Presently we allow instruction breakpoints
			 * only in
			 * user-space when requested through ptrace.
			 */
			if (arch_check_va_in_userspace(bp->info.address,
							bp->info.len)) {
				(bp->triggered)(bp, args->regs);
				/*
				 * do_debug will notify user through a
				 * SIGTRAP
				 * signal. So we are not requesting a
				 * NOTIFY_STOP here
				 */
				rc = NOTIFY_DONE;
				continue;
			}
		}
	}

	set_debugreg(dr7, 7);
	return rc;
}

> > +		case HW_BREAKPOINT_EXECUTE:
> > +			/*
> > +			 * Presently we allow instruction breakpoints only in
> > +			 * user-space when requested through ptrace.
> > +			 */
> > +			if (arch_check_va_in_userspace(bp->info.address,
> > +							bp->info.len)) {
> > +				(bp->triggered)(bp, args->regs);
> 
> Why do you need this test?
> 
> > +				/*
> > +				 * do_debug will notify user through a SIGTRAP
> > +				 * signal So we are not requesting a
> > +				 * NOTIFY_STOP here
> > +				 */
> > +				rc = NOTIFY_DONE;
> > +				goto exit;
> > +			}
> > +		}
> 
> In fact, why do you distinguish between data breakpoints and code
> breakpoints in the first place?  Shouldn't they be treated the same?
> 

We would receive breakpoint exceptions with type HW_BREAKPOINT_EXECUTE
only for user-space (through ptrace) and this is a double-check (the
first one being done at arch_validate_hwbkpt_settings(). Also, we don't
want to invoke bp->triggered (which would be ptrace_triggered) without
checking if the causative IP is in user-space. Hence the above code.

> > +	}
> > +
> > +	/* Stop processing further if the exception is a stray one */
> 
> That comment is wrong.  It should say something like this:
> 
> 	/* Stop processing if there's nothing more to do */
> 
> > +	if (!(dr6 & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
> > +		rc = NOTIFY_STOP;
> 
> On the other hand, I'm not sure that this NOTIFY_STOP will help much
> anyway.  All it does is provide an early exit from the notifier chain
> when a hardware breakpoint occurs.  But if there wasn't also a
> Single-Step exception, the kprobes handler shouldn't take long to
> run.  Hence an early exit doesn't provide much advantage.
> 

Yes, I'm for removing this check too.

> > +exit:
> > +	set_debugreg(dr7, 7);
> > +	return rc;
> > +}
> 
> 
> > @@ -530,13 +530,14 @@ asmlinkage __kprobes struct pt_regs *syn
> >  dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
> >  {
> >  	struct task_struct *tsk = current;
> > -	unsigned long condition;
> > +	unsigned long dr6;
> >  	int si_code;
> >  
> > -	get_debugreg(condition, 6);
> > +	get_debugreg(dr6, 6);
> > +	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
> >  
> >  	/* Catch kmemcheck conditions first of all! */
> > -	if (condition & DR_STEP && kmemcheck_trap(regs))
> > +	if (dr6 & DR_STEP && kmemcheck_trap(regs))
> >  		return;
> 
> Are you sure this is right?  Is it possible for any of the DR_TRAPn bits
> to be set as well when this happens?
> 
> 

I did not look at this check before. But the (dr6 & DR_STEP) condition
should make sure no HW breakpoint exceptions are set (since we don't
allow instruction breakpoints in kernel-space yet, as explained above).

> > @@ -83,6 +85,8 @@ void exit_thread(void)
> >  		put_cpu();
> >  		kfree(bp);
> >  	}
> > +	if (unlikely(t->dr7))
> > +		flush_thread_hw_breakpoint(me);
> 
> Shouldn't you test the TIF_DEBUG flag instead?  After all, the thread
> might very well have some hw_breakpoint structures allocated even though
> t->dr7 is 0.
> 

Yes, it's a mistake and missed many eyes before. Thanks for pointing it
out!

> >  
> >  	ds_exit_thread(current);
> >  }
> > @@ -103,14 +107,9 @@ void flush_thread(void)
> >  	}
> >  #endif
> >  
> > -	clear_tsk_thread_flag(tsk, TIF_DEBUG);
> > +	if (unlikely(tsk->thread.dr7))
> > +		flush_thread_hw_breakpoint(tsk);
> 
> Same thing here.
> 
> > @@ -265,7 +267,14 @@ int copy_thread(int nr, unsigned long cl
> >  
> >  	task_user_gs(p) = get_user_gs(regs);
> >  
> > +	p->thread.io_bitmap_ptr = NULL;
> > +
> >  	tsk = current;
> > +	err = -ENOMEM;
> > +	if (unlikely(tsk->thread.dr7)) {
> > +		if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
> > +			goto out;
> > +	}
> 
> And here.
> 
> > @@ -426,6 +438,25 @@ __switch_to(struct task_struct *prev_p, 
> >  		lazy_load_gs(next->gs);
> >  
> >  	percpu_write(current_task, next_p);
> > +	/*
> > +	 * There's a problem with moving the switch_to_thread_hw_breakpoint()
> > +	 * call before current is updated.  Suppose a kernel breakpoint is
> > +	 * triggered in between the two.  The hw-breakpoint handler will see
> > +	 * that current is different from the task pointer stored in the chbi
> > +	 * area, so it will think the task pointer is leftover from an old task
> > +	 * (lazy switching) and will erase it.  Then until the next context
> > +	 * switch, no user-breakpoints will be installed.
> > +	 *
> > +	 * The real problem is that it's impossible to update both current and
> > +	 * chbi->bp_task at the same instant, so there will always be a window
> > +	 * in which they disagree and a breakpoint might get triggered.  Since
> > +	 * we use lazy switching, we are forced to assume that a disagreement
> > +	 * means that current is correct and chbi->bp_task is old.  But if you
> > +	 * move the code above then you'll create a window in which current is
> > +	 * old and chbi->bp_task is correct.
> > +	 */
> 
> Don't you think this comment should be updated to match the changes
> you have made in the code?  There no longer is a chbi area, for example.
> 
> 
> Alan Stern
> 

Will read like this:
	/*
	 * There's a problem with moving the
	 * switch_to_thread_hw_breakpoint()
	 * call before current is updated.  Suppose a kernel breakpoint
	 * is
	 * triggered in between the two.  The hw-breakpoint handler will
	 * see
	 * that current is different from the task pointer stored in
	 * last_debugged_task, so it will think the task pointer is
	 * leftover
	 * from an old task (lazy switching) and will erase it. Then
	 * until the
	 * next context switch, no user-breakpoints will be installed.
	 *
	 * The real problem is that it's impossible to update both
	 * current and
	 * last_debugged_task at the same instant, so there will always
	 * be a
	 * window in which they disagree and a breakpoint might get
	 * triggered.
	 * Since we use lazy switching, we are forced to assume that a
	 * disagreement means that current is correct and
	 * last_debugged_task is
	 * old.  But if you move the code above then you'll create a
	 * window in
	 * which current is old and last_debugged_task is correct.
	 */
	if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
		switch_to_thread_hw_breakpoint(next_p);


Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-03-25 19:48 ` Alan Stern
  2009-03-27 22:06   ` K.Prasad
@ 2009-03-28  8:46   ` K.Prasad
  2009-04-01 16:22     ` Alan Stern
  1 sibling, 1 reply; 27+ messages in thread
From: K.Prasad @ 2009-03-28  8:46 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Wed, Mar 25, 2009 at 03:48:31PM -0400, Alan Stern wrote:
> > +
> > +/*
> > + * Handle debug exception notifications.
> > + */
> > +int __kprobes hw_breakpoint_handler(struct die_args *args)
> > +{
> > +	int i, rc = NOTIFY_DONE;
> > +	struct hw_breakpoint *bp;
> > +	/* The DR6 value is stored in args->err */
> > +	unsigned long dr7, dr6 = args->err;
> 
> Please change this.  (I should have changed it long ago, but I never
> got around to it.)  Instead of passing the DR6 value in args->err,
> pass a pointer to the dr6 variable in do_debug().  That way the
> handler routines can turn off bits in that variable and do_debug() can
> see which bits remain set at the end.
> 
> Of course, this will require a corresponding change to the
> post_kprobe_handler() routine.
>

As I looked at the code with an intention of changing it, I don't find a
place - in hw_breakpoint_handler() and in functions called by
kprobe_exceptions_notify() where bits in dr6 are written into.
The thread-specific thread->debugreg6 is updated with causative bits in
ptrace_triggered() to help send signals to the user-space. I don't see a
user for the change you propose.

I should send out the revised patchset sometime tomorrow. Kindly let me
know your comments about them.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-03-27 22:06   ` K.Prasad
@ 2009-04-01 16:16     ` Alan Stern
  2009-04-07  8:22       ` K.Prasad
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Stern @ 2009-04-01 16:16 UTC (permalink / raw)
  To: K.Prasad
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

Sorry for not replying sooner; I was away on a short vacation.

On Sat, 28 Mar 2009, K.Prasad wrote:

> > There are some serious issues involving userspace breakpoints and the
> > legacy ptrace interface.  It all comes down to this: At what point
> > is a breakpoint registered for a ptrace caller?
> > 
> > Remember, to set up a breakpoint a debugger needs to call ptrace
> > twice: once to put the address in one of the DR0-DR3 registers and
> > once to set up DR7.  So when does the task own the breakpoint?
> > 
> > Logically, we should wait until DR7 gets set, because until then the
> > breakpoint is not active.  But then how do we let the caller know that
> > one of his breakpoints conflicts with a kernel breakpoint?
> > 
> > If we report an error during an attempt to set DR0-DR3 then at least
> > it's unambiguous.  But then how do we know when the task is _finished_
> > using the breakpoint?  It's under no obligation to set the register
> > back to 0.
> > 
> > Related to this is the question of how to store the task's versions of
> > DR0-DR3 when there is no associated active breakpoint.  Maybe it would
> > be best to keep the existing registers in the thread structure.
> > 
> 
> These are profound questions and I believe that it is upto the process in
> user-space to answer them.
> 
> What we could ensure from the kernel-space is to retain the
> existing behaviour of ptrace i.e. return success when a write is done on
> DR0-DR3 (almost unconditionally) and reserve error returns only when DR7
> is written into.
> 
> The patch in question could possibly return an -ENOMEM (even when write
> is done on DR0-DR3) but I will change the behaviour as stated above.
> 
> 
> A DR0 - DR3 return will do a:
> 	thread->debugreg[n] = val;
> 	return 0;
> 
> while all error returns are reserved for:
> 	rc = ptrace_write_dr7(tsk, val);

That does seem to be the most logical approach.  The problem with it is
that it doesn't give the caller much information about the cause of the
problem or how to fix it.  (Not that existing programs would know how
to interpret this information anyway...)

> > > +++ linux-2.6-tip/kernel/hw_breakpoint.c
> > > @@ -0,0 +1,367 @@
> > ...
> > > +struct task_struct *last_debugged_task;
> > 
> > Is this variable provided only for use by the hw_breakpoint_handler()
> > routine, for detecting lazy debug-register switching?  It won't work
> > right on SMP systems.  You need to use a per-CPU variable instead.
> > 
> 
> Thanks for pointing it out. Here's what it will be made:
> DEFINE_PER_CPU(struct task_struct *, last_debugged_task);
> 
> That also re-introduces the put_cpu_no_sched() into
> switch_to_thread_hw_breakpoint() function.
> 
> void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
> {
> 	/* Set the debug register */
> 	arch_install_thread_hw_breakpoint(tsk);
> 	per_cpu(last_debugged_task, get_cpu()) = current;
> 	put_cpu_no_resched();
> }

With the corresponding change in hw_breakpoint_handler(), of course.

> > Even though "arch_install_none" was my own name, I don't like it very
> > much.  "arch_remove_user_hw_breakpoints" would be better.
> > 
> 
> How about arch_uninstall_thread_hw_breakpoint()? (Being the opposite
> of arch_install_thread_hw_breakpoint()).

Okay.

> > > +/*
> > > + * Erase all the hardware breakpoint info associated with a thread.
> > > + *
> > > + * If tsk != current then tsk must not be usable (for example, a
> > > + * child being cleaned up from a failed fork).
> > > + */
> > > +void flush_thread_hw_breakpoint(struct task_struct *tsk)
> > > +{
> > > +	int i;
> > > +	struct thread_struct *thread = &(tsk->thread);
> > > +
> > > +	mutex_lock(&hw_breakpoint_mutex);
> > > +
> > > +	/* The thread no longer has any breakpoints associated with it */
> > > +	clear_tsk_thread_flag(tsk, TIF_DEBUG);
> > > +	for (i = 0; i < HB_NUM; i++) {
> > > +		if (thread->hbp[i]) {
> > > +			hbp_user_refcount[i]--;
> > > +			kfree(thread->hbp[i]);
> > 
> > Ugh!  In general you shouldn't deallocate memory you didn't allocate
> > originally.  What will happen when there is a utrace interface in
> > addition to the ptrace interface?
> > 
> 
> I can't see how I can invoke ptrace related code here to free memory
> here, although I agree that __unregister_user_hw_breakpoint() code need not
> mess with it.
> I will retain the kfree() in flush_thread_hw_breakpoint(), but remove it move 
> it from the latter to ptrace related code.

Yes, that seems like the best compromise.  If utrace has a problem, we 
can fix it later.

> > > +/**
> > > + * unregister_kernel_hw_breakpoint - unregister a HW breakpoint for kernel space
> > > + * @bp: the breakpoint structure to unregister
> > > + *
> > > + * Uninstalls and unregisters @bp.
> > > + */
> > > +void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> > > +{
> > > +	int i, j;
> > > +
> > > +	mutex_lock(&hw_breakpoint_mutex);
> > > +
> > > +	/* Find the 'bp' in our list of breakpoints for kernel */
> > > +	for (i = hbp_kernel_pos; i < HB_NUM; i++)
> > > +		if (bp == hbp_kernel[i])
> > > +			break;
> > 
> > If you would store the register number in the arch-specific part of
> > struct hw_breakpoint then this loop wouldn't be needed.
> > 
> 
> It's just a tiny loop (theoretical max is HB_NUM). It could save a member in 
> 'struct hw_breakpoint' - a significant saving if there are
> multiple number of threads that use debug registers.
> 
> The code is already guilty of storing the address of the breakpoint
> twice i.e. once in thread->hbp->info.address and again in
> thread->debugreg[n]. Adding this would be another. What do you say?

I don't know.  You're right that the loop above is insignificant.  On 
the other hand, it's hard to imagine a very large number of threads all 
being debugged simultaneously (there would have to be many thousands of 
them before the overhead was noticeable).

> > > +
> > > +	/*
> > > +	 * Delete the last kernel breakpoint entry after compaction and update
> > > +	 * the pointer to the lowest numbered kernel entry after updating its
> > > +	 * corresponding value in kdr7
> > > +	 */
> > > +	hbp_kernel[hbp_kernel_pos] = 0;
> > > +	arch_unregister_kernel_hw_breakpoint();
> > 
> > Even though it was part of my original design, there's no good reason
> > for making arch_register_kernel_hw_breakpoint and
> > arch_unregister_kernel_hw_breakpoint be separate functions.  There
> > should just be a single function: arch_update_kernel_hw_breakpoints.
> > The same is true for arch_update_user_hw_breakpoints.  In each case,
> > all that is needed is to recalculate the DR7 mask and value.
> > 
> 
> This and a few other suggestions below can be taken only if we chose to
> update all kernel related breakpoint registers, irrespective of the
> change. In return for saving a few lines of code (+simpler code +
> increased readability) we should take some runtime overhead during
> (un)register_<> calls.

That's true.  On the other hand, I don't think people will be 
installing and removing kernel breakpoints very often.  Probably only 
while testing the kernel, not during normal operation.

(By the way, notice that the overhead occurs only during 
_registration_; during unregistration you loop over all the breakpoints 
anyway.  Also notice that the loop overhead is comparable to the amount 
of work done inside the loop, so the total runtime might not change 
much.)

> I'm not sure about the overhead of processing an IPI (which you've cited
> as being much larger than the actual code being executed), but a little
> reluctant to remove code that is tuned for more specific tasks. Consider
> a large system where the number of CPUs is huge (say three digits or so),
> and we want to install a breakpoint for the last register hb_num. It would
> invoke a  write on all hb_num-1 registers for 'n' CPUs. I'm not sure if it's
> worthwhile for saving a few lines of code.

If the breakpoint you want to install is the only kernel breakpoint 
then it would not involve an extra write to all HB_NUM - 1 registers.  
The code I proposed writes only to registers hbp_kernel_pos through 
HB_NUM - 1.

On the other hand, if you want to install multiple kernel breakpoints 
then it's true, my scheme would require some extra writes to the 
debug registers.  However each write boils down to something like three 
machine instructions (one to fetch the address of the hw_breakpoint 
structure, one to fetch the address of the breakpoint, and one to write 
the debug register).  If HB_NUM were larger than 16 or so I might worry 
about the extra work, but with at most four we can forget about it.

> I can change it if you insist. Let me know what you think.

I prefer the simpler interface.  Especially since it doesn't involve 
converting between int and void *.

> > > +	hbp_kernel_pos++;
> > 
> > And this should be moved up one line, so that the arch-specific code
> > knows how many kernel breakpoints to register.
> > 
> 
> This is done after invoking arch_unregister_kernel_hw_breakpoint() just
> so that the corresponding values in kdr7 are updated.

You have to recalculate kdr7 in any case, since some of the other 
breakpoints may have been compacted.

> > > +{
> > > +	int pos = *(int *)idx;
> > > +	unsigned long dr7;
> > > +	int i;
> > > +
> > > +	get_debugreg(dr7, 7);
> > > +
> > > +	/* Don't allow debug exceptions while we update the registers */
> > > +	set_debugreg(0UL, 7);
> > > +
> > > +	for (i = hbp_kernel_pos; i < HB_NUM; i++) {
> > > +		if ((pos >= 0) && (i != pos))
> > > +			continue;
> > > +		dr7 &= ~(dr7_masks[i]);
> > > +		if (hbp_kernel[i])
> > > +			set_debugreg(hbp_kernel[i]->info.address, i);
> > > +	}
> > 
> > For example, this loop could be written more simply as follows:
> > 
> > 	switch (hbp_kernel_pos) {
> > 	case 0:
> > 		set_debugreg(hbp_kernel[0]->info.address, 0);
> > 	case 1:
> > 		set_debugreg(hbp_kernel[1]->info.address, 1);
> > 		...
> > 	}
> > 
> With above code
> i)it uses fewer lines of code

But those lines are more complicated, both in terms of what the CPU has 
to go through to execute them and what the reader has to go through to 
think about them.  Sheer number of lines isn't always the best metric.

>  ii)Although when coding is completely
> done, hbp_kernel[i] cannot be NULL here, we have a check for the same
> just in case. It helped me several times during the course of development
> to have the check as above and prevent crashes.

I'm not worried about that.  There should be only one place where these
variables are set, so it should be easy enough to make sure there
aren't any mistakes.  And if there are, crashing is a better way to
bring them to your attention than just skipping over the bad entries!  
:-)

> > > +/*
> > > + * Install the thread breakpoints in their debug registers.
> > > + */
> > > +void arch_install_thread_hw_breakpoint(struct task_struct *tsk)
> > > +{
> > > +	int i;
> > > +	struct thread_struct *thread = &(tsk->thread);
> > > +
> > > +	for (i = 0;  (i < hbp_kernel_pos) && hbp_user_refcount[i]; i++)
> > > +		if (thread->hbp[i])
> > > +			set_debugreg(thread->hbp[i]->info.address, i);
> > 
> > The loop condition is wrong.  But since this routine is on the hot
> > path we should avoid using a loop at all.  In fact, if the DR0-DR3
> > register values are added back into the thread structure, we could
> > simply do this:
> > 
> > 	switch (hbp_kernel_pos) {
> > 	case 4:
> > 		set_debugreg(thread->dr3, 3);
> > 	case 3:
> > 		set_debugreg(thread->dr2, 2);
> > 		...
> > 	}
> > 
> 
> The above loop will now become (after inclusion of debug registers in
> thread_struct), with fewer indirections.
> 
> 	for (i = 0;  i < hbp_kernel_pos; i++)
> 		if (thread->hbp[i])
> 			set_debugreg(thread->debugreg[i], i);
> 
> It is better because i)contains fewer lines of code compared to a switch case

No.  As I mentioned above, the number of lines of code isn't always the
best guide.  In this case we want fastest execution.  The tests and
conditional jumps in the loop slow it down considerably when compared
to the straight-through execution of the "switch".

> ii)doesn't write onto 'dont-care' debug registers

Since the registers are "don't-care", I don't care if they get written 
to!  :-)

> iii)If considered an
> overhead, the compiler can always unroll the loop for optimisation.
> 
> Will change if you insist.

This is one place where Roland insisted I change it, so I'm forwarding 
his insistence onto you.

> > > +/*
> > > + * Check for virtual address in kernel space.
> > > + */
> > > +int arch_check_va_in_kernelspace(unsigned long va, u8 hbp_len)
> > > +{
> > > +	unsigned int len;
> > > +
> > > +	len = get_hbp_len(hbp_len);
> > > +
> > > +	return ((va >= TASK_SIZE) && ((va + len) >= TASK_SIZE));
> > 
> > In theory this should be (va + len - 1).
> >
> 
> You mean check for?
> return ((va >= TASK_SIZE) && ((va + (len - 1)) >= TASK_SIZE));

Yes.  (Although I would write it without the extra parentheses, but 
that's just a matter of personal taste.)

> > > +/*
> > > + * Modify an existing user breakpoint structure.
> > > + */
> > > +int arch_modify_user_hw_breakpoint(int pos, struct hw_breakpoint *bp,
> > > +		struct task_struct *tsk)
> > > +{
> > > +	struct thread_struct *thread = &(tsk->thread);
> > > +
> > > +	/* Check if the register to be modified was enabled by the thread */
> > > +	if (!(thread->dr7 & (1 << (pos * DR_ENABLE_SIZE))))
> > > +		return -EINVAL;
> > > +
> > > +	thread->dr7 &= ~dr7_masks[pos];
> > > +	thread->dr7 |= encode_dr7(pos, bp->info.len, bp->info.type);
> > > +
> > > +	return 0;
> > > +}
> > 
> > It might be possible to eliminate this rather awkward code, once the
> > DR0-DR3 values are added back into the thread structure.
> > 
> 
> I'm sorry I don't see how. Can you explain?

This gets back to those tricky questions about integrating ptrace with
hw_breakpoint.  In theory we could avoid allocating hw_breakpoint
structures for ptrace breakpoints and treat them completely
independently, but overall it's probably better to do things uniformly.

Regardless, we are still left with the problem that it's not easy to
capture the ptrace interface using an hw_breakpoint structure, because
ptrace breakpoints are set up in two stages: one to save the address in
DRn (0 <= n <= 3) and one to save the type and length in DR7.  What's
the best way to handle it when task being debugged isn't running and
the debugger changes the breakpoint address?  Or changes the
length/type fields in DR7?  I wrote modify_user_hw_breakpoint() to
handle this, but it was just a kludge.

If we store debugreg[0..3] in the thread structure, and if 
__register_user_hw_breakpoint() is written properly, then maybe ptrace 
can install modifications to existing breakpoints simply by calling 
__register_user_hw_breakpoint() and re-using the old "pos" value.

> > > +/*
> > > + * Copy out the debug register information for a core dump.
> > > + *
> > > + * tsk must be equal to current.
> > > + */
> > > +void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
> > > +{
> > > +	struct thread_struct *thread = &(tsk->thread);
> > > +	int i;
> > > +
> > > +	memset(u_debugreg, 0, sizeof u_debugreg);
> > > +	for (i = 0; i < thread->hbp_num_installed && thread->hbp[i]; ++i)
> > > +		u_debugreg[i] = thread->hbp[i]->info.address;
> > 
> > The loop condition is wrong, since you don't compact userspace
> > breakpoints.  But it could be unrolled into:
> > 
> > 	u_debugreg[0] = thread->dr0;
> > 	...
> > 	u_debugreg[3] = thread->dr3;
> > 
> 
> I agree that some of the code in the patch were based on the assumption
> that the registers by user-space users would be consumed in an
> increasing fashion, but it should be changed.
> 
> The above code will become:
> 	for (i = 0; i < HB_NUM; ++i)
> 		if (thread->hbp[i])
> 			u_debugreg[i] = thread->debugreg[i];

Why bother to test?  Since we don't care what's in the registers when 
they aren't being used, just write to all of them.

Also, the upper limit of the loop should be hbp_kernel_pos, not HB_NUM.

> Also note that "unsinged long debugreg[HB_NUM]" is embedded in
> thread_struct and not as shown below for using them in loops
> conveniently.
> 
> unsigned long debugreg0;
> unsigned long debugreg1;
> ...

If anything, that's an argument for unrolling the loop by hand.  But it 
doesn't matter; you can always change the contents of the 
thread_struct.

> > > +
> > > +	if (dr6 & DR_STEP)
> > > +		return NOTIFY_DONE;
> > 
> > This test is wrong.  Why did you change it?  It should be:
> > 
> > 	if (!(dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
> > 
> > In theory it's possible to have both the Single-Step bit and a Debug-Trap
> > bit set at the same time.
> > 
> 
> This code is in hw_breakpoint_handler() which we don't intend to enter 
> if single-stepping bit is set (through kprobes) and hence the
> NOTIFY_DONE.

I don't see why we shouldn't enter even in this case.  Suppose somebody
single-steps over an instruction that accesses a variable with an
associated data breakpoint?

> Given that HW_BREAKPOINT_EXECUTE is not
> allowed over kernel-space addresses, we cannot have a kprobe and a HW
> breakpoint over the same address causing simultaneous exceptions.

Kprobe would use INT 3, wouldn't it?  But that's a separate issue; an 
instruction breakpoint is different from a single-step exception.

> However when the patch once had support for Instructions breakpoints + 
> post_handler(), it was a different case then.
> 
> Is there a reason why you think this check and/or return condition
> should be different?

Yes, because there _can_ be simultaneous single-step and data 
breakpoint exceptions.

> The hw_breakpoint_handler() will be modified like this:
> (without the modifications to dr6). Note that the 'goto exit' has
> changed to 'continue' to allow handling of multiple exceptions.
> 
> int __kprobes hw_breakpoint_handler(struct die_args *args)
> {
> 	int i, rc = NOTIFY_DONE;
> 	struct hw_breakpoint *bp;
> 	/* The DR6 value is stored in args->err */
> 	unsigned long dr7, dr6 = args->err;
> 
> 	if (dr6 & DR_STEP)
> 		return NOTIFY_DONE;
> 
> 	get_debugreg(dr7, 7);
> 
> 	/* Disable breakpoints during exception handling */
> 	set_debugreg(0UL, 7);
> 
> 	/*
> 	 * Assert that local interrupts are disabled
> 	 * Reset the DRn bits in the virtualized register value.
> 	 * The ptrace trigger routine will add in whatever is needed.
> 	 */
> 	current->thread.debugreg6 &= \
> 			~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);

The '\' character isn't needed.

> 
> 	/* Lazy debug register switching */
> 	if (per_cpu(last_debugged_task, get_cpu()) != current) {
> 		switch_to_none_hw_breakpoint();
> 		put_cpu_no_resched();
> 	}

I just noticed that the lines saving DR7 and setting it to 0 need to
come here.  Otherwise switch_to_none_hw_breakpoint() might set DR7 back
to a nonzero value, and it might not match the value stored in dr7.

> 
> 	/* Handle all the breakpoints that were triggered */
> 	for (i = 0; i < HB_NUM; ++i) {
> 		if (likely(!(dr6 & (DR_TRAP0 << i))))
> 			continue;
> 		/*
> 		 * Find the corresponding hw_breakpoint structure and
> 		 * invoke its triggered callback.
> 		 */
> 		if (i >= hbp_kernel_pos)
> 			bp = hbp_kernel[i];
> 		else {
> 			bp = current->thread.hbp[i];
> 			if (!bp) {
> 				/* False alarm due to lazy DR switching */
> 				continue;
> 			}
> 		}
> 
> 		switch (bp->info.type) {
> 		case HW_BREAKPOINT_WRITE:
> 		case HW_BREAKPOINT_RW:
> 			(bp->triggered)(bp, args->regs);
> 
> 			if (arch_check_va_in_userspace(bp->info.address,
> 							bp->info.len))
> 				rc = NOTIFY_DONE;
> 			else
> 				rc = NOTIFY_STOP;;
> 			continue;
> 		case HW_BREAKPOINT_EXECUTE:
> 			/*
> 			 * Presently we allow instruction breakpoints
> 			 * only in
> 			 * user-space when requested through ptrace.
> 			 */
> 			if (arch_check_va_in_userspace(bp->info.address,
> 							bp->info.len)) {
> 				(bp->triggered)(bp, args->regs);
> 				/*
> 				 * do_debug will notify user through a
> 				 * SIGTRAP
> 				 * signal. So we are not requesting a
> 				 * NOTIFY_STOP here
> 				 */
> 				rc = NOTIFY_DONE;
> 				continue;
> 			}
> 		}
> 	}

I don't understand why you are setting rc above.  The value returned by
this function should not depend on what breakpoints were hit; it should
depend only on whether there is still more work for the notifier chain
to do.

I also don't understand why you need to check for instruction
breakpoints occurring in kernel code.  We already know they can't
happen because the registration routines won't allow them.  
Double-checking isn't necessary.  All breakpoints should be treated
exactly the same: Invoke the "triggered" callback.  Nothing more.

Besides, putting a check in here means there's one more opportunity for 
mistakes when you _do_ decide to allow instruction breakpoints in the 
kernel.

> 
> 	set_debugreg(dr7, 7);
> 	return rc;
> }

> > > @@ -530,13 +530,14 @@ asmlinkage __kprobes struct pt_regs *syn
> > >  dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
> > >  {
> > >  	struct task_struct *tsk = current;
> > > -	unsigned long condition;
> > > +	unsigned long dr6;
> > >  	int si_code;
> > >  
> > > -	get_debugreg(condition, 6);
> > > +	get_debugreg(dr6, 6);
> > > +	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
> > >  
> > >  	/* Catch kmemcheck conditions first of all! */
> > > -	if (condition & DR_STEP && kmemcheck_trap(regs))
> > > +	if (dr6 & DR_STEP && kmemcheck_trap(regs))
> > >  		return;
> > 
> > Are you sure this is right?  Is it possible for any of the DR_TRAPn bits
> > to be set as well when this happens?
> > 
> > 
> 
> I did not look at this check before. But the (dr6 & DR_STEP) condition
> should make sure no HW breakpoint exceptions are set (since we don't
> allow instruction breakpoints in kernel-space yet, as explained above).

What does kmemcheck_trap() do?

> Will read like this:
> 	/*
> 	 * There's a problem with moving the
> 	 * switch_to_thread_hw_breakpoint()
> 	 * call before current is updated.  Suppose a kernel breakpoint
> 	 * is
> 	 * triggered in between the two.  The hw-breakpoint handler will
> 	 * see
> 	 * that current is different from the task pointer stored in
> 	 * last_debugged_task, so it will think the task pointer is
> 	 * leftover
> 	 * from an old task (lazy switching) and will erase it. Then
> 	 * until the
> 	 * next context switch, no user-breakpoints will be installed.
> 	 *
> 	 * The real problem is that it's impossible to update both
> 	 * current and
> 	 * last_debugged_task at the same instant, so there will always
> 	 * be a
> 	 * window in which they disagree and a breakpoint might get
> 	 * triggered.
> 	 * Since we use lazy switching, we are forced to assume that a
> 	 * disagreement means that current is correct and
> 	 * last_debugged_task is
> 	 * old.  But if you move the code above then you'll create a
> 	 * window in
> 	 * which current is old and last_debugged_task is correct.
> 	 */

With the line breaks fixed up, please.

Alan Stern


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-03-28  8:46   ` K.Prasad
@ 2009-04-01 16:22     ` Alan Stern
  2009-04-07  8:22       ` K.Prasad
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Stern @ 2009-04-01 16:22 UTC (permalink / raw)
  To: K.Prasad
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Sat, 28 Mar 2009, K.Prasad wrote:

> On Wed, Mar 25, 2009 at 03:48:31PM -0400, Alan Stern wrote:
> > > +
> > > +/*
> > > + * Handle debug exception notifications.
> > > + */
> > > +int __kprobes hw_breakpoint_handler(struct die_args *args)
> > > +{
> > > +	int i, rc = NOTIFY_DONE;
> > > +	struct hw_breakpoint *bp;
> > > +	/* The DR6 value is stored in args->err */
> > > +	unsigned long dr7, dr6 = args->err;
> > 
> > Please change this.  (I should have changed it long ago, but I never
> > got around to it.)  Instead of passing the DR6 value in args->err,
> > pass a pointer to the dr6 variable in do_debug().  That way the
> > handler routines can turn off bits in that variable and do_debug() can
> > see which bits remain set at the end.
> > 
> > Of course, this will require a corresponding change to the
> > post_kprobe_handler() routine.
> >
> 
> As I looked at the code with an intention of changing it, I don't find a
> place - in hw_breakpoint_handler() and in functions called by
> kprobe_exceptions_notify() where bits in dr6 are written into.

That's because such writes wouldn't do any good in the old code!  :-)

> The thread-specific thread->debugreg6 is updated with causative bits in
> ptrace_triggered() to help send signals to the user-space. I don't see a
> user for the change you propose.

For each breakpoint where we decide it's a case of lazy DR switching or
we invoke a "triggered" callback, the corresponding bit in dr6 should
be cleared.  This is a way of indicating to do_debug() that the handler 
has taken care of these causes of the exception.

Similarly, the kprobe routine should clear the single-step bit in dr6 
when it handles a single-step exception.  When the notifier chain 
completes, the only bits remaining in dr6 should be for events that 
still need to be handled.

Alan Stern


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-04-01 16:16     ` Alan Stern
@ 2009-04-07  8:22       ` K.Prasad
  2009-04-09 20:50         ` Alan Stern
  0 siblings, 1 reply; 27+ messages in thread
From: K.Prasad @ 2009-04-07  8:22 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Wed, Apr 01, 2009 at 12:16:26PM -0400, Alan Stern wrote:
> Sorry for not replying sooner; I was away on a short vacation.
> 
> On Sat, 28 Mar 2009, K.Prasad wrote:
> 
> > > There are some serious issues involving userspace breakpoints and the
> > > legacy ptrace interface.  It all comes down to this: At what point
> > > is a breakpoint registered for a ptrace caller?
> > > 
> > > Remember, to set up a breakpoint a debugger needs to call ptrace
> > > twice: once to put the address in one of the DR0-DR3 registers and
> > > once to set up DR7.  So when does the task own the breakpoint?
> > > 
> > > Logically, we should wait until DR7 gets set, because until then the
> > > breakpoint is not active.  But then how do we let the caller know that
> > > one of his breakpoints conflicts with a kernel breakpoint?
> > > 
> > > If we report an error during an attempt to set DR0-DR3 then at least
> > > it's unambiguous.  But then how do we know when the task is _finished_
> > > using the breakpoint?  It's under no obligation to set the register
> > > back to 0.
> > > 
> > > Related to this is the question of how to store the task's versions of
> > > DR0-DR3 when there is no associated active breakpoint.  Maybe it would
> > > be best to keep the existing registers in the thread structure.
> > > 
> > 
> > These are profound questions and I believe that it is upto the process in
> > user-space to answer them.
> > 
> > What we could ensure from the kernel-space is to retain the
> > existing behaviour of ptrace i.e. return success when a write is done on
> > DR0-DR3 (almost unconditionally) and reserve error returns only when DR7
> > is written into.
> > 
> > The patch in question could possibly return an -ENOMEM (even when write
> > is done on DR0-DR3) but I will change the behaviour as stated above.
> > 
> > 
> > A DR0 - DR3 return will do a:
> > 	thread->debugreg[n] = val;
> > 	return 0;
> > 
> > while all error returns are reserved for:
> > 	rc = ptrace_write_dr7(tsk, val);
> 
> That does seem to be the most logical approach.  The problem with it is
> that it doesn't give the caller much information about the cause of the
> problem or how to fix it.  (Not that existing programs would know how
> to interpret this information anyway...)
> 

A slight change though...writes to DR0-DR3 may fail if the address is
invalid. This behaviour is true even in existing implementation of
ptrace_set_debugreg().

I am removing some of the comments below, which I have addressed in the
patch. Like using switch-case and consolidating updation of kernel
breakpoints into one function: arch_update_kernel_hw_breakpoints(), etc.

> > > > +int arch_modify_user_hw_breakpoint(int pos, struct hw_breakpoint *bp,
> > > > +		struct task_struct *tsk)
> > > > +{
> > > > +	struct thread_struct *thread = &(tsk->thread);
> > > > +
> > > > +	/* Check if the register to be modified was enabled by the thread */
> > > > +	if (!(thread->dr7 & (1 << (pos * DR_ENABLE_SIZE))))
> > > > +		return -EINVAL;
> > > > +
> > > > +	thread->dr7 &= ~dr7_masks[pos];
> > > > +	thread->dr7 |= encode_dr7(pos, bp->info.len, bp->info.type);
> > > > +
> > > > +	return 0;
> > > > +}
> > > 
> > > It might be possible to eliminate this rather awkward code, once the
> > > DR0-DR3 values are added back into the thread structure.
> > > 
> > 
> > I'm sorry I don't see how. Can you explain?
> 
> This gets back to those tricky questions about integrating ptrace with
> hw_breakpoint.  In theory we could avoid allocating hw_breakpoint
> structures for ptrace breakpoints and treat them completely
> independently, but overall it's probably better to do things uniformly.
> 
> Regardless, we are still left with the problem that it's not easy to
> capture the ptrace interface using an hw_breakpoint structure, because
> ptrace breakpoints are set up in two stages: one to save the address in
> DRn (0 <= n <= 3) and one to save the type and length in DR7.  What's
> the best way to handle it when task being debugged isn't running and
> the debugger changes the breakpoint address?  Or changes the
> length/type fields in DR7?  I wrote modify_user_hw_breakpoint() to
> handle this, but it was just a kludge.
> 
> If we store debugreg[0..3] in the thread structure, and if 
> __register_user_hw_breakpoint() is written properly, then maybe ptrace 
> can install modifications to existing breakpoints simply by calling 
> __register_user_hw_breakpoint() and re-using the old "pos" value.
> 

I've re-written __modify_user_hw_breakpoint() to invoke the common
(new) worker routine arch_update_user_hw_breakpoint(). Let me know if
you think the new code still needs re-work.

> > 
> > This code is in hw_breakpoint_handler() which we don't intend to enter 
> > if single-stepping bit is set (through kprobes) and hence the
> > NOTIFY_DONE.
> 
> I don't see why we shouldn't enter even in this case.  Suppose somebody
> single-steps over an instruction that accesses a variable with an
> associated data breakpoint?
>

I think this is an important case that I missed, which made me add the
early return in hw_breakpoint_handler() for (DR6 & DR_STEP). I have
changed hw_breakpoint_handler() code to address all of your comments
except making dr6 a pointer. I will talk about my concerns about it 
in the subsequent mail.
 
> > 
> > 	/* Lazy debug register switching */
> > 	if (per_cpu(last_debugged_task, get_cpu()) != current) {
> > 		switch_to_none_hw_breakpoint();
> > 		put_cpu_no_resched();
> > 	}
> 
> I just noticed that the lines saving DR7 and setting it to 0 need to
> come here.  Otherwise switch_to_none_hw_breakpoint() might set DR7 back
> to a nonzero value, and it might not match the value stored in dr7.
> 

arch_uninstall_thread_hw_breakpoint()<--switch_to_none_hw_breakpoint()
will store 'kdr7' (which contains all kernel-space breakpoints in
encoded format) to DR7 physical register. Given that the current()
process should not have TIF_DEBUG() set (if it were set,
switch_to_thread_hw_breakpoint() would have been invoked to set
last_debugged_task), we will wipe out all user-space breakpoints and
store only kdr7.

> > > > @@ -530,13 +530,14 @@ asmlinkage __kprobes struct pt_regs *syn
> > > >  dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
> > > >  {
> > > >  	struct task_struct *tsk = current;
> > > > -	unsigned long condition;
> > > > +	unsigned long dr6;
> > > >  	int si_code;
> > > >  
> > > > -	get_debugreg(condition, 6);
> > > > +	get_debugreg(dr6, 6);
> > > > +	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
> > > >  
> > > >  	/* Catch kmemcheck conditions first of all! */
> > > > -	if (condition & DR_STEP && kmemcheck_trap(regs))
> > > > +	if (dr6 & DR_STEP && kmemcheck_trap(regs))
> > > >  		return;
> > > 
> > > Are you sure this is right?  Is it possible for any of the DR_TRAPn bits
> > > to be set as well when this happens?
> > > 
> > > 
> > 
> > I did not look at this check before. But the (dr6 & DR_STEP) condition
> > should make sure no HW breakpoint exceptions are set (since we don't
> > allow instruction breakpoints in kernel-space yet, as explained above).
> 
> What does kmemcheck_trap() do?
>

As I said before, the fact that I ignored a case where we could
single-step an instruction accessing a data being monitored by a
breakpoint, made me ignore all (dr6 & DR_STEP) cases.

kmemcheck_trap()'s functionality can be nicely understood from the
"Technical description" section in Documentation/kmemcheck.txt.
Basically it uses single-stepping to 'hide' a page immediately after
the instruction, say i, using the page has finished execution. This
is to cause page-fault deliberately, which is used by kmemcheck.

However, as you rightly pointed, the next instruction (i + 1) could be
accessing a data monitored by the debug registers and TRAP<n> bits could
be set. We shouldn't allow return in such a case. I will modify the code
in do_debug() also.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-04-01 16:22     ` Alan Stern
@ 2009-04-07  8:22       ` K.Prasad
  0 siblings, 0 replies; 27+ messages in thread
From: K.Prasad @ 2009-04-07  8:22 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Wed, Apr 01, 2009 at 12:22:08PM -0400, Alan Stern wrote:
> On Sat, 28 Mar 2009, K.Prasad wrote:
> 
> > On Wed, Mar 25, 2009 at 03:48:31PM -0400, Alan Stern wrote:
> > > > +
> > > > +/*
> > > > + * Handle debug exception notifications.
> > > > + */
> > > > +int __kprobes hw_breakpoint_handler(struct die_args *args)
> > > > +{
> > > > +	int i, rc = NOTIFY_DONE;
> > > > +	struct hw_breakpoint *bp;
> > > > +	/* The DR6 value is stored in args->err */
> > > > +	unsigned long dr7, dr6 = args->err;
> > > 
> > > Please change this.  (I should have changed it long ago, but I never
> > > got around to it.)  Instead of passing the DR6 value in args->err,
> > > pass a pointer to the dr6 variable in do_debug().  That way the
> > > handler routines can turn off bits in that variable and do_debug() can
> > > see which bits remain set at the end.
> > > 
> > > Of course, this will require a corresponding change to the
> > > post_kprobe_handler() routine.
> > >
> > 
> > As I looked at the code with an intention of changing it, I don't find a
> > place - in hw_breakpoint_handler() and in functions called by
> > kprobe_exceptions_notify() where bits in dr6 are written into.
> 
> That's because such writes wouldn't do any good in the old code!  :-)
> 
> > The thread-specific thread->debugreg6 is updated with causative bits in
> > ptrace_triggered() to help send signals to the user-space. I don't see a
> > user for the change you propose.
> 
> For each breakpoint where we decide it's a case of lazy DR switching or
> we invoke a "triggered" callback, the corresponding bit in dr6 should
> be cleared.  This is a way of indicating to do_debug() that the handler 
> has taken care of these causes of the exception.
> 
> Similarly, the kprobe routine should clear the single-step bit in dr6 
> when it handles a single-step exception.  When the notifier chain 
> completes, the only bits remaining in dr6 should be for events that 
> still need to be handled.
> 
> Alan Stern
>

This does sound like good design, but unfortunately there are pieces in
do_debug() which rely upon bits in dr6 being set even after the actual
breakpoint is handled (the get_si_code() is one such example).

Do we go about changing them to use thread->debugreg6 instead of dr6? If
yes, wouldn't that be better done outside the HW Breakpoint patches as a
part of some cleanup initiative?

Thanks,
K.Prasad


 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-04-07  8:22       ` K.Prasad
@ 2009-04-09 20:50         ` Alan Stern
  0 siblings, 0 replies; 27+ messages in thread
From: Alan Stern @ 2009-04-09 20:50 UTC (permalink / raw)
  To: K.Prasad
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Tue, 7 Apr 2009, K.Prasad wrote:

> A slight change though...writes to DR0-DR3 may fail if the address is
> invalid. This behaviour is true even in existing implementation of
> ptrace_set_debugreg().

That's okay.

> > > 
> > > 	/* Lazy debug register switching */
> > > 	if (per_cpu(last_debugged_task, get_cpu()) != current) {
> > > 		switch_to_none_hw_breakpoint();
> > > 		put_cpu_no_resched();
> > > 	}
> > 
> > I just noticed that the lines saving DR7 and setting it to 0 need to
> > come here.  Otherwise switch_to_none_hw_breakpoint() might set DR7 back
> > to a nonzero value, and it might not match the value stored in dr7.
> > 
> 
> arch_uninstall_thread_hw_breakpoint()<--switch_to_none_hw_breakpoint()
> will store 'kdr7' (which contains all kernel-space breakpoints in
> encoded format) to DR7 physical register. Given that the current()
> process should not have TIF_DEBUG() set (if it were set,
> switch_to_thread_hw_breakpoint() would have been invoked to set
> last_debugged_task), we will wipe out all user-space breakpoints and
> store only kdr7.

No, you don't understand.  The code looks like this:

> +	get_debugreg(dr7, 7);
> +
> +	/* Disable breakpoints during exception handling */
> +	set_debugreg(0UL, 7);
...
> +	/* Lazy debug register switching */
> +	if (per_cpu(last_debugged_task, get_cpu()) != current) {
> +		switch_to_none_hw_breakpoint();
> +		put_cpu_no_resched();
> +	}
...
> +	set_debugreg(dr7, 7);
> +	return rc;

The first few lines will set dr7 to a value which includes the user
breakpoints and will set DR7 to 0.  The next few lines will set DR7 to
kdr7, which might be non-zero.  This is wrong; we need DR7 to be 0.  
Then the second-to-last line will set DR7 back to dr7, which is also
wrong -- it should be set to kdr7.

> > For each breakpoint where we decide it's a case of lazy DR switching or
> > we invoke a "triggered" callback, the corresponding bit in dr6 should
> > be cleared.  This is a way of indicating to do_debug() that the handler 
> > has taken care of these causes of the exception.
> > 
> > Similarly, the kprobe routine should clear the single-step bit in dr6 
> > when it handles a single-step exception.  When the notifier chain 
> > completes, the only bits remaining in dr6 should be for events that 
> > still need to be handled.
> > 
> > Alan Stern
> >
> 
> This does sound like good design, but unfortunately there are pieces in
> do_debug() which rely upon bits in dr6 being set even after the actual
> breakpoint is handled (the get_si_code() is one such example).

If necessary, do_debug() can keep two copies of dr6: the original 
version read from DR6 and the version modified by the notification 
handlers.

> Do we go about changing them to use thread->debugreg6 instead of dr6? If
> yes, wouldn't that be better done outside the HW Breakpoint patches as a
> part of some cleanup initiative?

Should they use thread->debugreg6?  If they should, then change them.  
And no, the change should not be in a separate patch; it should be part 
of your series.  Otherwise there would be intermediate kernels that 
behaved incorrectly.

Alan Stern




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-04-24 15:57           ` K.Prasad
@ 2009-04-24 16:16             ` Alan Stern
  0 siblings, 0 replies; 27+ messages in thread
From: Alan Stern @ 2009-04-24 16:16 UTC (permalink / raw)
  To: K.Prasad
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Fri, 24 Apr 2009, K.Prasad wrote:

> Do you think that the patchset is now in a form that can be submitted
> for upstream (-tip tree) acceptance?

I'll let you know after I've had a chance to go over it thoroughly.

Alan Stern


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-04-24 14:16         ` Alan Stern
@ 2009-04-24 15:57           ` K.Prasad
  2009-04-24 16:16             ` Alan Stern
  0 siblings, 1 reply; 27+ messages in thread
From: K.Prasad @ 2009-04-24 15:57 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Fri, Apr 24, 2009 at 10:16:07AM -0400, Alan Stern wrote:
> On Fri, 24 Apr 2009, K.Prasad wrote:
> 
> > The arch_update_kernel_hw_breakpoints() was designed to work like this -
> > it updates all registers beginning 'hbp_kernel_pos' to (HB_NUM - 1) with
> > the values stored in hbp_kernel[] array.
> > 
> > When inserting a new breakpoint, hbp_kernel_pos is decremented *before*
> > invoking arch_update_kernel_hw_breakpoints() so that the new value is
> > also written onto the physical debug register.
> > 
> > On removal, 'hbp_kernel_pos' is incremented *after*
> > arch_update_kernel_hw_breakpoints() so that the physical debug registers
> > i.e. both DR7 and DR<pos> are updated with the changes post removal and
> > compaction. I'm ready to make changes but don't see where the code
> > actually goes wrong. Can you explain that?
> 
> I'm sorry; I misread the code in arch_update_kernel_hw_breakpoints().  
> It isn't actually wrong, and you are correct to increment
> hbp_kernel_pos where you do.  Your code is different from my original
> version, which would update all the debug registers at once instead of
> doing the kernel and userspace breakpoints separately -- that's what 
> confused me.
> 
> There is one change you could make to improve the routine, however.  In 
> arch_update_kernel_hw_breakpoints(), the line
> 
> 	kdr7 &= ~kdr7_masks[hbp_kernel_pos];
> 
> really should be
> 
> 	kdr7 = 0;
> 
> since kdr7 never contains anything other than kernel breakpoint
> settings.  (You could update the comment in the preceding line as
> well.)
> 
> Alan Stern

Sure, I'd make that change (in the subsequent iteration) - it's much simpler.

Do you think that the patchset is now in a form that can be submitted
for upstream (-tip tree) acceptance?

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-04-24  5:56       ` K.Prasad
@ 2009-04-24 14:16         ` Alan Stern
  2009-04-24 15:57           ` K.Prasad
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Stern @ 2009-04-24 14:16 UTC (permalink / raw)
  To: K.Prasad
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Fri, 24 Apr 2009, K.Prasad wrote:

> The arch_update_kernel_hw_breakpoints() was designed to work like this -
> it updates all registers beginning 'hbp_kernel_pos' to (HB_NUM - 1) with
> the values stored in hbp_kernel[] array.
> 
> When inserting a new breakpoint, hbp_kernel_pos is decremented *before*
> invoking arch_update_kernel_hw_breakpoints() so that the new value is
> also written onto the physical debug register.
> 
> On removal, 'hbp_kernel_pos' is incremented *after*
> arch_update_kernel_hw_breakpoints() so that the physical debug registers
> i.e. both DR7 and DR<pos> are updated with the changes post removal and
> compaction. I'm ready to make changes but don't see where the code
> actually goes wrong. Can you explain that?

I'm sorry; I misread the code in arch_update_kernel_hw_breakpoints().  
It isn't actually wrong, and you are correct to increment
hbp_kernel_pos where you do.  Your code is different from my original
version, which would update all the debug registers at once instead of
doing the kernel and userspace breakpoints separately -- that's what 
confused me.

There is one change you could make to improve the routine, however.  In 
arch_update_kernel_hw_breakpoints(), the line

	kdr7 &= ~kdr7_masks[hbp_kernel_pos];

really should be

	kdr7 = 0;

since kdr7 never contains anything other than kernel breakpoint
settings.  (You could update the comment in the preceding line as
well.)

Alan Stern


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-04-17 14:37     ` Alan Stern
@ 2009-04-24  5:56       ` K.Prasad
  2009-04-24 14:16         ` Alan Stern
  0 siblings, 1 reply; 27+ messages in thread
From: K.Prasad @ 2009-04-24  5:56 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Fri, Apr 17, 2009 at 10:37:13AM -0400, Alan Stern wrote:
> On Fri, 17 Apr 2009, K.Prasad wrote:
> 

Hi Alan,
	I have modified the patch by accepting all of your suggestions
excepting one - which I try to explain below.

I will send the patchset containing changes suggested by you and a few
others (have explained that in the changelog).

Kindly review the new code.


> > 
> > > > +	on_each_cpu(arch_update_kernel_hw_breakpoints, NULL, 0);
> > > > +	hbp_kernel_pos++;
> > > 
> > > Don't you want to increment hbp_kernel_pos _before_ calling
> > > on_each_cpu()?  Otherwise you're telling the other CPUs to install an
> > > empty breakpoint.
> > > 
> > 
> > Incrementing hbp_kernel_pos after arch_update_kernel_hw_breakpoints()
> > during unregister_kernel_ is of the following significance:
> > 
> > - The 'pos' numbered debug register is reset to 0 through the code 
> >   "set_debugreg(hbp_kernel[i]->info.address, i)" where 'i' ranges from
> >   hbp_kernel_pos to HB_NUM. This is necessary during unregister operation.
> 
> It _isn't_ necessary.  The contents of that debug register don't matter 
> at all if it isn't enabled.  (And there's nothing special about 0; if 
> the breakpoint were enabled then you would get a breakpoint interrupt 
> whenever something tried to access address 0.)
> 
> Besides, the "set_debugreg" line of code in
> arch_update_kernel_hw_breakpoints doesn't get executed when
> hbp_kernel[i] is NULL.  So this whole point is moot; the debug register
> wouldn't be affected anyway.
> 
> > - The following statements would reset the bits corresponding to
> >   unregistered breakpoint only then.
> > 	kdr7 &= ~kdr7_masks[hbp_kernel_pos];
> > 	dr7 &= ~kdr7_masks[hbp_kernel_pos];
> 
> But if hbp_kernel_pos is wrong (i.e., if it hasn't been incremented)
> then these statements will set kdr7 and dr7 incorrectly.
> 
> > - Helps keep the arch_update_kernel_hw_breakpoints() code generic enough
> >   to be invoked during register and unregister operations.
> 
> This is another moot point.  Your code is _wrong_ -- how generic it is
> doesn't matter.
> 
>

The arch_update_kernel_hw_breakpoints() was designed to work like this -
it updates all registers beginning 'hbp_kernel_pos' to (HB_NUM - 1) with
the values stored in hbp_kernel[] array.

When inserting a new breakpoint, hbp_kernel_pos is decremented *before*
invoking arch_update_kernel_hw_breakpoints() so that the new value is
also written onto the physical debug register.

On removal, 'hbp_kernel_pos' is incremented *after*
arch_update_kernel_hw_breakpoints() so that the physical debug registers
i.e. both DR7 and DR<pos> are updated with the changes post removal and
compaction. I'm ready to make changes but don't see where the code
actually goes wrong. Can you explain that?

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-04-17  3:12   ` K.Prasad
@ 2009-04-17 14:37     ` Alan Stern
  2009-04-24  5:56       ` K.Prasad
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Stern @ 2009-04-17 14:37 UTC (permalink / raw)
  To: K.Prasad
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Fri, 17 Apr 2009, K.Prasad wrote:

> > > +void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> > > +{
> > > +	int i, j;
> > > +
> > > +	mutex_lock(&hw_breakpoint_mutex);
> > > +
> > > +	/* Find the 'bp' in our list of breakpoints for kernel */
> > > +	for (i = hbp_kernel_pos; i < HB_NUM; i++)
> > > +		if (bp == hbp_kernel[i])
> > > +			break;
> > > +
> > > +	/* Check if we did not find a match for 'bp'. If so return early */
> > > +	if (i == HB_NUM) {
> > > +		mutex_unlock(&hw_breakpoint_mutex);
> > > +		return;
> > > +	}
> > > +
> > > +	/*
> > > +	 * We'll shift the breakpoints one-level above to compact if
> > > +	 * unregistration creates a hole
> > > +	 */
> > > +	for (j = i; j > hbp_kernel_pos; j--)
> > > +		hbp_kernel[j] = hbp_kernel[j-1];
> > > +
> > > +	hbp_kernel[hbp_kernel_pos] = 0;
> > 
> > s/0/NULL/
> > 
> 
> Ok.
> 
> > > +	on_each_cpu(arch_update_kernel_hw_breakpoints, NULL, 0);
> > > +	hbp_kernel_pos++;
> > 
> > Don't you want to increment hbp_kernel_pos _before_ calling
> > on_each_cpu()?  Otherwise you're telling the other CPUs to install an
> > empty breakpoint.
> > 
> 
> Incrementing hbp_kernel_pos after arch_update_kernel_hw_breakpoints()
> during unregister_kernel_ is of the following significance:
> 
> - The 'pos' numbered debug register is reset to 0 through the code 
>   "set_debugreg(hbp_kernel[i]->info.address, i)" where 'i' ranges from
>   hbp_kernel_pos to HB_NUM. This is necessary during unregister operation.

It _isn't_ necessary.  The contents of that debug register don't matter 
at all if it isn't enabled.  (And there's nothing special about 0; if 
the breakpoint were enabled then you would get a breakpoint interrupt 
whenever something tried to access address 0.)

Besides, the "set_debugreg" line of code in
arch_update_kernel_hw_breakpoints doesn't get executed when
hbp_kernel[i] is NULL.  So this whole point is moot; the debug register
wouldn't be affected anyway.

> - The following statements would reset the bits corresponding to
>   unregistered breakpoint only then.
> 	kdr7 &= ~kdr7_masks[hbp_kernel_pos];
> 	dr7 &= ~kdr7_masks[hbp_kernel_pos];

But if hbp_kernel_pos is wrong (i.e., if it hasn't been incremented)
then these statements will set kdr7 and dr7 incorrectly.

> - Helps keep the arch_update_kernel_hw_breakpoints() code generic enough
>   to be invoked during register and unregister operations.

This is another moot point.  Your code is _wrong_ -- how generic it is
doesn't matter.


> > > +/* Unmasked kernel DR7 value */
> > > +static unsigned long kdr7;
> > > +static const unsigned long	kdr7_masks[HB_NUM] = {
> > > +	0xffff00ff,	/* Same for 3, 2, 1, 0 */
> > > +	0xfff000fc,	/* Same for 3, 2, 1 */
> > > +	0xff0000f0,	/* Same for 3, 2 */
> > > +	0xf00f00c0	/* LEN3, R/W3, G3, L3 */
> > > +};
> > 
> > It looks like you took out the first line of assignments.  Isn't
> > kdr7_masks supposed to contain HB_NUM + 1 elements?  Not to mention
> > that reordering the lines has mixed up the comments.
> > 
> 
> The kdr7_masks array is slightly different from your implementation
> earlier.
> 
> The element zero in the array is 0xffff00ff, which has all bits
> corresponding to DRs 0, 1, 2 and 3 'set', and hence the comment.

The comment is "Same for 3, 2, 1, 0".  But that is an incomplete 
thought -- same as _what_?  The answer is: Same as "LEN3, R/W3, G3, 
L3".  But you have messed up the logical order of the comments by 
reversing the lines.

Look at this:

	/* LEN3, R/W3, G3, L3 */
	/* Same for 3, 2 */
	/* Same for 3, 2, 1 */
	/* Same for 3, 2, 1, 0 */

That makes sense.  For instance, it's clear that the second line means 
LEN3, LEN2, R/W3, R/W2, G3, G2, L3, L2.  Now look at yours:

	/* Same for 3, 2, 1, 0 */
	/* Same for 3, 2, 1 */
	/* Same for 3, 2 */
	/* LEN3, R/W3, G3, L3 */

That doesn't make sense.  It's a general fact of human language: If you 
reverse the order of a bunch of sentences, you will mess up the 
meaning.


> > > +void arch_update_kernel_hw_breakpoints(void *unused)
> > > +{
> > > +	struct hw_breakpoint *bp;
> > > +	unsigned long dr7;
> > > +	int i;
> > > +
> > > +	/* Check if there is nothing to update */
> > > +	if (hbp_kernel_pos == HB_NUM)
> > > +		return;
> > 
> > This is wrong; you have to set DR7 to 0 first.  Might as well leave
> > out this test and just go through the whole routine.  The "for" loop
> > will be skipped entirely if hbp_kernel_pos == HB_NUM, so you don't
> > save much by returning early.
> > 
> > Ah, now I see this is why you removed a line from kdr7_masks.
> > 
> 
> This check is added to help the load_debug_registers() call during
> boot-time which is when hbp_kernel_pos = HB_NUM. If we don't return
> early, then assignments such as  these "kdr7 &= ~kdr7_masks[hbp_kernel_pos];"
> will cause array-out-of-bounds. Otherwise we have to circumvent it by
> adding a dummy HB_NUM element in kdr7_masks[] array. I would prefer
> retaining the "if (hbp_kernel_pos == HB_NUM)" check instead.

It's up to you.  But consider this:

     1. You have replaced a single array element (4 bytes of data)
	with a conditional test and jump (probably around 7 bytes
	of code, not to mention that the conditional jump will slow
	down the CPU even when it isn't taken).

     2. If hbp_kernel_pos is equal to HB_NUM then the early return 
	causes you to skip the line setting kdr7.  It will end up 
	having an incorrect value.


> > > +int __kprobes hw_breakpoint_handler(struct die_args *args)
> > > +{
> > > +	int i, rc = NOTIFY_STOP;
> > > +	struct hw_breakpoint *bp;
> > > +	/* The DR6 value is stored in args->err */
> > > +	unsigned long dr7, dr6 = args->err;
> > > +
> > > +	/*
> > > +	 * We are here because BT, BS or BD bit is set and no trap bits are set
> > > +	 * in dr6 register. Do an early return
> > > +	 */
> > 
> > I don't like such comments.  It says that BT, BS, or BD is set, but
> > they don't necessarily have to be.  You should say:
> > 
> > 	/* Do an early return if no trap bits are set in DR6 */
> > 
> 
> Since we enter hw_breakpoint_handler() code only if DIE_DEBUG (and not
> say DIE_INT3), one or more of these bits would be 'set' when we are in
> hw_breakpoint_handler() - BS, BT, BD or Trap bits, and hence the comment.

But the comment is _wrong_.  We might be here because one of the trap 
bits is set and BT, BS, and BD are clear, whereas the comment says that 
can't be the case.

Just read the comment again.  It says: "We are here because ... no trap 
bits are set".


> > >  dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
> > >  {
> > >  	struct task_struct *tsk = current;
> > > -	unsigned long condition;
> > > +	unsigned long dr6;
> > >  	int si_code;
> > >  
> > > -	get_debugreg(condition, 6);
> > > +	get_debugreg(dr6, 6);
> > > +	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
> > >  
> > >  	/* Catch kmemcheck conditions first of all! */
> > > -	if (condition & DR_STEP && kmemcheck_trap(regs))
> > > +	if ((dr6 & DR_STEP && kmemcheck_trap(regs)) &&
> > > +		!(dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
> > >  		return;
> > 
> > Should this test go before the "set_debugreg(0, 6)" statement?
> > 
> 
> Either way, it doesn't affect the function "kmemcheck_trap()". I don't
> find it reading directly from DR6 anywhere.

Sure, it doesn't affect kmemcheck_trap().  But it might affect
something else.  What if some other code later on looks at the contents
of DR6?

Incidentally, the parenthese in that test are wrong.  It should be:

	if ((dr6 & DR_STEP) && kmemcheck_trap(regs) &&
			!(dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))

> On a related node, a patch that modifies dr6 to be passed by reference
> to the notifier calls will be included in the next iteration.
> 
> Let me know, upon receiving the patch, if you still think that such a
> change i.e. resetting bits in dr6 after handling of the corresponding 
> breakpoint needs to be done.

At the moment, the only other bits which could be set are BT and BD.  
Presumably we don't care about BD.  Do you care if BT gets wiped out 
before anybody has a chance to see it?

Alan Stern


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-04-16 21:19 ` Alan Stern
@ 2009-04-17  3:12   ` K.Prasad
  2009-04-17 14:37     ` Alan Stern
  0 siblings, 1 reply; 27+ messages in thread
From: K.Prasad @ 2009-04-17  3:12 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Thu, Apr 16, 2009 at 05:19:54PM -0400, Alan Stern wrote:
> On Tue, 7 Apr 2009, K.Prasad wrote:
> 
> > Hi Alan,
> > 	I am sending you the patches with the changes mentioned in the 
> > Changelog below. Please read the patches in conjunction with my replies
> > sent to your previous comments.
> > 
> > Let me know your thoughts on this.
> 
> Sorry to take so long on this... lots of other things keep cropping up.
> 
> Mostly it looks okay; I noticed only a few problems.  Comments inline 
> below.
> 
> Alan Stern
> 
 
Thanks for reviewing it....hope that the code quickly morphs into a form
that is acceptable to most.
 
> > --- arch/x86/include/asm/processor.h.orig
> > +++ arch/x86/include/asm/processor.h
> > @@ -429,8 +430,11 @@ struct thread_struct {
> >  	unsigned long		debugreg1;
> >  	unsigned long		debugreg2;
> >  	unsigned long		debugreg3;
> > +	unsigned long		debugreg[HB_NUM];
> 
> You forgot to get rid of debugreg1 - debugreg3!
> 
>

After the patch re-arrangement, they are removed through "[Patch
06/11]". This is to enable compilation of the main tree after
application of every patch.
 
> > +static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
> > +{
> > +	int ret;
> > +
> > +	if (!bp)
> > +		return -EINVAL;
> 
> Is this test really needed?
> 

It protects us from any rogue requests which has a NULL bp pointer.
Given that the register_<kernel/user>_hw_breakpoint() interfaces are
exported, I think it keeps the code safe from poorly written modules.

And as I just see, the second parameter 'tsk' to validate_settings is
unused throughout. I will remove it.

> > +
> > +	ret = arch_validate_hwbkpt_settings(bp, tsk);
> > +	if (ret < 0)
> > +		goto err;
> > +
> > +	/* Check that the virtual address is in the proper range */
> > +	if (tsk) {
> > +		if (!arch_check_va_in_userspace(bp->info.address, bp->info.len))
> > +			return -EFAULT;
> > +	} else {
> > +		if (!arch_check_va_in_kernelspace(bp->info.address,
> > +								bp->info.len))
> > +			return -EFAULT;
> > +	}
> > + err:
> > +	return ret;
> > +}
> 
> At this point, almost nothing in the routine is arch-independent.  You
> might as well move everything into the arch-specific code and
> eliminate the validate_settings() routine.
> 

Ok. I will directly invoke arch_validate_hwbkpt_settings() [maybe this
should be called arch_validate_hw_breakpoint()] and move the arch_check_
function call inside arch_validate_.
 
> > +void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> > +{
> > +	int i, j;
> > +
> > +	mutex_lock(&hw_breakpoint_mutex);
> > +
> > +	/* Find the 'bp' in our list of breakpoints for kernel */
> > +	for (i = hbp_kernel_pos; i < HB_NUM; i++)
> > +		if (bp == hbp_kernel[i])
> > +			break;
> > +
> > +	/* Check if we did not find a match for 'bp'. If so return early */
> > +	if (i == HB_NUM) {
> > +		mutex_unlock(&hw_breakpoint_mutex);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * We'll shift the breakpoints one-level above to compact if
> > +	 * unregistration creates a hole
> > +	 */
> > +	for (j = i; j > hbp_kernel_pos; j--)
> > +		hbp_kernel[j] = hbp_kernel[j-1];
> > +
> > +	hbp_kernel[hbp_kernel_pos] = 0;
> 
> s/0/NULL/
> 

Ok.

> > +	on_each_cpu(arch_update_kernel_hw_breakpoints, NULL, 0);
> > +	hbp_kernel_pos++;
> 
> Don't you want to increment hbp_kernel_pos _before_ calling
> on_each_cpu()?  Otherwise you're telling the other CPUs to install an
> empty breakpoint.
> 

Incrementing hbp_kernel_pos after arch_update_kernel_hw_breakpoints()
during unregister_kernel_ is of the following significance:

- The 'pos' numbered debug register is reset to 0 through the code 
  "set_debugreg(hbp_kernel[i]->info.address, i)" where 'i' ranges from
  hbp_kernel_pos to HB_NUM. This is necessary during unregister operation.

- The following statements would reset the bits corresponding to
  unregistered breakpoint only then.
	kdr7 &= ~kdr7_masks[hbp_kernel_pos];
	dr7 &= ~kdr7_masks[hbp_kernel_pos];

- Helps keep the arch_update_kernel_hw_breakpoints() code generic enough
  to be invoked during register and unregister operations.


> > +/* Unmasked kernel DR7 value */
> > +static unsigned long kdr7;
> > +static const unsigned long	kdr7_masks[HB_NUM] = {
> > +	0xffff00ff,	/* Same for 3, 2, 1, 0 */
> > +	0xfff000fc,	/* Same for 3, 2, 1 */
> > +	0xff0000f0,	/* Same for 3, 2 */
> > +	0xf00f00c0	/* LEN3, R/W3, G3, L3 */
> > +};
> 
> It looks like you took out the first line of assignments.  Isn't
> kdr7_masks supposed to contain HB_NUM + 1 elements?  Not to mention
> that reordering the lines has mixed up the comments.
> 

The kdr7_masks array is slightly different from your implementation
earlier.

The element zero in the array is 0xffff00ff, which has all bits
corresponding to DRs 0, 1, 2 and 3 'set', and hence the comment.
 
> > +void arch_update_kernel_hw_breakpoints(void *unused)
> > +{
> > +	struct hw_breakpoint *bp;
> > +	unsigned long dr7;
> > +	int i;
> > +
> > +	/* Check if there is nothing to update */
> > +	if (hbp_kernel_pos == HB_NUM)
> > +		return;
> 
> This is wrong; you have to set DR7 to 0 first.  Might as well leave
> out this test and just go through the whole routine.  The "for" loop
> will be skipped entirely if hbp_kernel_pos == HB_NUM, so you don't
> save much by returning early.
> 
> Ah, now I see this is why you removed a line from kdr7_masks.
> 

This check is added to help the load_debug_registers() call during
boot-time which is when hbp_kernel_pos = HB_NUM. If we don't return
early, then assignments such as  these "kdr7 &= ~kdr7_masks[hbp_kernel_pos];"
will cause array-out-of-bounds. Otherwise we have to circumvent it by
adding a dummy HB_NUM element in kdr7_masks[] array. I would prefer
retaining the "if (hbp_kernel_pos == HB_NUM)" check instead.

> > +
> > +	get_debugreg(dr7, 7);
> > +
> > +	/* Don't allow debug exceptions while we update the registers */
> > +	set_debugreg(0UL, 7);
> > +
> > +	/* Clear all kernel-space bits in kdr7 and dr7 before we set them */
> > +	kdr7 &= ~kdr7_masks[hbp_kernel_pos];
> > +	dr7 &= ~kdr7_masks[hbp_kernel_pos];
> > +
> > +	for (i = hbp_kernel_pos; i < HB_NUM; i++) {
> > +		bp = hbp_kernel[i];
> > +		if (bp) {
> > +			kdr7 |= encode_dr7(i, bp->info.len, bp->info.type);
> > +			set_debugreg(hbp_kernel[i]->info.address, i);
> > +		}
> > +	}
> > +
> > +	dr7 |= kdr7;
> > +
> > +	/* No need to set DR6 */
> > +	set_debugreg(dr7, 7);
> > +}
> 
> 
> > +int __kprobes hw_breakpoint_handler(struct die_args *args)
> > +{
> > +	int i, rc = NOTIFY_STOP;
> > +	struct hw_breakpoint *bp;
> > +	/* The DR6 value is stored in args->err */
> > +	unsigned long dr7, dr6 = args->err;
> > +
> > +	/*
> > +	 * We are here because BT, BS or BD bit is set and no trap bits are set
> > +	 * in dr6 register. Do an early return
> > +	 */
> 
> I don't like such comments.  It says that BT, BS, or BD is set, but
> they don't necessarily have to be.  You should say:
> 
> 	/* Do an early return if no trap bits are set in DR6 */
> 

Since we enter hw_breakpoint_handler() code only if DIE_DEBUG (and not
say DIE_INT3), one or more of these bits would be 'set' when we are in
hw_breakpoint_handler() - BS, BT, BD or Trap bits, and hence the comment.

> > +	if ((dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) == 0)
> > +		return NOTIFY_DONE;
> > +
> > +	get_debugreg(dr7, 7);
> > +
> > +	/* Disable breakpoints during exception handling */
> > +	set_debugreg(0UL, 7);
> > +	/*
> > +	 * Assert that local interrupts are disabled
> > +	 * Reset the DRn bits in the virtualized register value.
> > +	 * The ptrace trigger routine will add in whatever is needed.
> > +	 */
> > +	current->thread.debugreg6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
> > +
> > +	/* Lazy debug register switching */
> > +	if (per_cpu(last_debugged_task, get_cpu()) != current) {
> > +		switch_to_none_hw_breakpoint();
> > +		put_cpu_no_resched();
> > +	}
> 
> Once again, the get_debugreg and set_debugreg statements above should
> go here instead.
> 

I understood that you don't want to enable *any* breakpoints - even
kernel-space ones when we are in hw_breakpoint_handler() thereby making
it safer. I will change the code in the following iteration of patch.

> > +
> > +	/* Handle all the breakpoints that were triggered */
> > +	for (i = 0; i < HB_NUM; ++i) {
> > +		if (likely(!(dr6 & (DR_TRAP0 << i))))
> > +			continue;
> > +		/*
> > +		 * Find the corresponding hw_breakpoint structure and
> > +		 * invoke its triggered callback.
> > +		 */
> > +		if (i >= hbp_kernel_pos)
> > +			bp = hbp_kernel[i];
> > +		else {
> > +			bp = current->thread.hbp[i];
> > +			if (!bp) {
> > +				/* False alarm due to lazy DR switching */
> > +				continue;
> > +			}
> > +		}
> > +		(bp->triggered)(bp, args->regs);
> > +		/*
> > +		 * We have more work to do (send signals) if the breakpoint is
> > +		 * triggered due to user-space address, or if exceptions other
> > +		 * than breakpoint (such as single-stepping) are triggered.
> > +		 */
> > +		if (arch_check_va_in_userspace(bp->info.address, bp->info.len))
> > +			rc = NOTIFY_DONE;
> 
> This comment and test are more complicated than they need to be.  Just
> put "rc = NOTIFY_DONE" (with a comment) in the "else" clause above.
> 

Yes, that's much simpler and avoids a double-check. Will do it.

> > +	}
> > +	if (dr6 & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
> > +		rc = NOTIFY_DONE;
> > +	set_debugreg(dr7, 7);
> > +	return rc;
> > +}
> 
> 
> >  dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
> >  {
> >  	struct task_struct *tsk = current;
> > -	unsigned long condition;
> > +	unsigned long dr6;
> >  	int si_code;
> >  
> > -	get_debugreg(condition, 6);
> > +	get_debugreg(dr6, 6);
> > +	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
> >  
> >  	/* Catch kmemcheck conditions first of all! */
> > -	if (condition & DR_STEP && kmemcheck_trap(regs))
> > +	if ((dr6 & DR_STEP && kmemcheck_trap(regs)) &&
> > +		!(dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
> >  		return;
> 
> Should this test go before the "set_debugreg(0, 6)" statement?
> 

Either way, it doesn't affect the function "kmemcheck_trap()". I don't
find it reading directly from DR6 anywhere.

On a related node, a patch that modifies dr6 to be passed by reference
to the notifier calls will be included in the next iteration.

Let me know, upon receiving the patch, if you still think that such a
change i.e. resetting bits in dr6 after handling of the corresponding 
breakpoint needs to be done.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Patch 00/11] Hardware Breakpoint interfaces
  2009-04-07  6:34 K.Prasad
@ 2009-04-16 21:19 ` Alan Stern
  2009-04-17  3:12   ` K.Prasad
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Stern @ 2009-04-16 21:19 UTC (permalink / raw)
  To: K.Prasad
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

On Tue, 7 Apr 2009, K.Prasad wrote:

> Hi Alan,
> 	I am sending you the patches with the changes mentioned in the 
> Changelog below. Please read the patches in conjunction with my replies
> sent to your previous comments.
> 
> Let me know your thoughts on this.

Sorry to take so long on this... lots of other things keep cropping up.

Mostly it looks okay; I noticed only a few problems.  Comments inline 
below.

Alan Stern



> --- arch/x86/include/asm/processor.h.orig
> +++ arch/x86/include/asm/processor.h
> @@ -429,8 +430,11 @@ struct thread_struct {
>  	unsigned long		debugreg1;
>  	unsigned long		debugreg2;
>  	unsigned long		debugreg3;
> +	unsigned long		debugreg[HB_NUM];

You forgot to get rid of debugreg1 - debugreg3!


> +static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
> +{
> +	int ret;
> +
> +	if (!bp)
> +		return -EINVAL;

Is this test really needed?

> +
> +	ret = arch_validate_hwbkpt_settings(bp, tsk);
> +	if (ret < 0)
> +		goto err;
> +
> +	/* Check that the virtual address is in the proper range */
> +	if (tsk) {
> +		if (!arch_check_va_in_userspace(bp->info.address, bp->info.len))
> +			return -EFAULT;
> +	} else {
> +		if (!arch_check_va_in_kernelspace(bp->info.address,
> +								bp->info.len))
> +			return -EFAULT;
> +	}
> + err:
> +	return ret;
> +}

At this point, almost nothing in the routine is arch-independent.  You
might as well move everything into the arch-specific code and
eliminate the validate_settings() routine.


> +void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> +{
> +	int i, j;
> +
> +	mutex_lock(&hw_breakpoint_mutex);
> +
> +	/* Find the 'bp' in our list of breakpoints for kernel */
> +	for (i = hbp_kernel_pos; i < HB_NUM; i++)
> +		if (bp == hbp_kernel[i])
> +			break;
> +
> +	/* Check if we did not find a match for 'bp'. If so return early */
> +	if (i == HB_NUM) {
> +		mutex_unlock(&hw_breakpoint_mutex);
> +		return;
> +	}
> +
> +	/*
> +	 * We'll shift the breakpoints one-level above to compact if
> +	 * unregistration creates a hole
> +	 */
> +	for (j = i; j > hbp_kernel_pos; j--)
> +		hbp_kernel[j] = hbp_kernel[j-1];
> +
> +	hbp_kernel[hbp_kernel_pos] = 0;

s/0/NULL/

> +	on_each_cpu(arch_update_kernel_hw_breakpoints, NULL, 0);
> +	hbp_kernel_pos++;

Don't you want to increment hbp_kernel_pos _before_ calling
on_each_cpu()?  Otherwise you're telling the other CPUs to install an
empty breakpoint.


> +/* Unmasked kernel DR7 value */
> +static unsigned long kdr7;
> +static const unsigned long	kdr7_masks[HB_NUM] = {
> +	0xffff00ff,	/* Same for 3, 2, 1, 0 */
> +	0xfff000fc,	/* Same for 3, 2, 1 */
> +	0xff0000f0,	/* Same for 3, 2 */
> +	0xf00f00c0	/* LEN3, R/W3, G3, L3 */
> +};

It looks like you took out the first line of assignments.  Isn't
kdr7_masks supposed to contain HB_NUM + 1 elements?  Not to mention
that reordering the lines has mixed up the comments.


> +void arch_update_kernel_hw_breakpoints(void *unused)
> +{
> +	struct hw_breakpoint *bp;
> +	unsigned long dr7;
> +	int i;
> +
> +	/* Check if there is nothing to update */
> +	if (hbp_kernel_pos == HB_NUM)
> +		return;

This is wrong; you have to set DR7 to 0 first.  Might as well leave
out this test and just go through the whole routine.  The "for" loop
will be skipped entirely if hbp_kernel_pos == HB_NUM, so you don't
save much by returning early.

Ah, now I see this is why you removed a line from kdr7_masks.

> +
> +	get_debugreg(dr7, 7);
> +
> +	/* Don't allow debug exceptions while we update the registers */
> +	set_debugreg(0UL, 7);
> +
> +	/* Clear all kernel-space bits in kdr7 and dr7 before we set them */
> +	kdr7 &= ~kdr7_masks[hbp_kernel_pos];
> +	dr7 &= ~kdr7_masks[hbp_kernel_pos];
> +
> +	for (i = hbp_kernel_pos; i < HB_NUM; i++) {
> +		bp = hbp_kernel[i];
> +		if (bp) {
> +			kdr7 |= encode_dr7(i, bp->info.len, bp->info.type);
> +			set_debugreg(hbp_kernel[i]->info.address, i);
> +		}
> +	}
> +
> +	dr7 |= kdr7;
> +
> +	/* No need to set DR6 */
> +	set_debugreg(dr7, 7);
> +}


> +int __kprobes hw_breakpoint_handler(struct die_args *args)
> +{
> +	int i, rc = NOTIFY_STOP;
> +	struct hw_breakpoint *bp;
> +	/* The DR6 value is stored in args->err */
> +	unsigned long dr7, dr6 = args->err;
> +
> +	/*
> +	 * We are here because BT, BS or BD bit is set and no trap bits are set
> +	 * in dr6 register. Do an early return
> +	 */

I don't like such comments.  It says that BT, BS, or BD is set, but
they don't necessarily have to be.  You should say:

	/* Do an early return if no trap bits are set in DR6 */

> +	if ((dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) == 0)
> +		return NOTIFY_DONE;
> +
> +	get_debugreg(dr7, 7);
> +
> +	/* Disable breakpoints during exception handling */
> +	set_debugreg(0UL, 7);
> +	/*
> +	 * Assert that local interrupts are disabled
> +	 * Reset the DRn bits in the virtualized register value.
> +	 * The ptrace trigger routine will add in whatever is needed.
> +	 */
> +	current->thread.debugreg6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
> +
> +	/* Lazy debug register switching */
> +	if (per_cpu(last_debugged_task, get_cpu()) != current) {
> +		switch_to_none_hw_breakpoint();
> +		put_cpu_no_resched();
> +	}

Once again, the get_debugreg and set_debugreg statements above should
go here instead.

> +
> +	/* Handle all the breakpoints that were triggered */
> +	for (i = 0; i < HB_NUM; ++i) {
> +		if (likely(!(dr6 & (DR_TRAP0 << i))))
> +			continue;
> +		/*
> +		 * Find the corresponding hw_breakpoint structure and
> +		 * invoke its triggered callback.
> +		 */
> +		if (i >= hbp_kernel_pos)
> +			bp = hbp_kernel[i];
> +		else {
> +			bp = current->thread.hbp[i];
> +			if (!bp) {
> +				/* False alarm due to lazy DR switching */
> +				continue;
> +			}
> +		}
> +		(bp->triggered)(bp, args->regs);
> +		/*
> +		 * We have more work to do (send signals) if the breakpoint is
> +		 * triggered due to user-space address, or if exceptions other
> +		 * than breakpoint (such as single-stepping) are triggered.
> +		 */
> +		if (arch_check_va_in_userspace(bp->info.address, bp->info.len))
> +			rc = NOTIFY_DONE;

This comment and test are more complicated than they need to be.  Just
put "rc = NOTIFY_DONE" (with a comment) in the "else" clause above.

> +	}
> +	if (dr6 & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
> +		rc = NOTIFY_DONE;
> +	set_debugreg(dr7, 7);
> +	return rc;
> +}


>  dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
>  {
>  	struct task_struct *tsk = current;
> -	unsigned long condition;
> +	unsigned long dr6;
>  	int si_code;
>  
> -	get_debugreg(condition, 6);
> +	get_debugreg(dr6, 6);
> +	set_debugreg(0, 6);	/* DR6 may or may not be cleared by the CPU */
>  
>  	/* Catch kmemcheck conditions first of all! */
> -	if (condition & DR_STEP && kmemcheck_trap(regs))
> +	if ((dr6 & DR_STEP && kmemcheck_trap(regs)) &&
> +		!(dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
>  		return;

Should this test go before the "set_debugreg(0, 6)" statement?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Patch 00/11] Hardware Breakpoint interfaces
@ 2009-04-07  6:34 K.Prasad
  2009-04-16 21:19 ` Alan Stern
  0 siblings, 1 reply; 27+ messages in thread
From: K.Prasad @ 2009-04-07  6:34 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, Frederic Weisbecker, maneesh,
	Roland McGrath, Steven Rostedt

Hi Alan,
	I am sending you the patches with the changes mentioned in the 
Changelog below. Please read the patches in conjunction with my replies
sent to your previous comments.

Let me know your thoughts on this.

Changelog
-----------
- Introduce (un)register_user_hw_breakpoint() which would take two parameters -
  pointer to 'struct task_struct' and 'struct hw_breakpoint'.
- Change the return value behaviour in ptrace(). Return '0' for write in DR0 -
  DR3 (with the exception being when the registered address is not in
  user-space), while restricting error returns for write in DR7. Move memory
  (de)allocation behaviour for 'struct hw_breakpoint' to ptrace related code and
  flush_thread_hw_breakpoint() from unregister_user_hw_breakpoint() code.
- Addition of arch_flush_thread_hw_breakpoint() to zero-out thread->debugreg[n]
  registers.
- Consolidate arch-specific kernel/thread update routines into
  arch_update_kernel_hw_breakpoints() and arch_update_user_hw_breakpoint()
  routines.
- Re-arrange the patchset to enable compilation after application of every
  patch in the patchset.
- hw_breakpoint_handler() modified to:
  - Unconditionally invoke (*trigger) function call
  - Return NOTIFY_DONE only when other bits in DR6 are set or when
    breakpoint originates due to an address in user-space
  - Do an early return if trap bits are not set and we have entered the handler
    because of BT, BS or BD bits in dr6 being set.
- samples/hw_breakpoint/data_breakpoint.c modified to accept module
  parameters

The patchset is based on commit f0e36dc28173b65df2216dfae7109645d97a1bd9 of -tip
tree.

Thanks,
K.Prasad



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 00/11] Hardware Breakpoint interfaces
  2009-03-11 17:25       ` K.Prasad
@ 2009-03-11 17:30         ` Ingo Molnar
  0 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2009-03-11 17:30 UTC (permalink / raw)
  To: K.Prasad
  Cc: Alan Stern, Andrew Morton, Linux Kernel Mailing List, Roland McGrath


* K.Prasad <prasad@linux.vnet.ibm.com> wrote:

> On Wed, Mar 11, 2009 at 12:34:43PM -0400, Alan Stern wrote:
> > On Wed, 11 Mar 2009, K.Prasad wrote:
> > 
> > > The hardware breakpoint interfaces haven't been put under any CONFIG_
> > > till now, but I think we should bring them under a new config, say
> > > CONFIG_HW_BREAKPOINT. It would help create a dependancy for
> > > CONFIG_KSYM_TRACER too.
> > 
> > With these patches, ptrace is dependent on hw-breakpoint.  
> > You can't disable CONFIG_HW_BREAKPOINT without breaking 
> > ptrace.
> > 
> > Alan Stern
> > 
> 
> Agreed. We might have to retain the old code for ptrace and 
> put the new implementation under #ifdef CONFIG_HW_BREAKPOINT 
> to get them working. What do you think?

With the simple reservation mechanism i suggested i have no 
problem with having HW_BREAKPOINT enabled [selected] 
unconditionally on x86.

Your ptrace changes are an improvement in terms of code quality 
so as long as the facility is simple and obvious, it's a step 
forward.

#ifdefs are ugly and hard to maintain - especially in such a
rarely used and still critical API as ptrace.

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 00/11] Hardware Breakpoint interfaces
  2009-03-11 16:34     ` Alan Stern
@ 2009-03-11 17:25       ` K.Prasad
  2009-03-11 17:30         ` Ingo Molnar
  0 siblings, 1 reply; 27+ messages in thread
From: K.Prasad @ 2009-03-11 17:25 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ingo Molnar, Andrew Morton, Linux Kernel Mailing List, Roland McGrath

On Wed, Mar 11, 2009 at 12:34:43PM -0400, Alan Stern wrote:
> On Wed, 11 Mar 2009, K.Prasad wrote:
> 
> > The hardware breakpoint interfaces haven't been put under any CONFIG_
> > till now, but I think we should bring them under a new config, say
> > CONFIG_HW_BREAKPOINT. It would help create a dependancy for
> > CONFIG_KSYM_TRACER too.
> 
> With these patches, ptrace is dependent on hw-breakpoint.  You can't 
> disable CONFIG_HW_BREAKPOINT without breaking ptrace.
> 
> Alan Stern
> 

Agreed. We might have to retain the old code for ptrace and put the new
implementation under #ifdef CONFIG_HW_BREAKPOINT to get them working.
What do you think?

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 00/11] Hardware Breakpoint interfaces
  2009-03-11 12:11   ` K.Prasad
@ 2009-03-11 16:34     ` Alan Stern
  2009-03-11 17:25       ` K.Prasad
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Stern @ 2009-03-11 16:34 UTC (permalink / raw)
  To: K.Prasad
  Cc: Ingo Molnar, Andrew Morton, Linux Kernel Mailing List, Roland McGrath

On Wed, 11 Mar 2009, K.Prasad wrote:

> The hardware breakpoint interfaces haven't been put under any CONFIG_
> till now, but I think we should bring them under a new config, say
> CONFIG_HW_BREAKPOINT. It would help create a dependancy for
> CONFIG_KSYM_TRACER too.

With these patches, ptrace is dependent on hw-breakpoint.  You can't 
disable CONFIG_HW_BREAKPOINT without breaking ptrace.

Alan Stern


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 00/11] Hardware Breakpoint interfaces
  2009-03-10 13:46 ` Ingo Molnar
@ 2009-03-11 12:11   ` K.Prasad
  2009-03-11 16:34     ` Alan Stern
  0 siblings, 1 reply; 27+ messages in thread
From: K.Prasad @ 2009-03-11 12:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Linux Kernel Mailing List, Alan Stern, Roland McGrath

On Tue, Mar 10, 2009 at 02:46:55PM +0100, Ingo Molnar wrote:
> 
> * prasad@linux.vnet.ibm.com <prasad@linux.vnet.ibm.com> wrote:
> 
> > Hi Ingo,
> 
> > 	Please find the revised set of patches that implement 
> > Hardware Breakpoint (or watchpoint) registers and an 
> > arch-specific implementation for x86/x86_64.
> 
> General structure looks good, with a good deal of details 
> that need to be addressed.
>

Thanks to Alan Stern for answering most of the questions....I am
pitching in to fill  the gaps and do any re-write based on the
comments.

> Firstly, as far as i can see this should work on 32-bit too, 
> correct?
> 

Yes. It's been tested on 32-bit x86 all throughout.

> Secondly, what about other architectures - will they build just 
> fine without any arch level glue code? kernel/hw_breakpoint.o 
> get build unconditionally - without any benefit to non-x86 code. 
> Perhaps an ARCH_HAS_HW_BREAKPOINTS Kconfig method would be 
> useful to add.

The hardware breakpoint interfaces haven't been put under any CONFIG_
till now, but I think we should bring them under a new config, say
CONFIG_HW_BREAKPOINT. It would help create a dependancy for
CONFIG_KSYM_TRACER too.

> 
> There's also a number of (small) style issues. 
> kernel/hw_breakpoint.c and other new .c files dont comply to the 
> customary comment style of:
> 
>   /*
>    * Comment .....
>    * ...... goes here:
>    */
> 
> also, the #include files section style should match that of 
> arch/x86/mm/fault.c - it's a conflict-avoidance style.
> 
> also, things like this:
> 
> static struct notifier_block hw_breakpoint_exceptions_nb = {
>         .notifier_call = hw_breakpoint_exceptions_notify,
>         .priority = 0x7fffffff /* we need to be notified first */
> };
> 
> should be:
> 
> static struct notifier_block hw_breakpoint_exceptions_nb = {
>         .notifier_call		= hw_breakpoint_exceptions_notify,
> 	/* We need to be notified first: */
>         .priority		= 0x7fffffff,
> };
> 
> 	Ingo

Sure, will look at the comment styling before I re-send the patchset.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 00/11] Hardware Breakpoint interfaces
  2009-03-10 14:24   ` Alan Stern
@ 2009-03-10 14:54     ` Ingo Molnar
  0 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2009-03-10 14:54 UTC (permalink / raw)
  To: Alan Stern
  Cc: prasad, Andrew Morton, Linux Kernel Mailing List, Roland McGrath


* Alan Stern <stern@rowland.harvard.edu> wrote:

> On Tue, 10 Mar 2009, Ingo Molnar wrote:
> 
> > 
> > There's also a few checkpatch warnings that need to be 
> > addressed:
> > 
> > ERROR: do not use assignment in if condition
> > #1084: FILE: arch/x86/kernel/ptrace.c:581:
> > +	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == 
> > NULL)
> 
> Changing this to remove the assignment from within the "if" 
> condition would make the code less readable, not more. [...]

It's not just about basic readability. We dont do assignmets 
like that because it's very easy to miss them (and their side 
effects) and conflating them with '=='.

  It's part of a long 
> series of tests; in schematic form:
> 
> 	if (n == 4 || n == 5)
> ...
> 	else if (n == 6) {
> ...
> 	}
> 	else if (!tsk->thread.hw_breakpoint_info && val == 0)
> ...
> 	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
> ...
> 	else if (n < HB_NUM) {
> ...
> 	}
> 	else
> ...
> 
> If you can suggest a way to change the code without making it 
> worse, I'm sure Prasad will be happy to adopt it.

The obvious solution which we use in many other places in 
arch/x86 is to split out that butt-ugly if else if else if else 
maze into a helper inline function that does:

 	if (n == 4 || n == 5) {
		do stuff;
		return;
	}
		
 	if (n == 6) {
		do stuff;
		return;
 	}

	if (!tsk->thread.hw_breakpoint_info && val == 0) {
		do stuff;
		return;
	}

	thbi = alloc_thread_hw_breakpoint(tsk);

	if (!tbhi)
		...

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 00/11] Hardware Breakpoint interfaces
  2009-03-10 13:51 ` Ingo Molnar
@ 2009-03-10 14:24   ` Alan Stern
  2009-03-10 14:54     ` Ingo Molnar
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Stern @ 2009-03-10 14:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: prasad, Andrew Morton, Linux Kernel Mailing List, Roland McGrath

On Tue, 10 Mar 2009, Ingo Molnar wrote:

> 
> There's also a few checkpatch warnings that need to be 
> addressed:
> 
> ERROR: do not use assignment in if condition
> #1084: FILE: arch/x86/kernel/ptrace.c:581:
> +	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == 
> NULL)

Changing this to remove the assignment from within the "if" condition 
would make the code less readable, not more.  It's part of a long 
series of tests; in schematic form:

	if (n == 4 || n == 5)
...
	else if (n == 6) {
...
	}
	else if (!tsk->thread.hw_breakpoint_info && val == 0)
...
	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
...
	else if (n < HB_NUM) {
...
	}
	else
...

If you can suggest a way to change the code without making it worse,
I'm sure Prasad will be happy to adopt it.

Alan Stern


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 00/11] Hardware Breakpoint interfaces
  2009-03-05  4:37 [patch 00/11] Hardware Breakpoint interfaces prasad
  2009-03-10 13:46 ` Ingo Molnar
@ 2009-03-10 13:51 ` Ingo Molnar
  2009-03-10 14:24   ` Alan Stern
  1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2009-03-10 13:51 UTC (permalink / raw)
  To: prasad
  Cc: Andrew Morton, Linux Kernel Mailing List, Alan Stern, Roland McGrath


There's also a few checkpatch warnings that need to be 
addressed:

ERROR: do not use assignment in if condition
#1084: FILE: arch/x86/kernel/ptrace.c:581:
+	else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == 
NULL)

WARNING: suspect code indent for conditional statements (16, 16)
#2836: FILE: kernel/trace/trace_ksym.c:324:
+		if (strncmp(entry->ksym_hbkpt->info.name, 
KSYM_SELFTEST_ENTRY,
[...]
+		kfree(entry->ksym_hbkpt->info.name);

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 00/11] Hardware Breakpoint interfaces
  2009-03-05  4:37 [patch 00/11] Hardware Breakpoint interfaces prasad
@ 2009-03-10 13:46 ` Ingo Molnar
  2009-03-11 12:11   ` K.Prasad
  2009-03-10 13:51 ` Ingo Molnar
  1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2009-03-10 13:46 UTC (permalink / raw)
  To: prasad
  Cc: Andrew Morton, Linux Kernel Mailing List, Alan Stern, Roland McGrath


* prasad@linux.vnet.ibm.com <prasad@linux.vnet.ibm.com> wrote:

> Hi Ingo,

> 	Please find the revised set of patches that implement 
> Hardware Breakpoint (or watchpoint) registers and an 
> arch-specific implementation for x86/x86_64.

General structure looks good, with a good deal of details 
that need to be addressed.

Firstly, as far as i can see this should work on 32-bit too, 
correct?

Secondly, what about other architectures - will they build just 
fine without any arch level glue code? kernel/hw_breakpoint.o 
get build unconditionally - without any benefit to non-x86 code. 
Perhaps an ARCH_HAS_HW_BREAKPOINTS Kconfig method would be 
useful to add.

There's also a number of (small) style issues. 
kernel/hw_breakpoint.c and other new .c files dont comply to the 
customary comment style of:

  /*
   * Comment .....
   * ...... goes here:
   */

also, the #include files section style should match that of 
arch/x86/mm/fault.c - it's a conflict-avoidance style.

also, things like this:

static struct notifier_block hw_breakpoint_exceptions_nb = {
        .notifier_call = hw_breakpoint_exceptions_notify,
        .priority = 0x7fffffff /* we need to be notified first */
};

should be:

static struct notifier_block hw_breakpoint_exceptions_nb = {
        .notifier_call		= hw_breakpoint_exceptions_notify,
	/* We need to be notified first: */
        .priority		= 0x7fffffff,
};

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Patch 00/11] Hardware Breakpoint Interfaces
@ 2009-03-07  5:04 prasad
  0 siblings, 0 replies; 27+ messages in thread
From: prasad @ 2009-03-07  5:04 UTC (permalink / raw)
  To: mingo
  Cc: Andrew Morton, Linux Kernel Mailing List, Alan Stern, Roland McGrath

Hi Ingo,
        Please find the revised set of patches that implement Hardware
Breakpoint (or watchpoint) registers and an arch-specific implementation
for x86/x86_64.

Changelog
---------
A previous version of these patches were submitted through
http://thread.gmane.org/gmane.linux.kernel/802597. Changes in this
patchset over the previous version include

- Addition of a startup selftest to the ftrace plugin that checks for
  traces after a 'write' operation over a variable registered through
  the plugin for tracing (Patch 11/11).
- A fully-featured reset function - ksym_trace_reset() (Patch 11/11).
- Some of the potential memory leak scenarios identified through the
  comments (Patch 11/11).
- Rebased the patchset against commit
  34bc9c8fa7c868040407cc4822ed517a43792fe8 in -tip tree.

Kindly include them to be a part of -tip tree.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [patch 00/11] Hardware Breakpoint interfaces
@ 2009-03-05  4:37 prasad
  2009-03-10 13:46 ` Ingo Molnar
  2009-03-10 13:51 ` Ingo Molnar
  0 siblings, 2 replies; 27+ messages in thread
From: prasad @ 2009-03-05  4:37 UTC (permalink / raw)
  To: mingo
  Cc: Andrew Morton, Linux Kernel Mailing List, Alan Stern, Roland McGrath

Hi Ingo,
	Please find the revised set of patches that implement Hardware
Breakpoint (or watchpoint) registers and an arch-specific implementation
for x86/x86_64.

Changelog
---------
The previous version of these patches were submitted through
http://article.gmane.org/gmane.linux.kernel/794870. The patches now
contain a new ftrace plugin for kernel symbol tracing using hardware
breakpoint registers. The patches are now re-based to -tip tree commit
e40aa382597b30c4d702951348500e812b4cb3d0.

A small usage note on the plugin along with the description of the
breakpoint interfaces is included below.

Kindly accept these patches to become a part of -tip tree.

Description
-------------
The Hardware Breakpoint registers can be used for tracing changes to a
variable or data location (even I/O ports in x86/x86_64) and will be
extremely helpful in debugging problems such as memory corruption. While
these registers have been used by user-space debuggers for long (through
'ptrace' syscalls), Kgdb exploited them for kernel-space addresses.

The proposed framework, introduces interfaces to use them directly on
both  user- and kernel-space addresses apart from arbitrating requests
from various such users for the limited number of registers.

The interfaces are:

int register_kernel_hw_breakpoint(struct hw_breakpoint *);
void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);

int register_user_hw_breakpoint(struct task_struct *tsk,
                                 struct hw_breakpoint *bp);
void unregister_user_hw_breakpoint(struct task_struct *tsk,
                struct hw_breakpoint *bp);

The 'struct hw_breakpoint' will be the anchor data-structure containing
all the necessary information such as name or address, type, length,
priority and pointers to handler functions (some of which are
arch-specific). More information about the role of each field, the
handler
functions and their return values can be found in the descriptive
comments
preceding these functions and in "include/asm-generic/hw_breakpoint.h".

While (un)register_user_hw_breakpoint() isn't exported yet, its
worker-routine __register_user-hw_breakpoint() is used by ptrace syscall
for all breakpoint register requirements. For the kernel-space, a simple
use case to trace 'write' operations on a kernel variable can be found
in samples/hw_breakpoint/data_breakpoint.c.

In the current patchset, support is provided only for read and
read-write breakpoints on data locations, which can be later expand to
include instruction and I/O breakpoints for x86/x86_64.

There is pending integration with 'KGDB' without which mutual exclusion
between them (KGDB and HW breakpoint use through above interfaces) needs
to be observed.

Ftrace plugin
-------------
Usage
------
mount -t debugfs debugfs /debug
cd /debug/tracing/
echo 0 > tracing_enabled
cat available_tracers
echo ksym_tracer > current_tracer
echo "pid_max:rw-" > ksym_trace_filter
echo "jiffies:-w-" > ksym_trace_filter

echo 1 > tracing_enabled

cat trace


Sample Output (snipped)
-------------

[root@mymachine tracing]# cat trace
# tracer: ksym_tracer
#
#       TASK-PID      CPU#      Symbol         Type    Function
#          |           |          |              |         |
atieventsd      3053  1   pid_max               RW alloc_pid+0x6c/0x2fd
authatieventsd. 5394  1   pid_max               RW alloc_pid+0x6c/0x2fd
authatieventsd. 5394  0   pid_max               RW alloc_pid+0x6c/0x2fd
authatieventsd. 5394  0   pid_max               RW alloc_pid+0x6c/0x2fd
bash            4898  1   pid_max               RW alloc_pid+0x6c/0x2fd
atieventsd      3053  1   pid_max               RW alloc_pid+0x6c/0x2fd
authatieventsd. 5401  1   pid_max               RW alloc_pid+0x6c/0x2fd
authatieventsd. 5413  0   pid_max               RW alloc_pid+0x6c/0x2fd
authatieventsd. 5415  0   pid_max               RW alloc_pid+0x6c/0x2fd
authatieventsd. 5415  0   pid_max               RW alloc_pid+0x6c/0x2fd
authatieventsd. 5413  0   pid_max               RW alloc_pid+0x6c/0x2fd
hald-runner     2766  0   pid_max               RW alloc_pid+0x6c/0x2fd
hal-system-kill 5425  1   pid_max               RW alloc_pid+0x6c/0x2fd
hal-system-kill 5425  1   pid_max               RW alloc_pid+0x6c/0x2fd
swapper         0     0   jiffies               W  do_timer+0x16/0xb0
swapper         0     0   jiffies               W  do_timer+0x16/0xb0
gnome-terminal  4521  0   jiffies               W  do_timer+0x16/0xb0
Xorg            3235  0   jiffies               W  do_timer+0x16/0xb0
Xorg            3235  0   jiffies               W  do_timer+0x16/0xb0
[root@mymachine tracing]#

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2009-04-24 16:16 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-24 15:24 [Patch 00/11] Hardware Breakpoint interfaces K.Prasad
2009-03-25 19:48 ` Alan Stern
2009-03-27 22:06   ` K.Prasad
2009-04-01 16:16     ` Alan Stern
2009-04-07  8:22       ` K.Prasad
2009-04-09 20:50         ` Alan Stern
2009-03-28  8:46   ` K.Prasad
2009-04-01 16:22     ` Alan Stern
2009-04-07  8:22       ` K.Prasad
  -- strict thread matches above, loose matches on Subject: below --
2009-04-07  6:34 K.Prasad
2009-04-16 21:19 ` Alan Stern
2009-04-17  3:12   ` K.Prasad
2009-04-17 14:37     ` Alan Stern
2009-04-24  5:56       ` K.Prasad
2009-04-24 14:16         ` Alan Stern
2009-04-24 15:57           ` K.Prasad
2009-04-24 16:16             ` Alan Stern
2009-03-07  5:04 [Patch 00/11] Hardware Breakpoint Interfaces prasad
2009-03-05  4:37 [patch 00/11] Hardware Breakpoint interfaces prasad
2009-03-10 13:46 ` Ingo Molnar
2009-03-11 12:11   ` K.Prasad
2009-03-11 16:34     ` Alan Stern
2009-03-11 17:25       ` K.Prasad
2009-03-11 17:30         ` Ingo Molnar
2009-03-10 13:51 ` Ingo Molnar
2009-03-10 14:24   ` Alan Stern
2009-03-10 14:54     ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).