All of lore.kernel.org
 help / color / mirror / Atom feed
* [Patch 00/12] Hardware Breakpoint Interfaces
@ 2009-04-24  6:13 K.Prasad
  2009-05-04 20:55 ` Alan Stern
  0 siblings, 1 reply; 20+ messages in thread
From: K.Prasad @ 2009-04-24  6:13 UTC (permalink / raw)
  To: Alan Stern, Steven Rostedt, Frederic Weisbecker
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, maneesh, Roland McGrath

Hi Alan,
	The following patches contain the changes mentioned below and is based
on commit 335a1e07e2281795064b909aa75e3071609abd0e of -tip tree.

The changes to passing of DR6 register value in traps.c is separated into
[Patch 12/12]. kprobes and HW breakpoints have been found to work fine after 
the changes on an x86 box.

Kindly let me know what you think of the changes.

Hi Steven/Frederick,
		The changes to the ftrace plugin that converts all mutex locks
into spinlocks are done now and have been tested to work fine. The patch can be 
found in Patch 11/12. Kindly acknowledge the same.

Changelog
-----------
- Convert hw_breakpoint_mutex into a spinlock() based implementation (will now
  be called as hw_breakpoint_lock). This is to avoid a call to a mutex (which
  can potentially sleep) in the context of a softIRQ through which
  flush_thread_hw_breakpoint() is invoked at the time of process exit.
- Call put_cpu_no_resched() unconditionally in hw_breakpoint_handler() code (the
  previous code erroneously invoked it inside an "if" condition)
- Use PTR_ERR and ERR_PTR macros for passing dr6 register value in do_debug()
- Reset bits in local copy of DR6 register in do_debug() function to indicate
  that the exception corresponding to the bit has been handled.
- Convert all synchronisation primitives into spinlocks in trace_ksym.c to
  prevent the potential interference between statistics collection code in
  ksym_collect_stats() with other ksym filter file's read/write routines.
- A few changes in comments and code following comments from Alan Stern.

Thanks,
K.Prasad



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint Interfaces
  2009-04-24  6:13 [Patch 00/12] Hardware Breakpoint Interfaces K.Prasad
@ 2009-05-04 20:55 ` Alan Stern
  2009-05-11 11:36   ` K.Prasad
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Stern @ 2009-05-04 20:55 UTC (permalink / raw)
  To: K.Prasad
  Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath

On Fri, 24 Apr 2009, K.Prasad wrote:

> Hi Alan,
> 	The following patches contain the changes mentioned below and is based
> on commit 335a1e07e2281795064b909aa75e3071609abd0e of -tip tree.
> 
> The changes to passing of DR6 register value in traps.c is separated into
> [Patch 12/12]. kprobes and HW breakpoints have been found to work fine after 
> the changes on an x86 box.
> 
> Kindly let me know what you think of the changes.

They look pretty good.  However some of the older stuff still needs 
more work.

> --- /dev/null
> +++ arch/x86/include/asm/hw_breakpoint.h

> +void arch_install_thread_hw_breakpoint(struct task_struct *tsk);
> +void arch_uninstall_thread_hw_breakpoint(void);
> +void arch_install_kernel_hw_breakpoint(void *);

This routine doesn't exist, but maybe it should.  See below...


> --- /dev/null
> +++ kernel/hw_breakpoint.c

> +int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> +{
> +	int rc;
> +
> +	rc = arch_validate_hwbkpt_settings(bp, NULL);
> +	if (rc)
> +		return rc;
> +
> +	spin_lock(&hw_breakpoint_lock);
> +
> +	rc = -EINVAL;
> +	/* Check if we are over-committing */
> +	if ((hbp_kernel_pos > 0) && (!hbp_user_refcount[hbp_kernel_pos-1])) {
> +		hbp_kernel_pos--;
> +		hbp_kernel[hbp_kernel_pos] = bp;
> +		on_each_cpu(arch_update_kernel_hw_breakpoints, NULL, 0);

You shouldn't call on_each_cpu() while holding a spinlock.  The same
thing happens in unregister_kernel_hw_breakpoint().

> +		rc = 0;
> +	}
> +
> +	spin_unlock(&hw_breakpoint_lock);
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
> +
> +/**
> + * unregister_kernel_hw_breakpoint - unregister a HW breakpoint for kernel space
> + * @bp: the breakpoint structure to unregister
> + *
> + * Uninstalls and unregisters @bp.
> + */
> +void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> +{
> +	int i, j;
> +
> +	spin_lock(&hw_breakpoint_lock);
> +
> +	/* Find the 'bp' in our list of breakpoints for kernel */
> +	for (i = hbp_kernel_pos; i < HB_NUM; i++)
> +		if (bp == hbp_kernel[i])
> +			break;
> +
> +	/* Check if we did not find a match for 'bp'. If so return early */
> +	if (i == HB_NUM) {
> +		spin_unlock(&hw_breakpoint_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * We'll shift the breakpoints one-level above to compact if
> +	 * unregistration creates a hole
> +	 */
> +	for (j = i; j > hbp_kernel_pos; j--)
> +		hbp_kernel[j] = hbp_kernel[j-1];

What happens if a kernel breakpoint is triggered on another CPU while
this loop is running?  Or what happens if the breakpoint being removed
is triggered on another CPU before on_each_cpu() is called below?

Basically, it's impossible to change the kernel breakpoints 
simultaneously on all CPUs.  That means you somehow have to keep both 
the old set and the new set around until all the CPUs are updated.

> +
> +	hbp_kernel[hbp_kernel_pos] = NULL;
> +	on_each_cpu(arch_update_kernel_hw_breakpoints, NULL, 0);
> +	hbp_kernel_pos++;
> +
> +	spin_unlock(&hw_breakpoint_lock);
> +}


> --- /dev/null
> +++ arch/x86/kernel/hw_breakpoint.c

> +/* Unmasked kernel DR7 value */
> +static unsigned long kdr7;
> +static const unsigned long	kdr7_masks[HB_NUM + 1] = {
> +	0xffff00ff,	/* LEN3, R/W3, G3, L3 */
> +	0xfff000fc,	/* Same for 3, 2 */
> +	0xff0000f0,	/* Same for 3, 2, 1 */
> +	0xf00f00c0,	/* Same for 3, 2, 1, 0 */
> +	0x00000000      /* Dummy mask used when 'pos' is HB_NUM */
> +};

These comments are completely messed up.  The comment on the first
line, "LEN3, R/W3, G3, L3", actually applies to the fourth value,
0xf00f00c0.  Likewise for the others.

In the end this may not matter...


> +void arch_update_kernel_hw_breakpoints(void *unused)
> +{
> +	struct hw_breakpoint *bp;
> +	unsigned long dr7;
> +	int i;
> +
> +	get_debugreg(dr7, 7);
> +	/* Don't allow debug exceptions while we update the registers */
> +	set_debugreg(0UL, 7);
> +
> +	/* Clear all kernel-space bits in kdr7 and dr7 before we set them */
> +	kdr7 &= ~kdr7_masks[hbp_kernel_pos];
> +	dr7 &= ~kdr7_masks[hbp_kernel_pos];

You probably should use current->thread.debugreg7 and eliminate the
dr7 variable entirely.  That also means you can get rid of kdr7_masks, 
and it means you can increment hpb_kernel_pos before doing the 
on_each_cpu() call.

> +
> +	for (i = hbp_kernel_pos; i < HB_NUM; i++) {
> +		bp = hbp_kernel[i];
> +		if (bp) {
> +			kdr7 |= encode_dr7(i, bp->info.len, bp->info.type);
> +			set_debugreg(hbp_kernel[i]->info.address, i);
> +		}
> +	}

Another problem: kdr7 is a global variable, and here you've got every
CPU recomputing it whenever a kernel breakpoint is added or removed.
It should be computed just once, before the on_each_cpu() call.

> +
> +	dr7 |= kdr7;
> +
> +	/* No need to set DR6 */
> +	set_debugreg(dr7, 7);
> +}


> --- arch/x86/kernel/ptrace.c.orig
> +++ arch/x86/kernel/ptrace.c

> +static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
> +{
> +	struct thread_struct *thread = &(tsk->thread);
> +	unsigned long old_dr7 = thread->debugreg7;
> +	int i, rc = 0;
> +	int enabled;
> +	unsigned len, type;
> +	struct hw_breakpoint *bp;
> +	/*
> +	 * We want to use allocated memory inside a spinlock and we use the
> +	 * trick below
> +	 */
> +	int temp_mem_used = 0;
> +	void *temp_mem = kzalloc(sizeof(struct hw_breakpoint), GFP_KERNEL);
> +	if (!temp_mem)
> +		temp_mem_used = -ENOMEM;

I don't think this is a good idea...

> +restore:
> +	/*
> +	 * Loop through all the hardware breakpoints, making the
> +	 * appropriate changes to each.
> +	 */
> +	spin_lock(&hw_breakpoint_lock);
> +
> +	for (i = 0; i < HB_NUM; i++) {
> +		enabled = decode_dr7(data, i, &len, &type);
> +		bp = thread->hbp[i];
> +
> +		if (!enabled) {
> +			if (bp) {
> +				__unregister_user_hw_breakpoint(i, tsk);
> +				kfree(bp);
> +			}
> +			continue;
> +		}
> +
> +		if (!bp) {
> +			rc = -ENOMEM;
> +			if (temp_mem_used != -ENOMEM) {
> +				bp = temp_mem;

What happens if several new breakpoints are present at the same time?
You'd end up using the same memory for all of them.

> +				bp->info.address = thread->debugreg[i];
> +				bp->triggered = ptrace_triggered;
> +				bp->info.len = len;
> +				bp->info.type = type;
> +				temp_mem_used = 1;
> +				rc = __register_user_hw_breakpoint(i, tsk, bp);
> +				if (!rc)
> +					set_tsk_thread_flag(tsk, TIF_DEBUG);
> +				else
> +					kfree(bp);
> +			}
> +		} else
> +			rc = __modify_user_hw_breakpoint(i, tsk, bp);
> +
> +		if (rc)
> +			break;
> + 	}
> +	spin_unlock(&hw_breakpoint_lock);
> +
> +	/* If anything above failed, restore the original settings */
> +	if (rc < 0) {
> +		data = old_dr7;
> +		goto restore;

And now if something went wrong, you have already freed the memory
holding the original breakpoint structures.  It would be better to
keep them around until you know they aren't going to be needed.

Alan Stern


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint Interfaces
  2009-05-04 20:55 ` Alan Stern
@ 2009-05-11 11:36   ` K.Prasad
  2009-05-11 14:50     ` Alan Stern
  0 siblings, 1 reply; 20+ messages in thread
From: K.Prasad @ 2009-05-11 11:36 UTC (permalink / raw)
  To: Alan Stern
  Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath

On Mon, May 04, 2009 at 04:55:03PM -0400, Alan Stern wrote:
> On Fri, 24 Apr 2009, K.Prasad wrote:
> 
> > Hi Alan,
> > 	The following patches contain the changes mentioned below and is based
> > on commit 335a1e07e2281795064b909aa75e3071609abd0e of -tip tree.
> > 
> > The changes to passing of DR6 register value in traps.c is separated into
> > [Patch 12/12]. kprobes and HW breakpoints have been found to work fine after 
> > the changes on an x86 box.
> > 
> > Kindly let me know what you think of the changes.
> 
> They look pretty good.  However some of the older stuff still needs 
> more work.
> 
> > --- /dev/null
> > +++ arch/x86/include/asm/hw_breakpoint.h
> 
> > +void arch_install_thread_hw_breakpoint(struct task_struct *tsk);
> > +void arch_uninstall_thread_hw_breakpoint(void);
> > +void arch_install_kernel_hw_breakpoint(void *);
> 
> This routine doesn't exist, but maybe it should.  See below...
> 
> 

Hi Alan,
	Thank you for the comments. I discovered a few locking related
issues in the code and had to fix them before I could send the patch
with your suggestions applied, and hence the delay.

Please find responses to your comments inline, while more changes
accompany the patch sent separately.

> > --- /dev/null
> > +++ kernel/hw_breakpoint.c
> 
> > +int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> > +{
> > +	int rc;
> > +
> > +	rc = arch_validate_hwbkpt_settings(bp, NULL);
> > +	if (rc)
> > +		return rc;
> > +
> > +	spin_lock(&hw_breakpoint_lock);
> > +
> > +	rc = -EINVAL;
> > +	/* Check if we are over-committing */
> > +	if ((hbp_kernel_pos > 0) && (!hbp_user_refcount[hbp_kernel_pos-1])) {
> > +		hbp_kernel_pos--;
> > +		hbp_kernel[hbp_kernel_pos] = bp;
> > +		on_each_cpu(arch_update_kernel_hw_breakpoints, NULL, 0);
> 
> You shouldn't call on_each_cpu() while holding a spinlock.  The same
> thing happens in unregister_kernel_hw_breakpoint().
> 

First, on_each_cpu() will now be changed to return only after all
functions invoked through IPIs have returned (by changing @wait
parameter to 1). This is required to prevent side effects of
incrementing hbp_kernel_pos after on_each_cpu() in
unregister_kernel_hw_breakpoint() [hbp_kernel_pos is still incremented
after IPI and I will explain it below].

on_each_cpu() isn't a blocking call (despite @wait being set to 1, which
does a busy wait through cpu_relax()) and should be safe to invoke
inside a spin_lock() context. I would like to know if you think
otherwise.

> > +		rc = 0;
> > +	}
> > +
> > +	spin_unlock(&hw_breakpoint_lock);
> > +	return rc;
> > +}
> > +EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
> > +
> > +/**
> > + * unregister_kernel_hw_breakpoint - unregister a HW breakpoint for kernel space
> > + * @bp: the breakpoint structure to unregister
> > + *
> > + * Uninstalls and unregisters @bp.
> > + */
> > +void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> > +{
> > +	int i, j;
> > +
> > +	spin_lock(&hw_breakpoint_lock);
> > +
> > +	/* Find the 'bp' in our list of breakpoints for kernel */
> > +	for (i = hbp_kernel_pos; i < HB_NUM; i++)
> > +		if (bp == hbp_kernel[i])
> > +			break;
> > +
> > +	/* Check if we did not find a match for 'bp'. If so return early */
> > +	if (i == HB_NUM) {
> > +		spin_unlock(&hw_breakpoint_lock);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * We'll shift the breakpoints one-level above to compact if
> > +	 * unregistration creates a hole
> > +	 */
> > +	for (j = i; j > hbp_kernel_pos; j--)
> > +		hbp_kernel[j] = hbp_kernel[j-1];
> 
> What happens if a kernel breakpoint is triggered on another CPU while
> this loop is running?  Or what happens if the breakpoint being removed
> is triggered on another CPU before on_each_cpu() is called below?
> 
> Basically, it's impossible to change the kernel breakpoints 
> simultaneously on all CPUs.  That means you somehow have to keep both 
> the old set and the new set around until all the CPUs are updated.
> 

I must admit that the code did not handle the above scenario. I am
adding a per-cpu instance of 'hbp_kernel[]' called 'this_hbp_kernel[]'.
The breakpoint handler would use the per-cpu instance which will remain
valid throughout the execution of the handler. The per-cpu instance will
be updated with hbp_kernel[] values in arch_update_kernel_hw_breakpoint().
[This necessitates hbp_kernel_pos increment to happen after the IPI call
in unregister_kernel code].

> > +
> > +	hbp_kernel[hbp_kernel_pos] = NULL;
> > +	on_each_cpu(arch_update_kernel_hw_breakpoints, NULL, 0);
> > +	hbp_kernel_pos++;
> > +
> > +	spin_unlock(&hw_breakpoint_lock);
> > +}
> 
> 
> > --- /dev/null
> > +++ arch/x86/kernel/hw_breakpoint.c
> 
> > +/* Unmasked kernel DR7 value */
> > +static unsigned long kdr7;
> > +static const unsigned long	kdr7_masks[HB_NUM + 1] = {
> > +	0xffff00ff,	/* LEN3, R/W3, G3, L3 */
> > +	0xfff000fc,	/* Same for 3, 2 */
> > +	0xff0000f0,	/* Same for 3, 2, 1 */
> > +	0xf00f00c0,	/* Same for 3, 2, 1, 0 */
> > +	0x00000000      /* Dummy mask used when 'pos' is HB_NUM */
> > +};
> 
> These comments are completely messed up.  The comment on the first
> line, "LEN3, R/W3, G3, L3", actually applies to the fourth value,
> 0xf00f00c0.  Likewise for the others.
> 
> In the end this may not matter...
> 
> 

The kdr7_masks[] are now removed as a result of the code re-write you
suggested below!!

> > +void arch_update_kernel_hw_breakpoints(void *unused)
> > +{
> > +	struct hw_breakpoint *bp;
> > +	unsigned long dr7;
> > +	int i;
> > +
> > +	get_debugreg(dr7, 7);
> > +	/* Don't allow debug exceptions while we update the registers */
> > +	set_debugreg(0UL, 7);
> > +
> > +	/* Clear all kernel-space bits in kdr7 and dr7 before we set them */
> > +	kdr7 &= ~kdr7_masks[hbp_kernel_pos];
> > +	dr7 &= ~kdr7_masks[hbp_kernel_pos];
> 
> You probably should use current->thread.debugreg7 and eliminate the
> dr7 variable entirely.  That also means you can get rid of kdr7_masks, 
> and it means you can increment hpb_kernel_pos before doing the 
> on_each_cpu() call.
> 

It's a nice optimisation. I've included your suggestion.

> > +
> > +	for (i = hbp_kernel_pos; i < HB_NUM; i++) {
> > +		bp = hbp_kernel[i];
> > +		if (bp) {
> > +			kdr7 |= encode_dr7(i, bp->info.len, bp->info.type);
> > +			set_debugreg(hbp_kernel[i]->info.address, i);
> > +		}
> > +	}
> 
> Another problem: kdr7 is a global variable, and here you've got every
> CPU recomputing it whenever a kernel breakpoint is added or removed.
> It should be computed just once, before the on_each_cpu() call.
> 

If kdr7 needs to be updated only once, it has to be kept outside the IPI
through the use of a wrapper routine (in arch/x86/kernel/hw_breakpoint.c
as it is arch-specific). This would mean one more function call in
(un)register_kernel_<> routines taking the code back to one of its previous
designs. In a trade-off between code-brevity and efficiency, the present one
chose the former keeping in mind some of the comments received during the
early stages of this patch.

> > +
> > +	dr7 |= kdr7;
> > +
> > +	/* No need to set DR6 */
> > +	set_debugreg(dr7, 7);
> > +}
> 
> 
> > --- arch/x86/kernel/ptrace.c.orig
> > +++ arch/x86/kernel/ptrace.c
> 
> > +static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
> > +{
> > +	struct thread_struct *thread = &(tsk->thread);
> > +	unsigned long old_dr7 = thread->debugreg7;
> > +	int i, rc = 0;
> > +	int enabled;
> > +	unsigned len, type;
> > +	struct hw_breakpoint *bp;
> > +	/*
> > +	 * We want to use allocated memory inside a spinlock and we use the
> > +	 * trick below
> > +	 */
> > +	int temp_mem_used = 0;
> > +	void *temp_mem = kzalloc(sizeof(struct hw_breakpoint), GFP_KERNEL);
> > +	if (!temp_mem)
> > +		temp_mem_used = -ENOMEM;
> 
> I don't think this is a good idea...
> 

I agree that it turned out to be wrong. ptrace is now modified to use
the (un)register_user_hw_breakpoint() interfaces directly and not the
worker routines, thereby avoiding all this complexity. Please find the
changes in the new patch.

> > +restore:
> > +	/*
> > +	 * Loop through all the hardware breakpoints, making the
> > +	 * appropriate changes to each.
> > +	 */
> > +	spin_lock(&hw_breakpoint_lock);
> > +
> > +	for (i = 0; i < HB_NUM; i++) {
> > +		enabled = decode_dr7(data, i, &len, &type);
> > +		bp = thread->hbp[i];
> > +
> > +		if (!enabled) {
> > +			if (bp) {
> > +				__unregister_user_hw_breakpoint(i, tsk);
> > +				kfree(bp);
> > +			}
> > +			continue;
> > +		}
> > +
> > +		if (!bp) {
> > +			rc = -ENOMEM;
> > +			if (temp_mem_used != -ENOMEM) {
> > +				bp = temp_mem;
> 
> What happens if several new breakpoints are present at the same time?
> You'd end up using the same memory for all of them.
> 
> > +				bp->info.address = thread->debugreg[i];
> > +				bp->triggered = ptrace_triggered;
> > +				bp->info.len = len;
> > +				bp->info.type = type;
> > +				temp_mem_used = 1;
> > +				rc = __register_user_hw_breakpoint(i, tsk, bp);
> > +				if (!rc)
> > +					set_tsk_thread_flag(tsk, TIF_DEBUG);
> > +				else
> > +					kfree(bp);
> > +			}
> > +		} else
> > +			rc = __modify_user_hw_breakpoint(i, tsk, bp);
> > +
> > +		if (rc)
> > +			break;
> > + 	}
> > +	spin_unlock(&hw_breakpoint_lock);
> > +
> > +	/* If anything above failed, restore the original settings */
> > +	if (rc < 0) {
> > +		data = old_dr7;
> > +		goto restore;
> 
> And now if something went wrong, you have already freed the memory
> holding the original breakpoint structures.  It would be better to
> keep them around until you know they aren't going to be needed.
> 
> Alan Stern
> 

Thanks again for your comments.

-- K.Prasad


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint Interfaces
  2009-05-11 11:36   ` K.Prasad
@ 2009-05-11 14:50     ` Alan Stern
  2009-05-11 17:20       ` K.Prasad
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Stern @ 2009-05-11 14:50 UTC (permalink / raw)
  To: K.Prasad
  Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath

On Mon, 11 May 2009, K.Prasad wrote:

> > You shouldn't call on_each_cpu() while holding a spinlock.  The same
> > thing happens in unregister_kernel_hw_breakpoint().
> > 
> 
> First, on_each_cpu() will now be changed to return only after all
> functions invoked through IPIs have returned (by changing @wait
> parameter to 1). This is required to prevent side effects of
> incrementing hbp_kernel_pos after on_each_cpu() in
> unregister_kernel_hw_breakpoint() [hbp_kernel_pos is still incremented
> after IPI and I will explain it below].

Good.  I thought this was already done; it's lucky you noticed it
wasn't.

> on_each_cpu() isn't a blocking call (despite @wait being set to 1, which
> does a busy wait through cpu_relax()) and should be safe to invoke
> inside a spin_lock() context. I would like to know if you think
> otherwise.

I guess you're right.  Why did you change the spin_lock to 
spin_lock_bh?


> > What happens if a kernel breakpoint is triggered on another CPU while
> > this loop is running?  Or what happens if the breakpoint being removed
> > is triggered on another CPU before on_each_cpu() is called below?
> > 
> > Basically, it's impossible to change the kernel breakpoints 
> > simultaneously on all CPUs.  That means you somehow have to keep both 
> > the old set and the new set around until all the CPUs are updated.
> > 
> 
> I must admit that the code did not handle the above scenario. I am
> adding a per-cpu instance of 'hbp_kernel[]' called 'this_hbp_kernel[]'.
> The breakpoint handler would use the per-cpu instance which will remain
> valid throughout the execution of the handler. The per-cpu instance will
> be updated with hbp_kernel[] values in arch_update_kernel_hw_breakpoint().
> [This necessitates hbp_kernel_pos increment to happen after the IPI call
> in unregister_kernel code].

Yes, okay.  I'm a little nervous about making this_hbp_kernel[] be
per-cpu while hbp_kernel_pos isn't.  It means the two values will
sometimes be out of sync with each other.  But it ought to be safe
since during the out-of-sync periods, hbp_kernel_pos will always point
to an empty breakpoint slot.

Still, in hw_breakpoint_handler(), perhaps the "if (i >= 
hbp_kernel_pos)" path should also check for !bp.  Just to be safe.  In 
other words, move the 

+			if (!bp)
+				continue;
+			rc = NOTIFY_DONE;

lines outside the "if" statement.


> > Another problem: kdr7 is a global variable, and here you've got every
> > CPU recomputing it whenever a kernel breakpoint is added or removed.
> > It should be computed just once, before the on_each_cpu() call.
> > 
> 
> If kdr7 needs to be updated only once, it has to be kept outside the IPI
> through the use of a wrapper routine (in arch/x86/kernel/hw_breakpoint.c
> as it is arch-specific). This would mean one more function call in
> (un)register_kernel_<> routines taking the code back to one of its previous
> designs. In a trade-off between code-brevity and efficiency, the present one
> chose the former keeping in mind some of the comments received during the
> early stages of this patch.

You don't appreciate the seriousness of this problem.  Here's the new code:

+	/* Clear all kernel-space breakpoints in kdr7 */
+	kdr7 = 0;
(A)
+	for (i = hbp_kernel_pos; i < HB_NUM; i++) {
+		per_cpu(this_hbp_kernel[i], cpu) = bp = hbp_kernel[i];
+		if (bp) {
+			kdr7 |= encode_dr7(i, bp->info.len, bp->info.type);
+			set_debugreg(hbp_kernel[i]->info.address, i);
+		}
+	}
+
+	/* No need to set DR6. Update the debug registers with kernel-space
+	 * breakpoint values from kdr7 and user-space requests from the
+	 * current process
+	 */
(B)
+	set_debugreg(kdr7 | current->thread.debugreg7, 7);

Suppose you send out the IPIs, and one of the CPUs happens to reach (A)
at the same time another CPU reaches (B).  The second CPU will load an
incorrect value into DR7.

If you prefer you can replace kdr7 above with a local variable, and 
set kdr7 to the local variable's value at the end of the function.  
Also, in the first set_debugreg() line above, you can replace 
hbp_kernel[i] with bp.


> > > +	void *temp_mem = kzalloc(sizeof(struct hw_breakpoint), GFP_KERNEL);
> > > +	if (!temp_mem)
> > > +		temp_mem_used = -ENOMEM;
> > 
> > I don't think this is a good idea...
> > 
> 
> I agree that it turned out to be wrong. ptrace is now modified to use
> the (un)register_user_hw_breakpoint() interfaces directly and not the
> worker routines, thereby avoiding all this complexity. Please find the
> changes in the new patch.

> > And now if something went wrong, you have already freed the memory
> > holding the original breakpoint structures.  It would be better to
> > keep them around until you know they aren't going to be needed.

I still think it would be a good idea to avoid freeing any breakpoint
structures until you know for certain they aren't needed.  Otherwise
you might find that you're unable to allocate memory to re-create a
structure that was already present.

Alan Stern


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint Interfaces
  2009-05-11 14:50     ` Alan Stern
@ 2009-05-11 17:20       ` K.Prasad
  2009-05-11 18:09         ` Alan Stern
  0 siblings, 1 reply; 20+ messages in thread
From: K.Prasad @ 2009-05-11 17:20 UTC (permalink / raw)
  To: Alan Stern
  Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath

On Mon, May 11, 2009 at 10:50:39AM -0400, Alan Stern wrote:
> On Mon, 11 May 2009, K.Prasad wrote:
> 
> > > You shouldn't call on_each_cpu() while holding a spinlock.  The same
> > > thing happens in unregister_kernel_hw_breakpoint().
> > > 
> > 
> > First, on_each_cpu() will now be changed to return only after all
> > functions invoked through IPIs have returned (by changing @wait
> > parameter to 1). This is required to prevent side effects of
> > incrementing hbp_kernel_pos after on_each_cpu() in
> > unregister_kernel_hw_breakpoint() [hbp_kernel_pos is still incremented
> > after IPI and I will explain it below].
> 
> Good.  I thought this was already done; it's lucky you noticed it
> wasn't.
> 
> > on_each_cpu() isn't a blocking call (despite @wait being set to 1, which
> > does a busy wait through cpu_relax()) and should be safe to invoke
> > inside a spin_lock() context. I would like to know if you think
> > otherwise.
> 
> I guess you're right.  Why did you change the spin_lock to 
> spin_lock_bh?
> 

I realised that flush_thread_hw_breakpoint()-->flush_thread() is invoked
through a softIRQ. Considering that it can cause potential deadlock due
to circular lock dependancy, spin_lock() calls were changed to
spin_lock_bh(). Here's a traceback and message from lockdep that helped
me:

=================================
[ INFO: inconsistent lock state ]
2.6.30-rc4-tip.hbkpt #31
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
 (hw_breakpoint_lock){+.?...}, at: [<c0498f73>]
flush_thread_hw_breakpoint+0x16/0x75
{SOFTIRQ-ON-W} state was registered at:
  [<c0459309>] __lock_acquire+0x2e8/0xb55
  [<c0459c11>] lock_acquire+0x9b/0xbe
  [<c0701dbd>] _spin_lock+0x23/0x53
  [<c0499190>] load_debug_registers+0x1b/0x77
  [<c06fcf20>] start_secondary+0x1b9/0x1c6
  [<ffffffff>] 0xffffffff
irq event stamp: 1447500
hardirqs last  enabled at (1447500): [<c04be9ff>]
kmem_cache_free+0xb2/0xe7
hardirqs last disabled at (1447499): [<c04be996>]
kmem_cache_free+0x49/0xe7
softirqs last  enabled at (1447468): [<c043cdda>]
__do_softirq+0x14c/0x154
softirqs last disabled at (1447473): [<c04051b8>] do_softirq+0x6d/0xcd

other info that might help us debug this:
no locks held by swapper/0.

stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.30-rc4-tip.hbkpt #31
Call Trace:
 [<c06ff8d2>] ? printk+0x14/0x1a
 [<c04582e7>] valid_state+0x12a/0x13d
 [<c04583e0>] mark_lock+0xe6/0x1e9
 [<c0458ae8>] ? check_usage_forwards+0x0/0x3f
 [<c0459287>] __lock_acquire+0x266/0xb55
 [<c04aa403>] ? page_address+0xe/0xb1
 [<c0459c11>] lock_acquire+0x9b/0xbe
 [<c0498f73>] ? flush_thread_hw_breakpoint+0x16/0x75
 [<c0701dbd>] _spin_lock+0x23/0x53
 [<c0498f73>] ? flush_thread_hw_breakpoint+0x16/0x75
 [<c0498f73>] flush_thread_hw_breakpoint+0x16/0x75
 [<c0409751>] free_thread_xstate+0x40/0x62
 [<c0409785>] free_thread_info+0x12/0x1e
 [<c043611b>] free_task+0x1e/0x34
 [<c0437522>] __put_task_struct+0xa7/0xac
 [<c0439084>] delayed_put_task_struct+0x75/0x7c
 [<c047bd9c>] __rcu_process_callbacks+0x1a5/0x229
 [<c047be44>] rcu_process_callbacks+0x24/0x42
 [<c043cd2b>] __do_softirq+0x9d/0x154
 [<c043cc8e>] ? __do_softirq+0x0/0x154
 <IRQ>  [<c043c816>] ? irq_exit+0x3a/0x68
 [<c07055d4>] ? smp_apic_timer_interrupt+0x79/0x87
 [<c0403d47>] ? apic_timer_interrupt+0x2f/0x34
 [<c041eeb4>] ? native_safe_halt+0xa/0xc
 [<c0409296>] ? default_idle+0x4f/0x81
 [<c048c417>] ? stop_critical_timings+0x25/0x2a
 [<c040263b>] ? cpu_idle+0x58/0x79
 [<c06f0004>] ? rest_init+0x58/0x5a
 [<c08cc805>] ? start_kernel+0x2fc/0x301
 [<c08cc06a>] ? __init_begin+0x6a/0x6f

> 
> > > What happens if a kernel breakpoint is triggered on another CPU while
> > > this loop is running?  Or what happens if the breakpoint being removed
> > > is triggered on another CPU before on_each_cpu() is called below?
> > > 
> > > Basically, it's impossible to change the kernel breakpoints 
> > > simultaneously on all CPUs.  That means you somehow have to keep both 
> > > the old set and the new set around until all the CPUs are updated.
> > > 
> > 
> > I must admit that the code did not handle the above scenario. I am
> > adding a per-cpu instance of 'hbp_kernel[]' called 'this_hbp_kernel[]'.
> > The breakpoint handler would use the per-cpu instance which will remain
> > valid throughout the execution of the handler. The per-cpu instance will
> > be updated with hbp_kernel[] values in arch_update_kernel_hw_breakpoint().
> > [This necessitates hbp_kernel_pos increment to happen after the IPI call
> > in unregister_kernel code].
> 
> Yes, okay.  I'm a little nervous about making this_hbp_kernel[] be
> per-cpu while hbp_kernel_pos isn't.  It means the two values will
> sometimes be out of sync with each other.  But it ought to be safe
> since during the out-of-sync periods, hbp_kernel_pos will always point
> to an empty breakpoint slot.
> 
> Still, in hw_breakpoint_handler(), perhaps the "if (i >= 
> hbp_kernel_pos)" path should also check for !bp.  Just to be safe.  In 
> other words, move the 
> 
> +			if (!bp)
> +				continue;
> +			rc = NOTIFY_DONE;
> 
> lines outside the "if" statement.
> 
> 

I don't see where the out-of-sync situation between hbp_kernel_pos and
this_hbp_kernel[] can cause problem. Our concern is when an exception
arises in the small time-window starting at the time when hbp_kernel[]
is updated in common code and ends when exceptions are disabled through
the IPI routine arch_update_kernel_hw_breakpoint. Consider these two
cases now:

i)register_kernel_hw_breakpoint() - Although hbp_kernel_pos is
decremented before the IPI, no new exceptions will arise because of the
same on a CPU where this_hbp_kernel[] is not synced because the physical
debug registers are not yet set. So, the 'i' in "if (i >= hbp_kernel_pos)"
cannot belong to the newly registered breakpoint. If it is for the new
breakpoint, then it is on a CPU where this_hbp_kernel[] is synced with
hbp_kernel[].

ii)unregister_kernel_hw_breakpoint() - hbp_kernel_pos is incremented
after all the IPIs have finished, so hbp_kernel_pos still points to the
old value whose hbp_kernel[] is NULL. However in "if (i >= hbp_kernel_pos)"
if 'i' points to the 'pending-delete' breakpoint, then 'bp' is derived
from this_hbp_kernel[] which contains the old value. The callback
bp->triggered() is invoked and the exception returns fine.

Let me know if I am missing any possibility leading to incorrect
behaviour. 

> > > Another problem: kdr7 is a global variable, and here you've got every
> > > CPU recomputing it whenever a kernel breakpoint is added or removed.
> > > It should be computed just once, before the on_each_cpu() call.
> > > 
> > 
> > If kdr7 needs to be updated only once, it has to be kept outside the IPI
> > through the use of a wrapper routine (in arch/x86/kernel/hw_breakpoint.c
> > as it is arch-specific). This would mean one more function call in
> > (un)register_kernel_<> routines taking the code back to one of its previous
> > designs. In a trade-off between code-brevity and efficiency, the present one
> > chose the former keeping in mind some of the comments received during the
> > early stages of this patch.
> 
> You don't appreciate the seriousness of this problem.  Here's the new code:
> 
> +	/* Clear all kernel-space breakpoints in kdr7 */
> +	kdr7 = 0;
> (A)
> +	for (i = hbp_kernel_pos; i < HB_NUM; i++) {
> +		per_cpu(this_hbp_kernel[i], cpu) = bp = hbp_kernel[i];
> +		if (bp) {
> +			kdr7 |= encode_dr7(i, bp->info.len, bp->info.type);
> +			set_debugreg(hbp_kernel[i]->info.address, i);
> +		}
> +	}
> +
> +	/* No need to set DR6. Update the debug registers with kernel-space
> +	 * breakpoint values from kdr7 and user-space requests from the
> +	 * current process
> +	 */
> (B)
> +	set_debugreg(kdr7 | current->thread.debugreg7, 7);
> 
> Suppose you send out the IPIs, and one of the CPUs happens to reach (A)
> at the same time another CPU reaches (B).  The second CPU will load an
> incorrect value into DR7.
> 
> If you prefer you can replace kdr7 above with a local variable, and 
> set kdr7 to the local variable's value at the end of the function.  
> Also, in the first set_debugreg() line above, you can replace 
> hbp_kernel[i] with bp.
> 
>

I failed to realise the potential inconsistency in kdr7 value. The code
will be modified like this:

void arch_update_kernel_hw_breakpoints(void *unused)
{
        struct hw_breakpoint *bp;
        int i, cpu = get_cpu();
        unsigned long temp_kdr7 = 0;

        /* Don't allow debug exceptions while we update the registers */
        set_debugreg(0UL, 7);

        for (i = hbp_kernel_pos; i < HB_NUM; i++) {
                per_cpu(this_hbp_kernel[i], cpu) = bp = hbp_kernel[i];
                if (bp) {
                        temp_kdr7 |= encode_dr7(i, bp->info.len, bp->info.type);
                        set_debugreg(bp->info.address, i);
                }
        }

        /* No need to set DR6. Update the debug registers with kernel-space
         * breakpoint values from temp_kdr7 and user-space requests from
         * the current process
         */
        kdr7 = temp_kdr7;
        set_debugreg(kdr7 | current->thread.debugreg7, 7);
        put_cpu_no_resched();
}

 
> > > > +	void *temp_mem = kzalloc(sizeof(struct hw_breakpoint), GFP_KERNEL);
> > > > +	if (!temp_mem)
> > > > +		temp_mem_used = -ENOMEM;
> > > 
> > > I don't think this is a good idea...
> > > 
> > 
> > I agree that it turned out to be wrong. ptrace is now modified to use
> > the (un)register_user_hw_breakpoint() interfaces directly and not the
> > worker routines, thereby avoiding all this complexity. Please find the
> > changes in the new patch.
> 
> > > And now if something went wrong, you have already freed the memory
> > > holding the original breakpoint structures.  It would be better to
> > > keep them around until you know they aren't going to be needed.
> 
> I still think it would be a good idea to avoid freeing any breakpoint
> structures until you know for certain they aren't needed.  Otherwise
> you might find that you're unable to allocate memory to re-create a
> structure that was already present.
> 
> Alan Stern
>

I agree that we shouldn't free memory temporarily that would be needed
later, as memory may not be available. However such a situation does not
arise in ptrace_write_dr7() after the re-write using
(un)register_user_hw_breakpoint(). kfree(bp) is done when
i) an existing breakpoint is disabled
ii) a breakpoint is requested but register_user_hw_breakpoint() routine
fails for some reasons. Hence the breakpoint structure kzalloc()'ed
afresh is kfree()'ed.
iii) In case of modifications to the type of breakpoint (such as
read<-->write), the breakpoint structure is not de-allocated and this is
where __modify_user_hw_breakpoint() helps us. There is no opportunity
window where the breakpoint can be grabbed by other requests when a
modification is done.

Let me know if you think there's still some discrepancy here.

Thanks for your detailed review and pointing me to the errors. I am
sending the patchset again with the changes discussed above.

Thanks,
K.Prasad

 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint Interfaces
  2009-05-11 17:20       ` K.Prasad
@ 2009-05-11 18:09         ` Alan Stern
  2009-05-12 14:05           ` K.Prasad
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Stern @ 2009-05-11 18:09 UTC (permalink / raw)
  To: K.Prasad
  Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath

On Mon, 11 May 2009, K.Prasad wrote:

> > Still, in hw_breakpoint_handler(), perhaps the "if (i >= 
> > hbp_kernel_pos)" path should also check for !bp.  Just to be safe.  In 
> > other words, move the 
> > 
> > +			if (!bp)
> > +				continue;
> > +			rc = NOTIFY_DONE;
> > 
> > lines outside the "if" statement.
> > 
> > 
> 
> I don't see where the out-of-sync situation between hbp_kernel_pos and
> this_hbp_kernel[] can cause problem. Our concern is when an exception
> arises in the small time-window starting at the time when hbp_kernel[]
> is updated in common code and ends when exceptions are disabled through
> the IPI routine arch_update_kernel_hw_breakpoint. Consider these two
> cases now:
> 
> i)register_kernel_hw_breakpoint() - Although hbp_kernel_pos is
> decremented before the IPI, no new exceptions will arise because of the
> same on a CPU where this_hbp_kernel[] is not synced because the physical
> debug registers are not yet set. So, the 'i' in "if (i >= hbp_kernel_pos)"
> cannot belong to the newly registered breakpoint. If it is for the new
> breakpoint, then it is on a CPU where this_hbp_kernel[] is synced with
> hbp_kernel[].
> 
> ii)unregister_kernel_hw_breakpoint() - hbp_kernel_pos is incremented
> after all the IPIs have finished, so hbp_kernel_pos still points to the
> old value whose hbp_kernel[] is NULL. However in "if (i >= hbp_kernel_pos)"
> if 'i' points to the 'pending-delete' breakpoint, then 'bp' is derived
> from this_hbp_kernel[] which contains the old value. The callback
> bp->triggered() is invoked and the exception returns fine.
> 
> Let me know if I am missing any possibility leading to incorrect
> behaviour. 

Your analysis is correct, but to me it feels a little fragile.  Future
changes to the code might cause an unexpected interaction.  Moving that
test won't hurt anything and it will make the handler slightly more
robust.


> > I still think it would be a good idea to avoid freeing any breakpoint
> > structures until you know for certain they aren't needed.  Otherwise
> > you might find that you're unable to allocate memory to re-create a
> > structure that was already present.
> > 
> > Alan Stern
> >
> 
> I agree that we shouldn't free memory temporarily that would be needed
> later, as memory may not be available. However such a situation does not
> arise in ptrace_write_dr7() after the re-write using
> (un)register_user_hw_breakpoint(). kfree(bp) is done when
> i) an existing breakpoint is disabled
> ii) a breakpoint is requested but register_user_hw_breakpoint() routine
> fails for some reasons. Hence the breakpoint structure kzalloc()'ed
> afresh is kfree()'ed.
> iii) In case of modifications to the type of breakpoint (such as
> read<-->write), the breakpoint structure is not de-allocated and this is
> where __modify_user_hw_breakpoint() helps us. There is no opportunity
> window where the breakpoint can be grabbed by other requests when a
> modification is done.
> 
> Let me know if you think there's still some discrepancy here.

Okay.  Let's suppose a single userspace breakpoint has already been
allocated and set up as breakpoint 0.  Now another ptrace call comes
along, in which bp 0 is disabled and bp 1 is requested.  This is your
case (i); the breakpoint structure for bp 0 will be deallocated.  But
maybe something goes wrong with the register_user_hw_breakpoint() call
for bp 1, so we have to restore the old settings.  The code will try to
allocate a new breakpoint structure to hold bp 0; what happens if
that allocation fails?  The user program will know that the write to
DR7 didn't succeed, so it will think the pre-existing breakpoint is
still valid.  But it isn't.

If instead you had avoided deallocated the structure for bp 0 until the 
end, then there would be no need to reallocate it during the restore 
and so the restore would succeed.

Alan Stern


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint Interfaces
  2009-05-11 18:09         ` Alan Stern
@ 2009-05-12 14:05           ` K.Prasad
  2009-05-12 14:55             ` Alan Stern
  0 siblings, 1 reply; 20+ messages in thread
From: K.Prasad @ 2009-05-12 14:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath

On Mon, May 11, 2009 at 02:09:48PM -0400, Alan Stern wrote:
> On Mon, 11 May 2009, K.Prasad wrote:
> 
> > > Still, in hw_breakpoint_handler(), perhaps the "if (i >= 
> > > hbp_kernel_pos)" path should also check for !bp.  Just to be safe.  In 
> > > other words, move the 
> > > 
> > > +			if (!bp)
> > > +				continue;
> > > +			rc = NOTIFY_DONE;
> > > 
> > > lines outside the "if" statement.
> > > 
> > > 
> > 
> > I don't see where the out-of-sync situation between hbp_kernel_pos and
> > this_hbp_kernel[] can cause problem. Our concern is when an exception
> > arises in the small time-window starting at the time when hbp_kernel[]
> > is updated in common code and ends when exceptions are disabled through
> > the IPI routine arch_update_kernel_hw_breakpoint. Consider these two
> > cases now:
> > 
> > i)register_kernel_hw_breakpoint() - Although hbp_kernel_pos is
> > decremented before the IPI, no new exceptions will arise because of the
> > same on a CPU where this_hbp_kernel[] is not synced because the physical
> > debug registers are not yet set. So, the 'i' in "if (i >= hbp_kernel_pos)"
> > cannot belong to the newly registered breakpoint. If it is for the new
> > breakpoint, then it is on a CPU where this_hbp_kernel[] is synced with
> > hbp_kernel[].
> > 
> > ii)unregister_kernel_hw_breakpoint() - hbp_kernel_pos is incremented
> > after all the IPIs have finished, so hbp_kernel_pos still points to the
> > old value whose hbp_kernel[] is NULL. However in "if (i >= hbp_kernel_pos)"
> > if 'i' points to the 'pending-delete' breakpoint, then 'bp' is derived
> > from this_hbp_kernel[] which contains the old value. The callback
> > bp->triggered() is invoked and the exception returns fine.
> > 
> > Let me know if I am missing any possibility leading to incorrect
> > behaviour. 
> 
> Your analysis is correct, but to me it feels a little fragile.  Future
> changes to the code might cause an unexpected interaction.  Moving that
> test won't hurt anything and it will make the handler slightly more
> robust.
> 
>

I agree it keeps the code much safer. Here's the relevant code-snippet:

		if (i >= hbp_kernel_pos)
			bp = per_cpu(this_hbp_kernel[i], cpu);
		else {
			bp = current->thread.hbp[i];
			rc = NOTIFY_DONE;
		}
		/*
		 * bp can be NULL due to lazy debug register switching
		 * for
		 * user-space requests or the exception was triggered in
		 * the
		 * mismatch window when 'hbp_kernel_pos' and
		 * 'this_hbp_kernel[]'
		 * are out-of-sync
		 */
		if (!bp)
			continue;

> > > I still think it would be a good idea to avoid freeing any breakpoint
> > > structures until you know for certain they aren't needed.  Otherwise
> > > you might find that you're unable to allocate memory to re-create a
> > > structure that was already present.
> > > 
> > > Alan Stern
> > >
> > 
> > I agree that we shouldn't free memory temporarily that would be needed
> > later, as memory may not be available. However such a situation does not
> > arise in ptrace_write_dr7() after the re-write using
> > (un)register_user_hw_breakpoint(). kfree(bp) is done when
> > i) an existing breakpoint is disabled
> > ii) a breakpoint is requested but register_user_hw_breakpoint() routine
> > fails for some reasons. Hence the breakpoint structure kzalloc()'ed
> > afresh is kfree()'ed.
> > iii) In case of modifications to the type of breakpoint (such as
> > read<-->write), the breakpoint structure is not de-allocated and this is
> > where __modify_user_hw_breakpoint() helps us. There is no opportunity
> > window where the breakpoint can be grabbed by other requests when a
> > modification is done.
> > 
> > Let me know if you think there's still some discrepancy here.
> 
> Okay.  Let's suppose a single userspace breakpoint has already been
> allocated and set up as breakpoint 0.  Now another ptrace call comes
> along, in which bp 0 is disabled and bp 1 is requested.  This is your
> case (i); the breakpoint structure for bp 0 will be deallocated.  But
> maybe something goes wrong with the register_user_hw_breakpoint() call
> for bp 1, so we have to restore the old settings.  The code will try to
> allocate a new breakpoint structure to hold bp 0; what happens if
> that allocation fails?  The user program will know that the write to
> DR7 didn't succeed, so it will think the pre-existing breakpoint is
> still valid.  But it isn't.
> 
> If instead you had avoided deallocated the structure for bp 0 until the 
> end, then there would be no need to reallocate it during the restore 
> and so the restore would succeed.
> 
> Alan Stern

Thanks for explaining the case so well, I did not foresee it! The
ptrace_write_dr7() is modified to perform changes in two-passes. The first
one will effect all register/modify operations while the second one will
unregister. One issue that I see in this method is if/when virtual debug
registers are implemented, we may incorrectly deny a register request
for want of physical debug registers (because they have not been free-ed
through an earlier unregister operation). Let me know if you think it
can be done better. Here's the new ptrace_write_dr7():


static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
{
	struct thread_struct *thread = &(tsk->thread);
	unsigned long old_dr7 = thread->debugreg7;
	int i, rc = 0;
	int enabled, second_pass_reqd = 1;
	unsigned len, type;
	struct hw_breakpoint *bp;

	data &= ~DR_CONTROL_RESERVED;
restore:
	/*
	 * Loop through all the hardware breakpoints, making the
	 * appropriate changes to each.
	 */
	for (i = 0; i < HB_NUM; i++) {
		enabled = decode_dr7(data, i, &len, &type);
		bp = thread->hbp[i];

		if (!enabled) {
			if (bp) {
				/* Don't unregister the breakpoints right-away,
				 * unless all register_user_hw_breakpoint()
				 * requests have succeeded. This prevents
				 * any window of opportunity for debug
				 * register grabbing by other users.
				 */
				if (second_pass_reqd)
					continue;
				unregister_user_hw_breakpoint(tsk, bp);
				kfree(bp);
			}
			continue;
		}

		if (!bp) {
			rc = -ENOMEM;
			bp = kzalloc(sizeof(struct hw_breakpoint), GFP_KERNEL);
			if (bp) {
				bp->info.address = thread->debugreg[i];
				bp->triggered = ptrace_triggered;
				bp->info.len = len;
				bp->info.type = type;
				rc = register_user_hw_breakpoint(tsk, bp);
				if (!rc)
					set_tsk_thread_flag(tsk, TIF_DEBUG);
				else
					kfree(bp);
			}
		} else {
			spin_lock_bh(&hw_breakpoint_lock);
			rc = __modify_user_hw_breakpoint(i, tsk, bp);
			spin_unlock_bh(&hw_breakpoint_lock);
		}

		if (rc)
			break;
	}

	/* If anything above failed, restore the original settings */
	if (rc < 0) {
		data = old_dr7;
		second_pass_reqd = 0;
		goto restore;
	}
	if (second_pass_reqd) {
		/* Enter the second-pass after disabling 'second_pass_reqd' */
		second_pass_reqd = 0;
		goto restore;
	}
	return rc;
}

I will repost the patchset along with the changes explained above.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint Interfaces
  2009-05-12 14:05           ` K.Prasad
@ 2009-05-12 14:55             ` Alan Stern
  2009-05-12 17:12               ` K.Prasad
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Stern @ 2009-05-12 14:55 UTC (permalink / raw)
  To: K.Prasad
  Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath

On Tue, 12 May 2009, K.Prasad wrote:

> I agree it keeps the code much safer. Here's the relevant code-snippet:
> 
> 		if (i >= hbp_kernel_pos)
> 			bp = per_cpu(this_hbp_kernel[i], cpu);
> 		else {
> 			bp = current->thread.hbp[i];
> 			rc = NOTIFY_DONE;
> 		}
> 		/*
> 		 * bp can be NULL due to lazy debug register switching
> 		 * for
> 		 * user-space requests or the exception was triggered in
> 		 * the
> 		 * mismatch window when 'hbp_kernel_pos' and
> 		 * 'this_hbp_kernel[]'
> 		 * are out-of-sync
> 		 */

The comment could be a little more grammatical and more succinct.  For 
example:

		/*
		 * bp can be NULL due to lazy debug register switching
		 * or to the delay between updates of hbp_kernel_pos 
		 * and this_hbp_kernel.
		 */

> > Okay.  Let's suppose a single userspace breakpoint has already been
> > allocated and set up as breakpoint 0.  Now another ptrace call comes
> > along, in which bp 0 is disabled and bp 1 is requested.  This is your
> > case (i); the breakpoint structure for bp 0 will be deallocated.  But
> > maybe something goes wrong with the register_user_hw_breakpoint() call
> > for bp 1, so we have to restore the old settings.  The code will try to
> > allocate a new breakpoint structure to hold bp 0; what happens if
> > that allocation fails?  The user program will know that the write to
> > DR7 didn't succeed, so it will think the pre-existing breakpoint is
> > still valid.  But it isn't.
> > 
> > If instead you had avoided deallocated the structure for bp 0 until the 
> > end, then there would be no need to reallocate it during the restore 
> > and so the restore would succeed.
> > 
> > Alan Stern
> 
> Thanks for explaining the case so well, I did not foresee it! The
> ptrace_write_dr7() is modified to perform changes in two-passes. The first
> one will effect all register/modify operations while the second one will
> unregister. One issue that I see in this method is if/when virtual debug
> registers are implemented, we may incorrectly deny a register request
> for want of physical debug registers (because they have not been free-ed
> through an earlier unregister operation). Let me know if you think it
> can be done better. Here's the new ptrace_write_dr7():
> 
> 
> static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
> {
> 	struct thread_struct *thread = &(tsk->thread);
> 	unsigned long old_dr7 = thread->debugreg7;
> 	int i, rc = 0;
> 	int enabled, second_pass_reqd = 1;
> 	unsigned len, type;
> 	struct hw_breakpoint *bp;
> 
> 	data &= ~DR_CONTROL_RESERVED;
> restore:
> 	/*
> 	 * Loop through all the hardware breakpoints, making the
> 	 * appropriate changes to each.
> 	 */
> 	for (i = 0; i < HB_NUM; i++) {
> 		enabled = decode_dr7(data, i, &len, &type);
> 		bp = thread->hbp[i];
> 
> 		if (!enabled) {
> 			if (bp) {
> 				/* Don't unregister the breakpoints right-away,
> 				 * unless all register_user_hw_breakpoint()
> 				 * requests have succeeded. This prevents
> 				 * any window of opportunity for debug
> 				 * register grabbing by other users.
> 				 */
> 				if (second_pass_reqd)
> 					continue;
> 				unregister_user_hw_breakpoint(tsk, bp);
> 				kfree(bp);
> 			}
> 			continue;
> 		}
> 
> 		if (!bp) {
> 			rc = -ENOMEM;
> 			bp = kzalloc(sizeof(struct hw_breakpoint), GFP_KERNEL);
> 			if (bp) {
> 				bp->info.address = thread->debugreg[i];
> 				bp->triggered = ptrace_triggered;
> 				bp->info.len = len;
> 				bp->info.type = type;
> 				rc = register_user_hw_breakpoint(tsk, bp);
> 				if (!rc)
> 					set_tsk_thread_flag(tsk, TIF_DEBUG);
> 				else
> 					kfree(bp);
> 			}
> 		} else {
> 			spin_lock_bh(&hw_breakpoint_lock);
> 			rc = __modify_user_hw_breakpoint(i, tsk, bp);
> 			spin_unlock_bh(&hw_breakpoint_lock);

With these changes, __register_user_hw_breakpoint() and 
__unregister_user_hw_breakpoint() no longer need to be separate 
routines.  Similarly, __modify_user_hw_breakpoint() can be renamed 
without the "__" prefix, and it can acquire the spinlock instead of 
making the caller do the locking.

Then hw_breakpoint_lock can be made static.  There's no need to lock it 
in ptrace_get_debugreg() any more.

> 		}
> 
> 		if (rc)
> 			break;
> 	}
> 
> 	/* If anything above failed, restore the original settings */
> 	if (rc < 0) {
> 		data = old_dr7;
> 		second_pass_reqd = 0;
> 		goto restore;
> 	}
> 	if (second_pass_reqd) {
> 		/* Enter the second-pass after disabling 'second_pass_reqd' */
> 		second_pass_reqd = 0;
> 		goto restore;
> 	}
> 	return rc;
> }
> 
> I will repost the patchset along with the changes explained above.

I think it would be less confusing if you change the name
"second_pass_reqd" to just "second_pass", setting it to 0 the first
time through and 1 the second time.  So at the end:

	/*
	 * Make a second pass to free the remaining unused breakpoints
	 * or to restore the original breakpoints if an error occurred.
	 */
	if (!second_pass) {
		second_pass = 1;
		if (rc < 0)
			data = old_dr7;
		goto restore;
	}

Also, you need to be more careful about the value of rc.  The error 
code you return should be the error that caused the first pass to fail.  
Errors or successes during the second pass shouldn't overwrite that 
value.

Alan Stern


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint Interfaces
  2009-05-12 14:55             ` Alan Stern
@ 2009-05-12 17:12               ` K.Prasad
  2009-05-12 20:39                 ` Alan Stern
  0 siblings, 1 reply; 20+ messages in thread
From: K.Prasad @ 2009-05-12 17:12 UTC (permalink / raw)
  To: Alan Stern
  Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath

On Tue, May 12, 2009 at 10:55:10AM -0400, Alan Stern wrote:
> On Tue, 12 May 2009, K.Prasad wrote:
> 
> > I agree it keeps the code much safer. Here's the relevant code-snippet:
> > 
> > 		if (i >= hbp_kernel_pos)
> > 			bp = per_cpu(this_hbp_kernel[i], cpu);
> > 		else {
> > 			bp = current->thread.hbp[i];
> > 			rc = NOTIFY_DONE;
> > 		}
> > 		/*
> > 		 * bp can be NULL due to lazy debug register switching
> > 		 * for
> > 		 * user-space requests or the exception was triggered in
> > 		 * the
> > 		 * mismatch window when 'hbp_kernel_pos' and
> > 		 * 'this_hbp_kernel[]'
> > 		 * are out-of-sync
> > 		 */
> 
> The comment could be a little more grammatical and more succinct.  For 
> example:
> 
> 		/*
> 		 * bp can be NULL due to lazy debug register switching
> 		 * or to the delay between updates of hbp_kernel_pos 
> 		 * and this_hbp_kernel.
> 		 */
> 

Ok.

> > > Okay.  Let's suppose a single userspace breakpoint has already been
> > > allocated and set up as breakpoint 0.  Now another ptrace call comes
> > > along, in which bp 0 is disabled and bp 1 is requested.  This is your
> > > case (i); the breakpoint structure for bp 0 will be deallocated.  But
> > > maybe something goes wrong with the register_user_hw_breakpoint() call
> > > for bp 1, so we have to restore the old settings.  The code will try to
> > > allocate a new breakpoint structure to hold bp 0; what happens if
> > > that allocation fails?  The user program will know that the write to
> > > DR7 didn't succeed, so it will think the pre-existing breakpoint is
> > > still valid.  But it isn't.
> > > 
> > > If instead you had avoided deallocated the structure for bp 0 until the 
> > > end, then there would be no need to reallocate it during the restore 
> > > and so the restore would succeed.
> > > 
> > > Alan Stern
> > 
> > Thanks for explaining the case so well, I did not foresee it! The
> > ptrace_write_dr7() is modified to perform changes in two-passes. The first
> > one will effect all register/modify operations while the second one will
> > unregister. One issue that I see in this method is if/when virtual debug
> > registers are implemented, we may incorrectly deny a register request
> > for want of physical debug registers (because they have not been free-ed
> > through an earlier unregister operation). Let me know if you think it
> > can be done better. Here's the new ptrace_write_dr7():
> > 
> > 
> > static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
> > {
> > 	struct thread_struct *thread = &(tsk->thread);
> > 	unsigned long old_dr7 = thread->debugreg7;
> > 	int i, rc = 0;
> > 	int enabled, second_pass_reqd = 1;
> > 	unsigned len, type;
> > 	struct hw_breakpoint *bp;
> > 
> > 	data &= ~DR_CONTROL_RESERVED;
> > restore:
> > 	/*
> > 	 * Loop through all the hardware breakpoints, making the
> > 	 * appropriate changes to each.
> > 	 */
> > 	for (i = 0; i < HB_NUM; i++) {
> > 		enabled = decode_dr7(data, i, &len, &type);
> > 		bp = thread->hbp[i];
> > 
> > 		if (!enabled) {
> > 			if (bp) {
> > 				/* Don't unregister the breakpoints right-away,
> > 				 * unless all register_user_hw_breakpoint()
> > 				 * requests have succeeded. This prevents
> > 				 * any window of opportunity for debug
> > 				 * register grabbing by other users.
> > 				 */
> > 				if (second_pass_reqd)
> > 					continue;
> > 				unregister_user_hw_breakpoint(tsk, bp);
> > 				kfree(bp);
> > 			}
> > 			continue;
> > 		}
> > 
> > 		if (!bp) {
> > 			rc = -ENOMEM;
> > 			bp = kzalloc(sizeof(struct hw_breakpoint), GFP_KERNEL);
> > 			if (bp) {
> > 				bp->info.address = thread->debugreg[i];
> > 				bp->triggered = ptrace_triggered;
> > 				bp->info.len = len;
> > 				bp->info.type = type;
> > 				rc = register_user_hw_breakpoint(tsk, bp);
> > 				if (!rc)
> > 					set_tsk_thread_flag(tsk, TIF_DEBUG);
> > 				else
> > 					kfree(bp);
> > 			}
> > 		} else {
> > 			spin_lock_bh(&hw_breakpoint_lock);
> > 			rc = __modify_user_hw_breakpoint(i, tsk, bp);
> > 			spin_unlock_bh(&hw_breakpoint_lock);
> 
> With these changes, __register_user_hw_breakpoint() and 
> __unregister_user_hw_breakpoint() no longer need to be separate 
> routines.  Similarly, __modify_user_hw_breakpoint() can be renamed 
> without the "__" prefix, and it can acquire the spinlock instead of 
> making the caller do the locking.
>
> Then hw_breakpoint_lock can be made static.  There's no need to lock it 
> in ptrace_get_debugreg() any more.
> 

hmmh...The (un)register_user_<> and __(un)register_user_<> break-up
looks so modular and am not sure if combining them would be of any
benefit (and as you might be aware, the compiler can always choose
to inline though). And doing this might not be helpful if we like to
virtualise the debug registers (as in the original implementation) in
the future.

The __modify_user_hw_breakpoint() had the "__" prefix because it had
"int pos" as a parameter, although it can be eliminated by adding a loop
that iterates through the breakpoints of "tsk". If you prefer
modify_user_hw_breakpoint(), I will introduce the same in
kernel/hw_breakpoint.c which acts as a wrapper over __modify_user_<> and
takes the hw_breakpoint_lock.

Let me know what you think on the above.

> > 		}
> > 
> > 		if (rc)
> > 			break;
> > 	}
> > 
> > 	/* If anything above failed, restore the original settings */
> > 	if (rc < 0) {
> > 		data = old_dr7;
> > 		second_pass_reqd = 0;
> > 		goto restore;
> > 	}
> > 	if (second_pass_reqd) {
> > 		/* Enter the second-pass after disabling 'second_pass_reqd' */
> > 		second_pass_reqd = 0;
> > 		goto restore;
> > 	}
> > 	return rc;
> > }
> > 
> > I will repost the patchset along with the changes explained above.
> 
> I think it would be less confusing if you change the name
> "second_pass_reqd" to just "second_pass", setting it to 0 the first
> time through and 1 the second time.  So at the end:
> 
> 	/*
> 	 * Make a second pass to free the remaining unused breakpoints
> 	 * or to restore the original breakpoints if an error occurred.
> 	 */
> 	if (!second_pass) {
> 		second_pass = 1;
> 		if (rc < 0)
> 			data = old_dr7;
> 		goto restore;
> 	}
> 
> Also, you need to be more careful about the value of rc.  The error 
> code you return should be the error that caused the first pass to fail.  
> Errors or successes during the second pass shouldn't overwrite that 
> value.
> 
> Alan Stern
> 

Ok. Will change the code as suggested. The return values in the above
implementation will be modified in the second pass here:

rc = __modify_user_hw_breakpoint(i, tsk, bp);

I will save the 'rc' value from the first pass in a separate variable
and use that as the return value. Operations in the second pass (such as
unregister/modify are not expected to fail).

Here's the revised code:

/*
 * Handle ptrace writes to debug register 7.
 */
static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
{
	struct thread_struct *thread = &(tsk->thread);
	unsigned long old_dr7 = thread->debugreg7;
	int i, orig_ret, rc = 0;
	int enabled, second_pass = 0;
	unsigned len, type;
	struct hw_breakpoint *bp;

	data &= ~DR_CONTROL_RESERVED;
restore:
	/*
	 * Loop through all the hardware breakpoints, making the
	 * appropriate changes to each.
	 */
	for (i = 0; i < HB_NUM; i++) {
		enabled = decode_dr7(data, i, &len, &type);
		bp = thread->hbp[i];

		if (!enabled) {
			if (bp) {
				/* Don't unregister the breakpoints right-away,
				 * unless all register_user_hw_breakpoint()
				 * requests have succeeded. This prevents
				 * any window of opportunity for debug
				 * register grabbing by other users.
				 */
				if (!second_pass)
					continue;
				unregister_user_hw_breakpoint(tsk, bp);
				kfree(bp);
			}
			continue;
		}
		if (!bp) {
			rc = -ENOMEM;
			bp = kzalloc(sizeof(struct hw_breakpoint), GFP_KERNEL);
			if (bp) {
				bp->info.address = thread->debugreg[i];
				bp->triggered = ptrace_triggered;
				bp->info.len = len;
				bp->info.type = type;
				rc = register_user_hw_breakpoint(tsk, bp);
				if (!rc)
					set_tsk_thread_flag(tsk, TIF_DEBUG);
				else
					kfree(bp);
			}
		} else {
			spin_lock_bh(&hw_breakpoint_lock);
			rc = __modify_user_hw_breakpoint(i, tsk, bp);
			spin_unlock_bh(&hw_breakpoint_lock);
		}
		if (rc)
			break;
	}
	/*
	 * Make a second pass to free the remaining unused breakpoints
	 * or to restore the original breakpoints if an error occurred.
	 */
	if (!second_pass) {
		second_pass = 1;
		if (rc < 0) {
			orig_ret = rc;
			data = old_dr7;
		}
		goto restore;
	}
	return ((orig_ret < 0) ? orig_ret : rc); 
}

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint Interfaces
  2009-05-12 17:12               ` K.Prasad
@ 2009-05-12 20:39                 ` Alan Stern
  0 siblings, 0 replies; 20+ messages in thread
From: Alan Stern @ 2009-05-12 20:39 UTC (permalink / raw)
  To: K.Prasad
  Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath

On Tue, 12 May 2009, K.Prasad wrote:

> hmmh...The (un)register_user_<> and __(un)register_user_<> break-up
> looks so modular and am not sure if combining them would be of any
> benefit (and as you might be aware, the compiler can always choose
> to inline though). And doing this might not be helpful if we like to
> virtualise the debug registers (as in the original implementation) in
> the future.

You can keep them separate if you like, but then at least make the "__" 
routines static and remove them from the .h file.

> The __modify_user_hw_breakpoint() had the "__" prefix because it had
> "int pos" as a parameter, although it can be eliminated by adding a loop
> that iterates through the breakpoints of "tsk". If you prefer
> modify_user_hw_breakpoint(), I will introduce the same in
> kernel/hw_breakpoint.c which acts as a wrapper over __modify_user_<> and
> takes the hw_breakpoint_lock.
> 
> Let me know what you think on the above.

Go ahead and introduce modify_user_hw_breakpoint().  Then the spinlock
can become static.


> Ok. Will change the code as suggested. The return values in the above
> implementation will be modified in the second pass here:
> 
> rc = __modify_user_hw_breakpoint(i, tsk, bp);
> 
> I will save the 'rc' value from the first pass in a separate variable
> and use that as the return value. Operations in the second pass (such as
> unregister/modify are not expected to fail).

That's right.

Alan Stern


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Patch 00/12] Hardware Breakpoint interfaces
@ 2009-06-01 18:12 K.Prasad
  0 siblings, 0 replies; 20+ messages in thread
From: K.Prasad @ 2009-06-01 18:12 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Linux Kernel Mailing List, Ingo Molnar, Alan Stern

Hi Frederic,
	Please find a new patchset that contains a slight re-arrangement in the
patchset as suggested by you here: http://lkml.org/lkml/2009/5/30/165.

The 'struct thread_struct' is modified to contain an array-based storage for
debug register in Patch 01/12 and all users of debugreg<n> are also modified in
the same patch.

Kindly integrate them to be a part of -tip tree.

Thanks,
K.Prasad



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Patch 00/12] Hardware Breakpoint interfaces
@ 2009-05-30 10:47 K.Prasad
  0 siblings, 0 replies; 20+ messages in thread
From: K.Prasad @ 2009-05-30 10:47 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Linux Kernel Mailing List, Ingo Molnar, Alan Stern

Hi Frederic,
	Please find a new iteration of the patchset that includes the following
changes over the previous version posted here: http://lkml.org/lkml/2009/5/21/143.

Changelog
----------

- Includes a minor change in Patch 8/12 that renames a variable from 'temp'
  to 'bp_info'.

- Updated comment in __switch_to() function to reflect the new method used
  to detect lazy debug register switching (Patch 6/12).

- Moved the patch that removes 'debugreg<n>' members from 'struct thread_struct'
  to Patch 8/12 in order to enable compilation of the kernel tree after
  application of every patch in the patchset.

Kindly integrate the patchset into -tip tree. They are based on commit
0a834a9c61bb279327c50300e43ebb6de7e1e0d7 of the same tree.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Patch 00/12] Hardware Breakpoint interfaces
@ 2009-05-21 14:00 K.Prasad
  0 siblings, 0 replies; 20+ messages in thread
From: K.Prasad @ 2009-05-21 14:00 UTC (permalink / raw)
  To: Ingo Molnar, Frederic Weisbecker; +Cc: Linux Kernel Mailing List, Alan Stern

Hi Ingo,
	Please find the patchset that introduces generic Hardware Breakpoint
interfaces and an implementation for x86/x86_64 architecture.

In this iteration, the patches for HW Breakpoint interfaces sent here:
http://lkml.org/lkml/2009/5/15/139 have been combined with patches containing
improvements and minor fixes (posted here: http://lkml.org/lkml/2009/5/19/264)
for easy integration into -tip tree.

Kindly accept the same.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Patch 00/12] Hardware Breakpoint interfaces
@ 2009-05-16  0:23 K.Prasad
  0 siblings, 0 replies; 20+ messages in thread
From: K.Prasad @ 2009-05-16  0:23 UTC (permalink / raw)
  To: Ingo Molnar, Linux Kernel Mailing List
  Cc: Alan Stern, Frederic Weisbecker, Steven Rostedt, Andrew Morton,
	Benjamin Herrenschmidt, maneesh, Roland McGrath,
	Masami Hiramatsu

Hi Ingo,
	Please find the patchset that introduce generic Hardware Breakpoint
interfaces and an implementation for x86/x86_64 arch. The
"Original-patch-by:" and "Reviewed-by:" fields have been added as
appropriate.

The patches are based on commit cab667c10f6b4a86787a57184f6e18459bbab7b0
of -tip tree. Kindly accept them to be a part of the tree.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Patch 00/12] Hardware Breakpoint interfaces
@ 2009-05-15 10:53 K.Prasad
  0 siblings, 0 replies; 20+ messages in thread
From: K.Prasad @ 2009-05-15 10:53 UTC (permalink / raw)
  To: Ingo Molnar, Linux Kernel Mailing List
  Cc: Alan Stern, Frederic Weisbecker, Steven Rostedt, Andrew Morton,
	Benjamin Herrenschmidt, maneesh, Roland McGrath,
	Masami Hiramatsu

Hi Ingo,
        Please find a new patchset that implements generic Hardware Breakpoint
interfaces and an implementation for x86/x86_64 architecture. The patches have
undergone several changes since their last submission to you and is more refined
now.

Kindly accept them to be a part of -tip tree.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint interfaces
  2009-05-14 20:08   ` K.Prasad
@ 2009-05-14 20:45     ` K.Prasad
  0 siblings, 0 replies; 20+ messages in thread
From: K.Prasad @ 2009-05-14 20:45 UTC (permalink / raw)
  To: Alan Stern
  Cc: Frederic Weisbecker, Steven Rostedt, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath, Masami Hiramatsu

On Fri, May 15, 2009 at 01:38:29AM +0530, K.Prasad wrote:
> On Thu, May 14, 2009 at 04:02:16PM -0400, Alan Stern wrote:
> > On Wed, 13 May 2009, K.Prasad wrote:
> > 
> > > Hi Alan,
> > > 	Please find a new set of patches with the following changes.
> > > I am eager to know if you think that they are ready for submission towards
> > > upstream acceptance.
> > 
> > This is looking pretty good.  Only one thing caught my eye: 
> > switch_to_thread_hw_breakpoint and switch_to_none_hw_breakpoint do 
> > nothing but call the corresponding arch-specific routines.  You might 
> > as well eliminate them and have the callers invoke the arch-specific 
> > routines instead.
> > 
> > Alan Stern
> >
> 
> Sure. I will eliminate the above two functions and also remove the msleep()
> call in ftrace startup selftest code path.
> 
> With these two changes I plan to submit the code for -tip acceptance
> directly.
>

There is another change that I wanted to make in arch_store_info()
[which I realised while responding to comments on another list]. I think
it will be much safer to add a check to see if symbol_name is passed
when user-space breakpoints are requested and return -EINVAL if done so.

This will help avoid any confusion at the user, who can assume that
symbol names can be resolved, and specifying address is not required.

Here's how the modified arch_store_info() would look like:

/*
 * Store a breakpoint's encoded address, length, and type.
 */
static int arch_store_info(struct hw_breakpoint *bp, struct task_struct *tsk)
{
	/*
	 * User-space requests will always have the address field populated.
	 * Symbol names from user-space are rejected
	 */
	if (tsk && bp->info.name)
		return -EINVAL;
	/*
	 * For kernel-addresses, either the address or symbol name can be
	 * specified.
	 */
	if (bp->info.name)
		bp->info.address = (unsigned long)
					kallsyms_lookup_name(bp->info.name);
	if (bp->info.address)
		return 0;
	return -EINVAL;
}

Thanks,
K.Prasad
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint interfaces
  2009-05-14 20:02 ` Alan Stern
@ 2009-05-14 20:08   ` K.Prasad
  2009-05-14 20:45     ` K.Prasad
  0 siblings, 1 reply; 20+ messages in thread
From: K.Prasad @ 2009-05-14 20:08 UTC (permalink / raw)
  To: Alan Stern
  Cc: Frederic Weisbecker, Steven Rostedt, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath, Masami Hiramatsu

On Thu, May 14, 2009 at 04:02:16PM -0400, Alan Stern wrote:
> On Wed, 13 May 2009, K.Prasad wrote:
> 
> > Hi Alan,
> > 	Please find a new set of patches with the following changes.
> > I am eager to know if you think that they are ready for submission towards
> > upstream acceptance.
> 
> This is looking pretty good.  Only one thing caught my eye: 
> switch_to_thread_hw_breakpoint and switch_to_none_hw_breakpoint do 
> nothing but call the corresponding arch-specific routines.  You might 
> as well eliminate them and have the callers invoke the arch-specific 
> routines instead.
> 
> Alan Stern
>

Sure. I will eliminate the above two functions and also remove the msleep()
call in ftrace startup selftest code path.

With these two changes I plan to submit the code for -tip acceptance
directly.

I wish to thank you profusely for the in-depth code reviews,
suggestions, unearthing corner-case issues and the immense time and effort
taken for the same. The patch wouldn't have been as better but for your
involvement!

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Patch 00/12] Hardware Breakpoint interfaces
  2009-05-13 16:12 K.Prasad
@ 2009-05-14 20:02 ` Alan Stern
  2009-05-14 20:08   ` K.Prasad
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Stern @ 2009-05-14 20:02 UTC (permalink / raw)
  To: K.Prasad
  Cc: Frederic Weisbecker, Steven Rostedt, Ingo Molnar,
	Linux Kernel Mailing List, Andrew Morton, Benjamin Herrenschmidt,
	maneesh, Roland McGrath, Masami Hiramatsu

On Wed, 13 May 2009, K.Prasad wrote:

> Hi Alan,
> 	Please find a new set of patches with the following changes.
> I am eager to know if you think that they are ready for submission towards
> upstream acceptance.

This is looking pretty good.  Only one thing caught my eye: 
switch_to_thread_hw_breakpoint and switch_to_none_hw_breakpoint do 
nothing but call the corresponding arch-specific routines.  You might 
as well eliminate them and have the callers invoke the arch-specific 
routines instead.

Alan Stern


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Patch 00/12] Hardware Breakpoint interfaces
@ 2009-05-13 16:12 K.Prasad
  2009-05-14 20:02 ` Alan Stern
  0 siblings, 1 reply; 20+ messages in thread
From: K.Prasad @ 2009-05-13 16:12 UTC (permalink / raw)
  To: Alan Stern, Frederic Weisbecker
  Cc: Steven Rostedt, Ingo Molnar, Linux Kernel Mailing List,
	Andrew Morton, Benjamin Herrenschmidt, maneesh, Roland McGrath,
	Masami Hiramatsu

Hi Alan,
	Please find a new set of patches with the following changes.
I am eager to know if you think that they are ready for submission towards
upstream acceptance.

Hi Frederic,
	Please find the changes in ftrace plugin code that moves the 
startup selftest code completely into trace_selftest_startup_ksym() as
per your suggestion.

Changelog
---------

- A new interface modify_user_hw_breakpoint() to change the characteristics
  (such as type/length) of a pre-registered user-space breakpoint is introduced.

- The startup selftest portion of code from ksym_trace_init() is moved
  completely to trace_selftest_startup_ksym(). process_new_ksym_entry() is
  required during the selftest and is no longer marked static.

- HB_NUM is renamed to HBP_NUM (given that the short naming convention for
  hw_breakpoint is hbp).

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Patch 00/12] Hardware Breakpoint interfaces
@ 2009-05-11 11:51 K.Prasad
  0 siblings, 0 replies; 20+ messages in thread
From: K.Prasad @ 2009-05-11 11:51 UTC (permalink / raw)
  To: Alan Stern, Steven Rostedt, Frederic Weisbecker
  Cc: Ingo Molnar, Linux Kernel Mailing List, Andrew Morton,
	Benjamin Herrenschmidt, maneesh, Roland McGrath,
	Masami Hiramatsu

Hi Alan,
	Please find the new set of patches with the changes mentioned below.
Kindly let me know your comments with respect to the same.

Hi Frederic/Steven,
	Please find a few changes in the ftrace plugin code w.r.t.
synchronisation primitives (prefixed in the changelog with FTRACE). Kindly let
me know your comments.

Changelog
-----------

- on_each_cpu() call will now return only after all functions calls made through
  IPI have finished (@wait parameter is set to 1). This is required since
  changes made in code following on_each_cpu() [such as incrementing
  hbp_kernel_pos] have side-effects on the execution of the functions invoked
  through IPIs. It is also safe to make them wait inside spin_lock() context as
  it does a busy wait using cpu_relax().

- Introduced a new per-cpu array of pointers to 'struct hw_breakpoint'
  called 'this_hbp_kernel'. This per-cpu variable helps overcome any
  side-effects in  hbp_handler() due to parallel execution of
  (un)register_kernel_hw_breakpoint()  on other CPUs causing 'hbp_kernel' value
  to change.

- Hardware Breakpoint exceptions due to lazy debug register switching is
  now identified through the absence of TIF_DEBUG task flag in the current
  process. This eliminates the  need to store the process that last set the
  debug register in  'last_debugged_task'.

- Converted spin_lock() in kernel/hw_breakpoint.c to spin_lock_bh(). This is
  to ward-off potential circular dependancy over 'hw_breakpoint_lock' when
  flush_thread_hw_breakpoint() is invoked in softIRQ context.

- Ptrace code now directly uses the exported interfaces
  (un)register_user_hw_breakpoint() thereby addressing some of the issues
  pointed out in code-review here:http://lkml.org/lkml/2009/5/4/367.

- An optimisation in arch_update_kernel_hw_breakpoints() code resulting in a
  few modifications to the function and removal of kdr7_masks[] global array (as
  pointed out by Alan Stern).

- [FTRACE] Implementation of RCU based locking mechanism in the ftrace plugin
  code (kernel/trace/trace_ksym.c), specially to avoid the potential circular
  dependancy through ksym_collect_stats() invoked in exception context (while
  attempting to acquite a spin_lock() already used in control-plane). All
  add/delete operations to the hlist pointed by 'ksym_filter_head' are now
  protected by RCU.

- [FTRACE] Revert the spin_lock() based implementation in kernel/trace/trace_ksym.c
  to mutex based one, as a result of the above change. These locks are no longer
  required in preempt_disable() code due to RCU implementation in
  ksym_collect_stats() and hence the change.

- The patches are now based on commit 5863f0756c57cc0ce23701138bfc88ab3d1f3721
  of -tip tree.

Thanks,
K.Prasad


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-06-01 18:12 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-24  6:13 [Patch 00/12] Hardware Breakpoint Interfaces K.Prasad
2009-05-04 20:55 ` Alan Stern
2009-05-11 11:36   ` K.Prasad
2009-05-11 14:50     ` Alan Stern
2009-05-11 17:20       ` K.Prasad
2009-05-11 18:09         ` Alan Stern
2009-05-12 14:05           ` K.Prasad
2009-05-12 14:55             ` Alan Stern
2009-05-12 17:12               ` K.Prasad
2009-05-12 20:39                 ` Alan Stern
2009-05-11 11:51 [Patch 00/12] Hardware Breakpoint interfaces K.Prasad
2009-05-13 16:12 K.Prasad
2009-05-14 20:02 ` Alan Stern
2009-05-14 20:08   ` K.Prasad
2009-05-14 20:45     ` K.Prasad
2009-05-15 10:53 K.Prasad
2009-05-16  0:23 K.Prasad
2009-05-21 14:00 K.Prasad
2009-05-30 10:47 K.Prasad
2009-06-01 18:12 K.Prasad

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.