[GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints

All of lore.kernel.org
 help / color / mirror / Atom feed

* [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
@ 2016-01-12  0:35 Jeff V. Merkey
  2016-01-12  0:40 ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff V. Merkey @ 2016-01-12  0:35 UTC (permalink / raw)
  To: torvalds, LKML
  Cc: Thomas Gleixner, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
	X86 ML, Peter Zijlstra, Andy Lutomirski, Masami Hiramatsu,
	Steven Rostedt, Borislav Petkov, Jiri Olsa

The following changes since commit ee78027142ab39d4f3c0e1af71ed343e0ff2dafd:

  Merge pull request #12 from torvalds/master (2015-12-19 11:12:20 -0700)

are available in the git repository at:


  https://github.com/jeffmerkey/linux.git fixes

for you to fetch changes up to b5f894bf53e7c401cc5a88b8a8b13059a176a538:

  Fix INT1 Recursion with unregistered breakpoints (2015-12-19 20:33:59 -0700)

----------------------------------------------------------------
Jeff Merkey (1):
      Fix INT1 Recursion with unregistered breakpoints

 arch/x86/include/uapi/asm/debugreg.h |  1 +
 arch/x86/kernel/hw_breakpoint.c      | 25 +++++++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h
index 3c0874d..78fc83c 100644
--- a/arch/x86/include/uapi/asm/debugreg.h
+++ b/arch/x86/include/uapi/asm/debugreg.h
@@ -38,6 +38,7 @@
 #define DR_RW_EXECUTE (0x0)   /* Settings for the access types to trap on */
 #define DR_RW_WRITE (0x1)
 #define DR_RW_READ (0x3)
+#define DR_RW_MASK (0x3) /* mask for breakpoint type field */
 
 #define DR_LEN_1 (0x0) /* Settings for data length to trap on */
 #define DR_LEN_2 (0x4)
diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index 50a3fad..d199834 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -444,7 +444,7 @@ EXPORT_SYMBOL_GPL(hw_breakpoint_restore);
 static int hw_breakpoint_handler(struct die_args *args)
 {
 	int i, cpu, rc = NOTIFY_STOP;
-	struct perf_event *bp;
+	struct perf_event *bp = NULL;
 	unsigned long dr7, dr6;
 	unsigned long *dr6_p;
 
@@ -477,6 +477,14 @@ static int hw_breakpoint_handler(struct die_args *args)
 			continue;
 
 		/*
+		 * Check if we got an execute breakpoint, if so
+		 * set the resume flag to avoid int1 recursion.
+		 */
+		if (((dr7 >> ((i * DR_CONTROL_SIZE) + DR_CONTROL_SHIFT))
+			& DR_RW_MASK) == DR_RW_EXECUTE)
+			args->regs->flags |= X86_EFLAGS_RF;
+
+		/*
 		 * The counter may be concurrently released but that can only
 		 * occur from a call_rcu() path. We can then safely fetch
 		 * the breakpoint, use its callback, touch its counter
@@ -503,7 +511,8 @@ static int hw_breakpoint_handler(struct die_args *args)
 
 		/*
 		 * Set up resume flag to avoid breakpoint recursion when
-		 * returning back to origin.
+		 * returning back to origin.  perf_bp_event may
+		 * change the flags so check twice.
 		 */
 		if (bp->hw.info.type == X86_BREAKPOINT_EXECUTE)
 			args->regs->flags |= X86_EFLAGS_RF;
@@ -519,6 +528,18 @@ static int hw_breakpoint_handler(struct die_args *args)
 	    (dr6 & (~DR_TRAP_BITS)))
 		rc = NOTIFY_DONE;
 
+	/*
+	 * if we are about to signal to
+	 * do_debug() to stop further processing
+	 * and we have not ascertained the source
+	 * of the breakpoint, log it as spurious.
+	 */
+	if (rc == NOTIFY_STOP && !bp) {
+		printk_ratelimited(KERN_INFO
+				"INFO: spurious INT1 exception dr6: 0x%lX dr7: 0x%lX\n",
+				dr6, dr7);
+	}
+
 	set_debugreg(dr7, 7);
 	put_cpu();
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  0:35 [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints Jeff V. Merkey
@ 2016-01-12  0:40 ` Andy Lutomirski
       [not found]   ` <CAO6TR8VdW3xzqJ7jv6N3q9AJe9pEodzQqiLK4XFYUQKeP3VVtQ@mail.gmail.com>
  2016-01-12  0:50   ` Jeff Merkey
  0 siblings, 2 replies; 16+ messages in thread
From: Andy Lutomirski @ 2016-01-12  0:40 UTC (permalink / raw)
  To: Jeff V. Merkey
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On Mon, Jan 11, 2016 at 4:35 PM, Jeff V. Merkey <linux.mdb@gmail.com> wrote:
> The following changes since commit ee78027142ab39d4f3c0e1af71ed343e0ff2dafd:
>
>   Merge pull request #12 from torvalds/master (2015-12-19 11:12:20 -0700)
>
> are available in the git repository at:
>
>
>   https://github.com/jeffmerkey/linux.git fixes
>
> for you to fetch changes up to b5f894bf53e7c401cc5a88b8a8b13059a176a538:
>
>   Fix INT1 Recursion with unregistered breakpoints (2015-12-19 20:33:59 -0700)
>
> ----------------------------------------------------------------
> Jeff Merkey (1):
>       Fix INT1 Recursion with unregistered breakpoints
>
>  arch/x86/include/uapi/asm/debugreg.h |  1 +
>  arch/x86/kernel/hw_breakpoint.c      | 25 +++++++++++++++++++++++--
>  2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h
> index 3c0874d..78fc83c 100644
> --- a/arch/x86/include/uapi/asm/debugreg.h
> +++ b/arch/x86/include/uapi/asm/debugreg.h
> @@ -38,6 +38,7 @@
>  #define DR_RW_EXECUTE (0x0)   /* Settings for the access types to trap on */
>  #define DR_RW_WRITE (0x1)
>  #define DR_RW_READ (0x3)
> +#define DR_RW_MASK (0x3) /* mask for breakpoint type field */
>
>  #define DR_LEN_1 (0x0) /* Settings for data length to trap on */
>  #define DR_LEN_2 (0x4)
> diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
> index 50a3fad..d199834 100644
> --- a/arch/x86/kernel/hw_breakpoint.c
> +++ b/arch/x86/kernel/hw_breakpoint.c
> @@ -444,7 +444,7 @@ EXPORT_SYMBOL_GPL(hw_breakpoint_restore);
>  static int hw_breakpoint_handler(struct die_args *args)
>  {
>         int i, cpu, rc = NOTIFY_STOP;
> -       struct perf_event *bp;
> +       struct perf_event *bp = NULL;
>         unsigned long dr7, dr6;
>         unsigned long *dr6_p;
>
> @@ -477,6 +477,14 @@ static int hw_breakpoint_handler(struct die_args *args)
>                         continue;
>
>                 /*
> +                * Check if we got an execute breakpoint, if so
> +                * set the resume flag to avoid int1 recursion.
> +                */
> +               if (((dr7 >> ((i * DR_CONTROL_SIZE) + DR_CONTROL_SHIFT))
> +                       & DR_RW_MASK) == DR_RW_EXECUTE)
> +                       args->regs->flags |= X86_EFLAGS_RF;
>

I'm still not at all sure I like this, for two reasons:

1. This duplicates the RF-setting logic with all of the legitimate
users.  I can see this blowing up in the case where a breakpoint API
user wants to clear a breakpoint and *not* set RF in case there's
another (possibly just-registered) breakpoint?

2. Shouldn't this clear the spurious breakpoint rather than leaving it set?

In any event, why not just change whatever you're doing to make this
blow up to use the actual breakpoint API?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
       [not found]   ` <CAO6TR8VdW3xzqJ7jv6N3q9AJe9pEodzQqiLK4XFYUQKeP3VVtQ@mail.gmail.com>
@ 2016-01-12  0:49     ` Andy Lutomirski
  2016-01-12  0:55       ` Jeff Merkey
  2016-01-12  1:30       ` Jeff Merkey
  0 siblings, 2 replies; 16+ messages in thread
From: Andy Lutomirski @ 2016-01-12  0:49 UTC (permalink / raw)
  To: Jeff Merkey
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On Mon, Jan 11, 2016 at 4:44 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
> Hi Thomas,
>
> I agree with #2, we should clear the breakpoint.  As for #1, if
> there's an execute breakpoint it MUST be cleared or it will just fire
> off again when it sees the iretd from the int1 exception handler.  I
> do use the breakpoint API Thomas, this showed up while debugging and
> testing the API with "lazy debug register switching".
>
> So do you want me to expand the patch and clear the breakpoint?  Just
> give the word and I'll get busy and GIT -R- DONE.

It seems to me that you're papering over some issue instead of fixing
the root cause.  If you're using the API, then either you're doing it
wrong or the API is broken.  Can you figure out which and fix it?

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  0:40 ` Andy Lutomirski
       [not found]   ` <CAO6TR8VdW3xzqJ7jv6N3q9AJe9pEodzQqiLK4XFYUQKeP3VVtQ@mail.gmail.com>
@ 2016-01-12  0:50   ` Jeff Merkey
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff Merkey @ 2016-01-12  0:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jan 11, 2016 at 4:35 PM, Jeff V. Merkey <linux.mdb@gmail.com>
> wrote:
>> The following changes since commit
>> ee78027142ab39d4f3c0e1af71ed343e0ff2dafd:
>>
>>   Merge pull request #12 from torvalds/master (2015-12-19 11:12:20 -0700)
>>
>> are available in the git repository at:
>>
>>
>>   https://github.com/jeffmerkey/linux.git fixes
>>
>> for you to fetch changes up to b5f894bf53e7c401cc5a88b8a8b13059a176a538:
>>
>>   Fix INT1 Recursion with unregistered breakpoints (2015-12-19 20:33:59
>> -0700)
>>
>> ----------------------------------------------------------------
>> Jeff Merkey (1):
>>       Fix INT1 Recursion with unregistered breakpoints
>>
>>  arch/x86/include/uapi/asm/debugreg.h |  1 +
>>  arch/x86/kernel/hw_breakpoint.c      | 25 +++++++++++++++++++++++--
>>  2 files changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/uapi/asm/debugreg.h
>> b/arch/x86/include/uapi/asm/debugreg.h
>> index 3c0874d..78fc83c 100644
>> --- a/arch/x86/include/uapi/asm/debugreg.h
>> +++ b/arch/x86/include/uapi/asm/debugreg.h
>> @@ -38,6 +38,7 @@
>>  #define DR_RW_EXECUTE (0x0)   /* Settings for the access types to trap on
>> */
>>  #define DR_RW_WRITE (0x1)
>>  #define DR_RW_READ (0x3)
>> +#define DR_RW_MASK (0x3) /* mask for breakpoint type field */
>>
>>  #define DR_LEN_1 (0x0) /* Settings for data length to trap on */
>>  #define DR_LEN_2 (0x4)
>> diff --git a/arch/x86/kernel/hw_breakpoint.c
>> b/arch/x86/kernel/hw_breakpoint.c
>> index 50a3fad..d199834 100644
>> --- a/arch/x86/kernel/hw_breakpoint.c
>> +++ b/arch/x86/kernel/hw_breakpoint.c
>> @@ -444,7 +444,7 @@ EXPORT_SYMBOL_GPL(hw_breakpoint_restore);
>>  static int hw_breakpoint_handler(struct die_args *args)
>>  {
>>         int i, cpu, rc = NOTIFY_STOP;
>> -       struct perf_event *bp;
>> +       struct perf_event *bp = NULL;
>>         unsigned long dr7, dr6;
>>         unsigned long *dr6_p;
>>
>> @@ -477,6 +477,14 @@ static int hw_breakpoint_handler(struct die_args
>> *args)
>>                         continue;
>>
>>                 /*
>> +                * Check if we got an execute breakpoint, if so
>> +                * set the resume flag to avoid int1 recursion.
>> +                */
>> +               if (((dr7 >> ((i * DR_CONTROL_SIZE) + DR_CONTROL_SHIFT))
>> +                       & DR_RW_MASK) == DR_RW_EXECUTE)
>> +                       args->regs->flags |= X86_EFLAGS_RF;
>>
>
> I'm still not at all sure I like this, for two reasons:
>
> 1. This duplicates the RF-setting logic with all of the legitimate
> users.  I can see this blowing up in the case where a breakpoint API
> user wants to clear a breakpoint and *not* set RF in case there's
> another (possibly just-registered) breakpoint?
>
> 2. Shouldn't this clear the spurious breakpoint rather than leaving it set?
>
> In any event, why not just change whatever you're doing to make this
> blow up to use the actual breakpoint API?
>

Hi Andy,

I agree with #2, we should clear the breakpoint.  As for #1, if
there's an execute breakpoint it MUST be cleared by setting the RF flags
or it will just fire off again when it sees the iretd from the int1
exception handler.  I
do use the breakpoint API, this showed up while debugging and
testing the API with "lazy debug register switching" and using kgdb.

So do you want me to expand the patch and clear the breakpoint?

:-)

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  0:49     ` Andy Lutomirski
@ 2016-01-12  0:55       ` Jeff Merkey
  2016-01-12  1:30       ` Jeff Merkey
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff Merkey @ 2016-01-12  0:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jan 11, 2016 at 4:44 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> Hi Thomas,
>>
>> I agree with #2, we should clear the breakpoint.  As for #1, if
>> there's an execute breakpoint it MUST be cleared or it will just fire
>> off again when it sees the iretd from the int1 exception handler.  I
>> do use the breakpoint API Thomas, this showed up while debugging and
>> testing the API with "lazy debug register switching".
>>
>> So do you want me to expand the patch and clear the breakpoint?  Just
>> give the word and I'll get busy and GIT -R- DONE.
>
> It seems to me that you're papering over some issue instead of fixing
> the root cause.  If you're using the API, then either you're doing it
> wrong or the API is broken.  Can you figure out which and fix it?
>
> --Andy
>

That's what I did, the API is broken,  you can't trigger an int1
exception and not set the resume flag -- this is what clears the
exception, otherwise the hardware will just keep firing off the int1
exception until someone sets and loads the resume flag in the
processor.

I can try to explain this again, but the way the hardware works is:

INT1 -> call exception handler (execute breakpoint)
handler clears traps flag and sets resume flag
(Exception cleared)

if you don't set the resume flag then

INT1 -> call exeception handler
RF not set
iretd instruction
INT1 -> call exception handler
INT1 -> call exception handler
INT1 -> call exception handler
... forever ...

So the hardware just keep firing off at the same execution address and
not move forward.  the resume flag tells the processor to step into
the next instruction in the pipeline.  If never set, the instruction
pointer will never advance.  This is how intel processors work.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  0:49     ` Andy Lutomirski
  2016-01-12  0:55       ` Jeff Merkey
@ 2016-01-12  1:30       ` Jeff Merkey
  2016-01-12  1:54         ` Andy Lutomirski
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff Merkey @ 2016-01-12  1:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jan 11, 2016 at 4:44 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> Hi Thomas,
>>
>> I agree with #2, we should clear the breakpoint.  As for #1, if
>> there's an execute breakpoint it MUST be cleared or it will just fire
>> off again when it sees the iretd from the int1 exception handler.  I
>> do use the breakpoint API Thomas, this showed up while debugging and
>> testing the API with "lazy debug register switching".
>>
>> So do you want me to expand the patch and clear the breakpoint?  Just
>> give the word and I'll get busy and GIT -R- DONE.
>
> It seems to me that you're papering over some issue instead of fixing
> the root cause.  If you're using the API, then either you're doing it
> wrong or the API is broken.  Can you figure out which and fix it?
>
> --Andy
>

Andy,

Linux should not crash because someone triggered a breakpoint or one
got triggered due to a program leaving some bits lying in a read only
register (DR6) which for some strange reason someone in the linux
world decided could be used as local storage and to pass arguments
between subsystems - a register intel designed to be read from for
status.    I did not design what's in that API, I have to live with
it.  So all I am asking is that we fix this issue.  It does not matter
to my debugger is this is fixed or not in Linux, since I carry the fix
in my patch, but it does matter to the overall robustness of Linux.

If someone triggers a breakpoint or one gets triggered due to the API,
Linux won't crash, just print a log message.  So what's the big deal
about this small and modest change to prevent that.  It makes Linux
more robust.

If you have suggestions on what items you want to see added to this
commit or possibly other commits to address these other areas, please
let me know, and I am happy to code them.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  1:30       ` Jeff Merkey
@ 2016-01-12  1:54         ` Andy Lutomirski
  2016-01-12  2:07           ` Jeff Merkey
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2016-01-12  1:54 UTC (permalink / raw)
  To: Jeff Merkey
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On Mon, Jan 11, 2016 at 5:30 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Mon, Jan 11, 2016 at 4:44 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>>> Hi Thomas,
>>>
>>> I agree with #2, we should clear the breakpoint.  As for #1, if
>>> there's an execute breakpoint it MUST be cleared or it will just fire
>>> off again when it sees the iretd from the int1 exception handler.  I
>>> do use the breakpoint API Thomas, this showed up while debugging and
>>> testing the API with "lazy debug register switching".
>>>
>>> So do you want me to expand the patch and clear the breakpoint?  Just
>>> give the word and I'll get busy and GIT -R- DONE.
>>
>> It seems to me that you're papering over some issue instead of fixing
>> the root cause.  If you're using the API, then either you're doing it
>> wrong or the API is broken.  Can you figure out which and fix it?
>>
>> --Andy
>>
>
> Andy,
>
> Linux should not crash because someone triggered a breakpoint or one
> got triggered due to a program leaving some bits lying in a read only
> register (DR6) which for some strange reason someone in the linux
> world decided could be used as local storage and to pass arguments
> between subsystems - a register intel designed to be read from for
> status.    I did not design what's in that API, I have to live with
> it.

The API appears to work, though.  Are you *sure* you're using it
correctly?  Are you telling the code in kernel/hw_breakpoint.c about
your breakpoint?

> So all I am asking is that we fix this issue.  It does not matter
> to my debugger is this is fixed or not in Linux, since I carry the fix
> in my patch, but it does matter to the overall robustness of Linux.

Robust against what, exactly?  What's the bug?

I will grant that the comments about lazy dr7 switching are
mystifying, and cleaning them up might be nice.  But there's no
adequate explanation of what the failure mode is, how to trigger it,
or why your patch is a reasonable fix.  As it stands, you're
duplicating code.

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  1:54         ` Andy Lutomirski
@ 2016-01-12  2:07           ` Jeff Merkey
  2016-01-12  2:19             ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Merkey @ 2016-01-12  2:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jan 11, 2016 at 5:30 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Mon, Jan 11, 2016 at 4:44 PM, Jeff Merkey <linux.mdb@gmail.com>
>>> wrote:
>>>> Hi Thomas,
>>>>
>>>> I agree with #2, we should clear the breakpoint.  As for #1, if
>>>> there's an execute breakpoint it MUST be cleared or it will just fire
>>>> off again when it sees the iretd from the int1 exception handler.  I
>>>> do use the breakpoint API Thomas, this showed up while debugging and
>>>> testing the API with "lazy debug register switching".
>>>>
>>>> So do you want me to expand the patch and clear the breakpoint?  Just
>>>> give the word and I'll get busy and GIT -R- DONE.
>>>
>>> It seems to me that you're papering over some issue instead of fixing
>>> the root cause.  If you're using the API, then either you're doing it
>>> wrong or the API is broken.  Can you figure out which and fix it?
>>>
>>> --Andy
>>>
>>
>> Andy,
>>
>> Linux should not crash because someone triggered a breakpoint or one
>> got triggered due to a program leaving some bits lying in a read only
>> register (DR6) which for some strange reason someone in the linux
>> world decided could be used as local storage and to pass arguments
>> between subsystems - a register intel designed to be read from for
>> status.    I did not design what's in that API, I have to live with
>> it.
>
> The API appears to work, though.  Are you *sure* you're using it
> correctly?  Are you telling the code in kernel/hw_breakpoint.c about
> your breakpoint?
>
>> So all I am asking is that we fix this issue.  It does not matter
>> to my debugger is this is fixed or not in Linux, since I carry the fix
>> in my patch, but it does matter to the overall robustness of Linux.
>
> Robust against what, exactly?  What's the bug?
>
> I will grant that the comments about lazy dr7 switching are
> mystifying, and cleaning them up might be nice.  But there's no
> adequate explanation of what the failure mode is, how to trigger it,
> or why your patch is a reasonable fix.  As it stands, you're
> duplicating code.
>
> --Andy

Andy,

Couple of things:

Would you like a copy of the test harness that creates this bug to
test for yourself?  I previously posted it on the list.  If you don't
have it, I'll provide it.

Since the dr6 bits get shifted around, it doesn't matter if the
breakpoint was registered or not in the API because the broken handler
will call NULL bp structures and crash whether its registered or not.

You keep asking the same questions and the answers are the the writeup
for the patch, including how it is triggered, what triggers it, how I
have triggered it, etc. and you are simply ignoring what's written
there (or you have not read it).    It makes me wonder if you really
know and understand x86 debugger stuff.

I wrote the MDB debugger on Linux 18 years ago -- that's how long I
have been working on Linux kernel development.  That's how long this
debugger has been around.  I have been working on intel based
platforms since they existed, so I did not just fall off the turnip
truck here.

:-)

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  2:07           ` Jeff Merkey
@ 2016-01-12  2:19             ` Andy Lutomirski
  2016-01-12  2:26               ` Jeff Merkey
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2016-01-12  2:19 UTC (permalink / raw)
  To: Jeff Merkey
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On Mon, Jan 11, 2016 at 6:07 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Mon, Jan 11, 2016 at 5:30 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Mon, Jan 11, 2016 at 4:44 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>> wrote:
>>>>> Hi Thomas,
>>>>>
>>>>> I agree with #2, we should clear the breakpoint.  As for #1, if
>>>>> there's an execute breakpoint it MUST be cleared or it will just fire
>>>>> off again when it sees the iretd from the int1 exception handler.  I
>>>>> do use the breakpoint API Thomas, this showed up while debugging and
>>>>> testing the API with "lazy debug register switching".
>>>>>
>>>>> So do you want me to expand the patch and clear the breakpoint?  Just
>>>>> give the word and I'll get busy and GIT -R- DONE.
>>>>
>>>> It seems to me that you're papering over some issue instead of fixing
>>>> the root cause.  If you're using the API, then either you're doing it
>>>> wrong or the API is broken.  Can you figure out which and fix it?
>>>>
>>>> --Andy
>>>>
>>>
>>> Andy,
>>>
>>> Linux should not crash because someone triggered a breakpoint or one
>>> got triggered due to a program leaving some bits lying in a read only
>>> register (DR6) which for some strange reason someone in the linux
>>> world decided could be used as local storage and to pass arguments
>>> between subsystems - a register intel designed to be read from for
>>> status.    I did not design what's in that API, I have to live with
>>> it.
>>
>> The API appears to work, though.  Are you *sure* you're using it
>> correctly?  Are you telling the code in kernel/hw_breakpoint.c about
>> your breakpoint?
>>
>>> So all I am asking is that we fix this issue.  It does not matter
>>> to my debugger is this is fixed or not in Linux, since I carry the fix
>>> in my patch, but it does matter to the overall robustness of Linux.
>>
>> Robust against what, exactly?  What's the bug?
>>
>> I will grant that the comments about lazy dr7 switching are
>> mystifying, and cleaning them up might be nice.  But there's no
>> adequate explanation of what the failure mode is, how to trigger it,
>> or why your patch is a reasonable fix.  As it stands, you're
>> duplicating code.
>>
>> --Andy
>
> Andy,
>
> Couple of things:
>
> Would you like a copy of the test harness that creates this bug to
> test for yourself?  I previously posted it on the list.  If you don't
> have it, I'll provide it.

If you can send a short, buildable thing that triggers it, I'll read it.

>
> Since the dr6 bits get shifted around, it doesn't matter if the
> breakpoint was registered or not in the API because the broken handler
> will call NULL bp structures and crash whether its registered or not.
>

And what exactly does this have to do with anything?  Your patch is
all about spurious breakpoints triggered by dr7 and should have
nothing much to do with the value in dr6.  Unless dr6 is missing a bit
due to some issue, but you never suggested any problem like that.

> You keep asking the same questions and the answers are the the writeup
> for the patch, including how it is triggered, what triggers it, how I
> have triggered it, etc. and you are simply ignoring what's written
> there (or you have not read it).    It makes me wonder if you really
> know and understand x86 debugger stuff.
>

No, your writeup is long, hard to read, and doesn't address how any
actual problem exists.

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  2:19             ` Andy Lutomirski
@ 2016-01-12  2:26               ` Jeff Merkey
  2016-01-12  2:40                 ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Merkey @ 2016-01-12  2:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jan 11, 2016 at 6:07 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Mon, Jan 11, 2016 at 5:30 PM, Jeff Merkey <linux.mdb@gmail.com>
>>> wrote:
>>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>> On Mon, Jan 11, 2016 at 4:44 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>>> wrote:
>>>>>> Hi Thomas,
>>>>>>
>>>>>> I agree with #2, we should clear the breakpoint.  As for #1, if
>>>>>> there's an execute breakpoint it MUST be cleared or it will just fire
>>>>>> off again when it sees the iretd from the int1 exception handler.  I
>>>>>> do use the breakpoint API Thomas, this showed up while debugging and
>>>>>> testing the API with "lazy debug register switching".
>>>>>>
>>>>>> So do you want me to expand the patch and clear the breakpoint?  Just
>>>>>> give the word and I'll get busy and GIT -R- DONE.
>>>>>
>>>>> It seems to me that you're papering over some issue instead of fixing
>>>>> the root cause.  If you're using the API, then either you're doing it
>>>>> wrong or the API is broken.  Can you figure out which and fix it?
>>>>>
>>>>> --Andy
>>>>>
>>>>
>>>> Andy,
>>>>
>>>> Linux should not crash because someone triggered a breakpoint or one
>>>> got triggered due to a program leaving some bits lying in a read only
>>>> register (DR6) which for some strange reason someone in the linux
>>>> world decided could be used as local storage and to pass arguments
>>>> between subsystems - a register intel designed to be read from for
>>>> status.    I did not design what's in that API, I have to live with
>>>> it.
>>>
>>> The API appears to work, though.  Are you *sure* you're using it
>>> correctly?  Are you telling the code in kernel/hw_breakpoint.c about
>>> your breakpoint?
>>>
>>>> So all I am asking is that we fix this issue.  It does not matter
>>>> to my debugger is this is fixed or not in Linux, since I carry the fix
>>>> in my patch, but it does matter to the overall robustness of Linux.
>>>
>>> Robust against what, exactly?  What's the bug?
>>>
>>> I will grant that the comments about lazy dr7 switching are
>>> mystifying, and cleaning them up might be nice.  But there's no
>>> adequate explanation of what the failure mode is, how to trigger it,
>>> or why your patch is a reasonable fix.  As it stands, you're
>>> duplicating code.
>>>
>>> --Andy
>>
>> Andy,
>>
>> Couple of things:
>>
>> Would you like a copy of the test harness that creates this bug to
>> test for yourself?  I previously posted it on the list.  If you don't
>> have it, I'll provide it.
>
> If you can send a short, buildable thing that triggers it, I'll read it.
>
>>
>> Since the dr6 bits get shifted around, it doesn't matter if the
>> breakpoint was registered or not in the API because the broken handler
>> will call NULL bp structures and crash whether its registered or not.
>>
>
> And what exactly does this have to do with anything?  Your patch is
> all about spurious breakpoints triggered by dr7 and should have
> nothing much to do with the value in dr6.  Unless dr6 is missing a bit
> due to some issue, but you never suggested any problem like that.
>

It's about setting the resume flag when an execute breakpoint occurs,  no matter
what caused the breakpoint.  If is not set, the system will hang with
that processor
hung on the same execution address.  You cannot have an int1 exception path
that does not set the resume flag which is the case here -- there
should be no path
where this flag does not get set on an execute breakpoint.


>> You keep asking the same questions and the answers are the the writeup
>> for the patch, including how it is triggered, what triggers it, how I
>> have triggered it, etc. and you are simply ignoring what's written
>> there (or you have not read it).    It makes me wonder if you really
>> know and understand x86 debugger stuff.
>>
>
> No, your writeup is long, hard to read, and doesn't address how any
> actual problem exists.
>
> --Andy
>

I'm sorry Andy, but you are just flat wrong.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  2:26               ` Jeff Merkey
@ 2016-01-12  2:40                 ` Andy Lutomirski
  2016-01-12  2:50                   ` Jeff Merkey
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2016-01-12  2:40 UTC (permalink / raw)
  To: Jeff Merkey
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On Mon, Jan 11, 2016 at 6:26 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Mon, Jan 11, 2016 at 6:07 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Mon, Jan 11, 2016 at 5:30 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>> wrote:
>>>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>> On Mon, Jan 11, 2016 at 4:44 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>>>> wrote:
>>>>>>> Hi Thomas,
>>>>>>>
>>>>>>> I agree with #2, we should clear the breakpoint.  As for #1, if
>>>>>>> there's an execute breakpoint it MUST be cleared or it will just fire
>>>>>>> off again when it sees the iretd from the int1 exception handler.  I
>>>>>>> do use the breakpoint API Thomas, this showed up while debugging and
>>>>>>> testing the API with "lazy debug register switching".
>>>>>>>
>>>>>>> So do you want me to expand the patch and clear the breakpoint?  Just
>>>>>>> give the word and I'll get busy and GIT -R- DONE.
>>>>>>
>>>>>> It seems to me that you're papering over some issue instead of fixing
>>>>>> the root cause.  If you're using the API, then either you're doing it
>>>>>> wrong or the API is broken.  Can you figure out which and fix it?
>>>>>>
>>>>>> --Andy
>>>>>>
>>>>>
>>>>> Andy,
>>>>>
>>>>> Linux should not crash because someone triggered a breakpoint or one
>>>>> got triggered due to a program leaving some bits lying in a read only
>>>>> register (DR6) which for some strange reason someone in the linux
>>>>> world decided could be used as local storage and to pass arguments
>>>>> between subsystems - a register intel designed to be read from for
>>>>> status.    I did not design what's in that API, I have to live with
>>>>> it.
>>>>
>>>> The API appears to work, though.  Are you *sure* you're using it
>>>> correctly?  Are you telling the code in kernel/hw_breakpoint.c about
>>>> your breakpoint?
>>>>
>>>>> So all I am asking is that we fix this issue.  It does not matter
>>>>> to my debugger is this is fixed or not in Linux, since I carry the fix
>>>>> in my patch, but it does matter to the overall robustness of Linux.
>>>>
>>>> Robust against what, exactly?  What's the bug?
>>>>
>>>> I will grant that the comments about lazy dr7 switching are
>>>> mystifying, and cleaning them up might be nice.  But there's no
>>>> adequate explanation of what the failure mode is, how to trigger it,
>>>> or why your patch is a reasonable fix.  As it stands, you're
>>>> duplicating code.
>>>>
>>>> --Andy
>>>
>>> Andy,
>>>
>>> Couple of things:
>>>
>>> Would you like a copy of the test harness that creates this bug to
>>> test for yourself?  I previously posted it on the list.  If you don't
>>> have it, I'll provide it.
>>
>> If you can send a short, buildable thing that triggers it, I'll read it.
>>
>>>
>>> Since the dr6 bits get shifted around, it doesn't matter if the
>>> breakpoint was registered or not in the API because the broken handler
>>> will call NULL bp structures and crash whether its registered or not.
>>>
>>
>> And what exactly does this have to do with anything?  Your patch is
>> all about spurious breakpoints triggered by dr7 and should have
>> nothing much to do with the value in dr6.  Unless dr6 is missing a bit
>> due to some issue, but you never suggested any problem like that.
>>
>
> It's about setting the resume flag when an execute breakpoint occurs,  no matter
> what caused the breakpoint.  If is not set, the system will hang with
> that processor
> hung on the same execution address.  You cannot have an int1 exception path
> that does not set the resume flag which is the case here -- there
> should be no path
> where this flag does not get set on an execute breakpoint.

There are many, many ways that one can corrupt kernel state to break
things.  You could screw up IST state basically anywhere and crash.
You could screw up GSBASE.  You could poke bad values into pt_regs in
a fast syscall and hit the infamous SYSRET failure.  You can write a
buggy .fault handler that returns success and doesn't actually do
anything.  And yes, you can set a bit in dr7 without telling the
hw_breakpoint code about it and thus infinite loop.

Meanwhile, you keep claiming that kernel has a bug and that the bug
can't be triggered without out-of-tree code.  In my book, that's not a
bug.

If you want to submit a nice clean patch to hw_breakpoint_handler to
change the behavior on an unmatched breakpoint, then submit such a
patch and justify why (a) the new behavior is better and (b) why it
doesn't break any actual in-tree code.

Yes, hw_breakpoint_notify is a piece of shit.  In particular, the
hilarously indirect way in which it's invoked makes no sense
whatsoever.  Fixing that (as a separate patch) would be fantastic IMO.
But putting extra workarounds into do_debug is not okay, IMO.

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  2:40                 ` Andy Lutomirski
@ 2016-01-12  2:50                   ` Jeff Merkey
  2016-01-12  2:55                     ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Merkey @ 2016-01-12  2:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jan 11, 2016 at 6:26 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Mon, Jan 11, 2016 at 6:07 PM, Jeff Merkey <linux.mdb@gmail.com>
>>> wrote:
>>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>> On Mon, Jan 11, 2016 at 5:30 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>>> wrote:
>>>>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>>> On Mon, Jan 11, 2016 at 4:44 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>>>>> wrote:
>>>>>>>> Hi Thomas,
>>>>>>>>
>>>>>>>> I agree with #2, we should clear the breakpoint.  As for #1, if
>>>>>>>> there's an execute breakpoint it MUST be cleared or it will just
>>>>>>>> fire
>>>>>>>> off again when it sees the iretd from the int1 exception handler.
>>>>>>>> I
>>>>>>>> do use the breakpoint API Thomas, this showed up while debugging
>>>>>>>> and
>>>>>>>> testing the API with "lazy debug register switching".
>>>>>>>>
>>>>>>>> So do you want me to expand the patch and clear the breakpoint?
>>>>>>>> Just
>>>>>>>> give the word and I'll get busy and GIT -R- DONE.
>>>>>>>
>>>>>>> It seems to me that you're papering over some issue instead of
>>>>>>> fixing
>>>>>>> the root cause.  If you're using the API, then either you're doing
>>>>>>> it
>>>>>>> wrong or the API is broken.  Can you figure out which and fix it?
>>>>>>>
>>>>>>> --Andy
>>>>>>>
>>>>>>
>>>>>> Andy,
>>>>>>
>>>>>> Linux should not crash because someone triggered a breakpoint or one
>>>>>> got triggered due to a program leaving some bits lying in a read only
>>>>>> register (DR6) which for some strange reason someone in the linux
>>>>>> world decided could be used as local storage and to pass arguments
>>>>>> between subsystems - a register intel designed to be read from for
>>>>>> status.    I did not design what's in that API, I have to live with
>>>>>> it.
>>>>>
>>>>> The API appears to work, though.  Are you *sure* you're using it
>>>>> correctly?  Are you telling the code in kernel/hw_breakpoint.c about
>>>>> your breakpoint?
>>>>>
>>>>>> So all I am asking is that we fix this issue.  It does not matter
>>>>>> to my debugger is this is fixed or not in Linux, since I carry the
>>>>>> fix
>>>>>> in my patch, but it does matter to the overall robustness of Linux.
>>>>>
>>>>> Robust against what, exactly?  What's the bug?
>>>>>
>>>>> I will grant that the comments about lazy dr7 switching are
>>>>> mystifying, and cleaning them up might be nice.  But there's no
>>>>> adequate explanation of what the failure mode is, how to trigger it,
>>>>> or why your patch is a reasonable fix.  As it stands, you're
>>>>> duplicating code.
>>>>>
>>>>> --Andy
>>>>
>>>> Andy,
>>>>
>>>> Couple of things:
>>>>
>>>> Would you like a copy of the test harness that creates this bug to
>>>> test for yourself?  I previously posted it on the list.  If you don't
>>>> have it, I'll provide it.
>>>
>>> If you can send a short, buildable thing that triggers it, I'll read it.
>>>
>>>>
>>>> Since the dr6 bits get shifted around, it doesn't matter if the
>>>> breakpoint was registered or not in the API because the broken handler
>>>> will call NULL bp structures and crash whether its registered or not.
>>>>
>>>
>>> And what exactly does this have to do with anything?  Your patch is
>>> all about spurious breakpoints triggered by dr7 and should have
>>> nothing much to do with the value in dr6.  Unless dr6 is missing a bit
>>> due to some issue, but you never suggested any problem like that.
>>>
>>
>> It's about setting the resume flag when an execute breakpoint occurs,  no
>> matter
>> what caused the breakpoint.  If is not set, the system will hang with
>> that processor
>> hung on the same execution address.  You cannot have an int1 exception
>> path
>> that does not set the resume flag which is the case here -- there
>> should be no path
>> where this flag does not get set on an execute breakpoint.
>
> There are many, many ways that one can corrupt kernel state to break
> things.  You could screw up IST state basically anywhere and crash.
> You could screw up GSBASE.  You could poke bad values into pt_regs in
> a fast syscall and hit the infamous SYSRET failure.  You can write a
> buggy .fault handler that returns success and doesn't actually do
> anything.  And yes, you can set a bit in dr7 without telling the
> hw_breakpoint code about it and thus infinite loop.
>
> Meanwhile, you keep claiming that kernel has a bug and that the bug
> can't be triggered without out-of-tree code.  In my book, that's not a
> bug.
>

The handler that fails to set the resume flag is in tree code.

> If you want to submit a nice clean patch to hw_breakpoint_handler to
> change the behavior on an unmatched breakpoint, then submit such a
> patch and justify why (a) the new behavior is better and (b) why it
> doesn't break any actual in-tree code.
>

At last, a compromise -- accepted.  In the meantime, put this patch in
to get rid of the crash.  I'll code up another series and you can help me by
reviewing it and keeping me on my toes.

> Yes, hw_breakpoint_notify is a piece of shit.  In particular, the
> hilarously indirect way in which it's invoked makes no sense
> whatsoever.  Fixing that (as a separate patch) would be fantastic IMO.
> But putting extra workarounds into do_debug is not okay, IMO.

Totally agree......


I'll get to work.  In the meantime, put this one in for me boss.  I
take Sunday afternoon off, BTW.

:-)

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  2:50                   ` Jeff Merkey
@ 2016-01-12  2:55                     ` Andy Lutomirski
  2016-01-12  3:09                       ` Jeff Merkey
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2016-01-12  2:55 UTC (permalink / raw)
  To: Jeff Merkey
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On Mon, Jan 11, 2016 at 6:50 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Mon, Jan 11, 2016 at 6:26 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Mon, Jan 11, 2016 at 6:07 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>> wrote:
>>>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>> On Mon, Jan 11, 2016 at 5:30 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>>>> wrote:
>>>>>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>>>> On Mon, Jan 11, 2016 at 4:44 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> Hi Thomas,
>>>>>>>>>
>>>>>>>>> I agree with #2, we should clear the breakpoint.  As for #1, if
>>>>>>>>> there's an execute breakpoint it MUST be cleared or it will just
>>>>>>>>> fire
>>>>>>>>> off again when it sees the iretd from the int1 exception handler.
>>>>>>>>> I
>>>>>>>>> do use the breakpoint API Thomas, this showed up while debugging
>>>>>>>>> and
>>>>>>>>> testing the API with "lazy debug register switching".
>>>>>>>>>
>>>>>>>>> So do you want me to expand the patch and clear the breakpoint?
>>>>>>>>> Just
>>>>>>>>> give the word and I'll get busy and GIT -R- DONE.
>>>>>>>>
>>>>>>>> It seems to me that you're papering over some issue instead of
>>>>>>>> fixing
>>>>>>>> the root cause.  If you're using the API, then either you're doing
>>>>>>>> it
>>>>>>>> wrong or the API is broken.  Can you figure out which and fix it?
>>>>>>>>
>>>>>>>> --Andy
>>>>>>>>
>>>>>>>
>>>>>>> Andy,
>>>>>>>
>>>>>>> Linux should not crash because someone triggered a breakpoint or one
>>>>>>> got triggered due to a program leaving some bits lying in a read only
>>>>>>> register (DR6) which for some strange reason someone in the linux
>>>>>>> world decided could be used as local storage and to pass arguments
>>>>>>> between subsystems - a register intel designed to be read from for
>>>>>>> status.    I did not design what's in that API, I have to live with
>>>>>>> it.
>>>>>>
>>>>>> The API appears to work, though.  Are you *sure* you're using it
>>>>>> correctly?  Are you telling the code in kernel/hw_breakpoint.c about
>>>>>> your breakpoint?
>>>>>>
>>>>>>> So all I am asking is that we fix this issue.  It does not matter
>>>>>>> to my debugger is this is fixed or not in Linux, since I carry the
>>>>>>> fix
>>>>>>> in my patch, but it does matter to the overall robustness of Linux.
>>>>>>
>>>>>> Robust against what, exactly?  What's the bug?
>>>>>>
>>>>>> I will grant that the comments about lazy dr7 switching are
>>>>>> mystifying, and cleaning them up might be nice.  But there's no
>>>>>> adequate explanation of what the failure mode is, how to trigger it,
>>>>>> or why your patch is a reasonable fix.  As it stands, you're
>>>>>> duplicating code.
>>>>>>
>>>>>> --Andy
>>>>>
>>>>> Andy,
>>>>>
>>>>> Couple of things:
>>>>>
>>>>> Would you like a copy of the test harness that creates this bug to
>>>>> test for yourself?  I previously posted it on the list.  If you don't
>>>>> have it, I'll provide it.
>>>>
>>>> If you can send a short, buildable thing that triggers it, I'll read it.
>>>>
>>>>>
>>>>> Since the dr6 bits get shifted around, it doesn't matter if the
>>>>> breakpoint was registered or not in the API because the broken handler
>>>>> will call NULL bp structures and crash whether its registered or not.
>>>>>
>>>>
>>>> And what exactly does this have to do with anything?  Your patch is
>>>> all about spurious breakpoints triggered by dr7 and should have
>>>> nothing much to do with the value in dr6.  Unless dr6 is missing a bit
>>>> due to some issue, but you never suggested any problem like that.
>>>>
>>>
>>> It's about setting the resume flag when an execute breakpoint occurs,  no
>>> matter
>>> what caused the breakpoint.  If is not set, the system will hang with
>>> that processor
>>> hung on the same execution address.  You cannot have an int1 exception
>>> path
>>> that does not set the resume flag which is the case here -- there
>>> should be no path
>>> where this flag does not get set on an execute breakpoint.
>>
>> There are many, many ways that one can corrupt kernel state to break
>> things.  You could screw up IST state basically anywhere and crash.
>> You could screw up GSBASE.  You could poke bad values into pt_regs in
>> a fast syscall and hit the infamous SYSRET failure.  You can write a
>> buggy .fault handler that returns success and doesn't actually do
>> anything.  And yes, you can set a bit in dr7 without telling the
>> hw_breakpoint code about it and thus infinite loop.
>>
>> Meanwhile, you keep claiming that kernel has a bug and that the bug
>> can't be triggered without out-of-tree code.  In my book, that's not a
>> bug.
>>
>
> The handler that fails to set the resume flag is in tree code.

It's an unreachable code path.

>
>> If you want to submit a nice clean patch to hw_breakpoint_handler to
>> change the behavior on an unmatched breakpoint, then submit such a
>> patch and justify why (a) the new behavior is better and (b) why it
>> doesn't break any actual in-tree code.
>>
>
> At last, a compromise -- accepted.  In the meantime, put this patch in
> to get rid of the crash.  I'll code up another series and you can help me by
> reviewing it and keeping me on my toes.
>

No, because it still doesn't fix a bug *and* it's not a cleanup.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  2:55                     ` Andy Lutomirski
@ 2016-01-12  3:09                       ` Jeff Merkey
  2016-01-12  3:24                         ` Jeff Merkey
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Merkey @ 2016-01-12  3:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jan 11, 2016 at 6:50 PM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Mon, Jan 11, 2016 at 6:26 PM, Jeff Merkey <linux.mdb@gmail.com>
>>> wrote:
>>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>> On Mon, Jan 11, 2016 at 6:07 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>>> wrote:
>>>>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>>> On Mon, Jan 11, 2016 at 5:30 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>>>>> wrote:
>>>>>>>> On 1/11/16, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>>>>> On Mon, Jan 11, 2016 at 4:44 PM, Jeff Merkey <linux.mdb@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> Hi Thomas,
>>>>>>>>>>
>>>>>>>>>> I agree with #2, we should clear the breakpoint.  As for #1, if
>>>>>>>>>> there's an execute breakpoint it MUST be cleared or it will just
>>>>>>>>>> fire
>>>>>>>>>> off again when it sees the iretd from the int1 exception handler.
>>>>>>>>>> I
>>>>>>>>>> do use the breakpoint API Thomas, this showed up while debugging
>>>>>>>>>> and
>>>>>>>>>> testing the API with "lazy debug register switching".
>>>>>>>>>>
>>>>>>>>>> So do you want me to expand the patch and clear the breakpoint?
>>>>>>>>>> Just
>>>>>>>>>> give the word and I'll get busy and GIT -R- DONE.
>>>>>>>>>
>>>>>>>>> It seems to me that you're papering over some issue instead of
>>>>>>>>> fixing
>>>>>>>>> the root cause.  If you're using the API, then either you're doing
>>>>>>>>> it
>>>>>>>>> wrong or the API is broken.  Can you figure out which and fix it?
>>>>>>>>>
>>>>>>>>> --Andy
>>>>>>>>>
>>>>>>>>
>>>>>>>> Andy,
>>>>>>>>
>>>>>>>> Linux should not crash because someone triggered a breakpoint or
>>>>>>>> one
>>>>>>>> got triggered due to a program leaving some bits lying in a read
>>>>>>>> only
>>>>>>>> register (DR6) which for some strange reason someone in the linux
>>>>>>>> world decided could be used as local storage and to pass arguments
>>>>>>>> between subsystems - a register intel designed to be read from for
>>>>>>>> status.    I did not design what's in that API, I have to live with
>>>>>>>> it.
>>>>>>>
>>>>>>> The API appears to work, though.  Are you *sure* you're using it
>>>>>>> correctly?  Are you telling the code in kernel/hw_breakpoint.c about
>>>>>>> your breakpoint?
>>>>>>>
>>>>>>>> So all I am asking is that we fix this issue.  It does not matter
>>>>>>>> to my debugger is this is fixed or not in Linux, since I carry the
>>>>>>>> fix
>>>>>>>> in my patch, but it does matter to the overall robustness of Linux.
>>>>>>>
>>>>>>> Robust against what, exactly?  What's the bug?
>>>>>>>
>>>>>>> I will grant that the comments about lazy dr7 switching are
>>>>>>> mystifying, and cleaning them up might be nice.  But there's no
>>>>>>> adequate explanation of what the failure mode is, how to trigger it,
>>>>>>> or why your patch is a reasonable fix.  As it stands, you're
>>>>>>> duplicating code.
>>>>>>>
>>>>>>> --Andy
>>>>>>
>>>>>> Andy,
>>>>>>
>>>>>> Couple of things:
>>>>>>
>>>>>> Would you like a copy of the test harness that creates this bug to
>>>>>> test for yourself?  I previously posted it on the list.  If you don't
>>>>>> have it, I'll provide it.
>>>>>
>>>>> If you can send a short, buildable thing that triggers it, I'll read
>>>>> it.
>>>>>
>>>>>>
>>>>>> Since the dr6 bits get shifted around, it doesn't matter if the
>>>>>> breakpoint was registered or not in the API because the broken
>>>>>> handler
>>>>>> will call NULL bp structures and crash whether its registered or not.
>>>>>>
>>>>>
>>>>> And what exactly does this have to do with anything?  Your patch is
>>>>> all about spurious breakpoints triggered by dr7 and should have
>>>>> nothing much to do with the value in dr6.  Unless dr6 is missing a bit
>>>>> due to some issue, but you never suggested any problem like that.
>>>>>
>>>>
>>>> It's about setting the resume flag when an execute breakpoint occurs,
>>>> no
>>>> matter
>>>> what caused the breakpoint.  If is not set, the system will hang with
>>>> that processor
>>>> hung on the same execution address.  You cannot have an int1 exception
>>>> path
>>>> that does not set the resume flag which is the case here -- there
>>>> should be no path
>>>> where this flag does not get set on an execute breakpoint.
>>>
>>> There are many, many ways that one can corrupt kernel state to break
>>> things.  You could screw up IST state basically anywhere and crash.
>>> You could screw up GSBASE.  You could poke bad values into pt_regs in
>>> a fast syscall and hit the infamous SYSRET failure.  You can write a
>>> buggy .fault handler that returns success and doesn't actually do
>>> anything.  And yes, you can set a bit in dr7 without telling the
>>> hw_breakpoint code about it and thus infinite loop.
>>>
>>> Meanwhile, you keep claiming that kernel has a bug and that the bug
>>> can't be triggered without out-of-tree code.  In my book, that's not a
>>> bug.
>>>
>>
>> The handler that fails to set the resume flag is in tree code.
>
> It's an unreachable code path.
>
>>
>>> If you want to submit a nice clean patch to hw_breakpoint_handler to
>>> change the behavior on an unmatched breakpoint, then submit such a
>>> patch and justify why (a) the new behavior is better and (b) why it
>>> doesn't break any actual in-tree code.
>>>
>>
>> At last, a compromise -- accepted.  In the meantime, put this patch in
>> to get rid of the crash.  I'll code up another series and you can help me
>> by
>> reviewing it and keeping me on my toes.
>>
>
> No, because it still doesn't fix a bug *and* it's not a cleanup.
>

I'll work on the patch series you asked me to do.  This has been in
the kernel for a long time and only affects kernel debuggers for the
most part, and I have coded around it.  It's not unreachable though --
trigger a breakpoint without registering it -- or register another
int1 exception handler (its the two handler case when it shows up).
So if someone other than hw_breakpoint.c registers a second handler,
it shows up there too -- that's how I found it in the first place.
It can be triggered by simply calling in tree code and setting a
breakpoint with a second int1 handler registered as well, which I did
not report in the patch documentation because I wanted to keep it
short.    So consumers of the in-tree code can trigger it as well --
just like you said, in the notify handler, which is where the real
problem lies.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  3:09                       ` Jeff Merkey
@ 2016-01-12  3:24                         ` Jeff Merkey
  2016-01-13 16:28                           ` Jeff Merkey
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Merkey @ 2016-01-12  3:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

Anyway Andy,

If Linus wants to pull it, be my guest -- I think its a bug.    If you
guys want to wait that's ok too.  In the meantime, I totally agree
that someone needs to fix the busted hw_breakpoint notify handling and
I will focus on that.  I appreciate the opportunity to help out and
thanks for suggesting some areas I can.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints
  2016-01-12  3:24                         ` Jeff Merkey
@ 2016-01-13 16:28                           ` Jeff Merkey
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Merkey @ 2016-01-13 16:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Peter Zijlstra, Andy Lutomirski,
	Masami Hiramatsu, Steven Rostedt, Borislav Petkov, Jiri Olsa

On 1/11/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> Anyway Andy,
>
> If Linus wants to pull it, be my guest -- I think its a bug.    If you
> guys want to wait that's ok too.  In the meantime, I totally agree
> that someone needs to fix the busted hw_breakpoint notify handling and
> I will focus on that.  I appreciate the opportunity to help out and
> thanks for suggesting some areas I can.
>
> Jeff
>

Linus,

OK, I have emailed Andy off line to ask about these other areas in
hw_breakpoint.c with the busted handlers he mentioned and gotten
no response (I guess he does not respond unless there's an audience).

I have other areas I am working on right now, including a new set of
changes from
the kbuild test robot for the debugger I have to get checked in and
regressed and other projects I am working on with Linux.  I submitted
this because its a bug, I think you should pull it until Thomas and
his team can come up with a better fix.    It's a squeaky clean patch
that has all of Thomas perks and suggestions.

As Thomas stated, the whole subsystem has some issues anyway because
of the way dr6 is handled.  I can fix this mess, but its a lot of work
and I am not going to attempt to do so if I am just going to submit a
patch series to folks and have them reject it for whatever reason.

In 18 years Linus I have never put a patch into Linux, this would be
my first.  I was really looking forward to finally getting one in.

:-)

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-01-13 16:28 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-12  0:35 [GIT PULL v4.5] Fix INT1 recursion with unregistered breakpoints Jeff V. Merkey
2016-01-12  0:40 ` Andy Lutomirski
     [not found]   ` <CAO6TR8VdW3xzqJ7jv6N3q9AJe9pEodzQqiLK4XFYUQKeP3VVtQ@mail.gmail.com>
2016-01-12  0:49     ` Andy Lutomirski
2016-01-12  0:55       ` Jeff Merkey
2016-01-12  1:30       ` Jeff Merkey
2016-01-12  1:54         ` Andy Lutomirski
2016-01-12  2:07           ` Jeff Merkey
2016-01-12  2:19             ` Andy Lutomirski
2016-01-12  2:26               ` Jeff Merkey
2016-01-12  2:40                 ` Andy Lutomirski
2016-01-12  2:50                   ` Jeff Merkey
2016-01-12  2:55                     ` Andy Lutomirski
2016-01-12  3:09                       ` Jeff Merkey
2016-01-12  3:24                         ` Jeff Merkey
2016-01-13 16:28                           ` Jeff Merkey
2016-01-12  0:50   ` Jeff Merkey

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.